6/15/2023

Diffbot

Pointed at my bio, for example, Diffbot learns that Will Douglas Heaven is a journalist; Will Douglas Heaven works at MIT Technology Review; MIT Technology Review is a media company; and so on. Each of these factoids gets joined up with billions of others in a sprawling, interconnected network of facts.

Knowledge graphs have been around for decades, and were a fundamental concept in early AI research. But constructing and maintaining them has typically been done by hand, which is hard. This also stopped Tim Berners-Lee from realizing what he called the semantic web, which would have included information for machines as well as humans, so that bots could book our flights, do our shopping, or give smarter answers to questions than search engines.

A few years ago, Google started using knowledge graphs too. Search for "Katy Perry" and you will get a box next to the main search results telling you that Katy Perry is an American singer-songwriter with music available on YouTube, Spotify, and Deezer. You can see at a glance that she is married to Orlando Bloom, she's 35 and worth $125 million, and so on. Instead of giving you a list of links to pages about Katy Perry, Google gives you a set of facts about her drawn from its knowledge graph.

But Google only does this for its most popular search terms. By fully automating the construction process, Diffbot has been able to build what may be the largest knowledge graph ever. Alongside Google and Microsoft, it is one of only three US companies that crawl the entire public web.

"It definitely makes sense to crawl the web," says Victoria Lin, a research scientist at Salesforce who works on natural-language processing and knowledge representation. "A lot of human effort can otherwise go into making a large knowledge base." Heiko Paulheim at the University of Mannheim in Germany agrees: "Automation is the only way to build large-scale knowledge graphs."

Super surfer

To collect its facts, Diffbot's AI reads the web as a human would, but much faster. Using a supercharged version of the Chrome browser, the AI views the raw pixels of a web page and uses image-recognition algorithms to categorize the page as one of 20 different types, including video, image, article, event, and discussion thread. It then identifies key elements on the page, such as headline, author, product description, or price, and uses NLP to extract facts from any text. Every three-part factoid gets added to the knowledge graph.

Browsing the web like a human lets the AI see the same facts that we see. Diffbot extracts facts from pages written in any language, which means that it can answer queries about Katy Perry, say, using facts taken from articles in Chinese or Arabic even if they do not contain the term "Katy Perry."

It also means it has had to learn to navigate the web like us. The AI must scroll down, switch between tabs, and click away pop-ups. "The AI has to play the web like a video game just to experience the pages," says Tung.

Diffbot crawls the web nonstop and rebuilds its knowledge graph every four to five days. According to Tung, the AI adds 100 million to 150 million entities each month as new people pop up online, companies are created, and products are launched. It uses more machine-learning algorithms to fuse new facts with old, creating new connections or overwriting out-of-date ones. Diffbot has to add new hardware to its data center as the knowledge graph grows.

Researchers can access Diffbot's knowledge graph for free. But Diffbot also has around 400 paying customers. The search engine DuckDuckGo uses it to generate its own Google-like boxes. Snapchat uses it to extract highlights from news pages. The popular wedding-planner app Zola uses it to help people make wedding lists, pulling in images and prices. NASDAQ, which provides information about the stock market, uses it for financial research.

Fake shoes

Adidas and Nike even use it to search the web for counterfeit shoes. A search engine will return a long list of sites that mention Nike trainers. But Diffbot lets these companies look for sites that are actually selling their shoes, rather than just talking about them.

For now, these companies must interact with Diffbot using code. But Tung plans to add a natural-language interface. Ultimately, he wants to build what he calls a "universal factoid question answering system": an AI that could answer almost anything you asked it, with sources to back up its response.

Tung and Lin agree that this kind of AI cannot be built with language models alone. But better yet would be to combine the technologies, using a language model like GPT-3 to craft a human-like front end for a know-it-all bot. Still, even an AI that has its facts straight is not necessarily smart. "We're not trying to define what intelligence is, or anything like that," says Tung.
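The three-part factoids described above are essentially subject-predicate-object triples, the standard building block of knowledge graphs. A minimal sketch of the idea in Python, using hypothetical names and structures (this is an illustration of triples in general, not Diffbot's actual data model or API):

```python
# Minimal triple-store sketch: each fact is (subject, predicate, object).
# Hypothetical structures for illustration, not Diffbot's real schema.
from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        # subject -> predicate -> object
        self.triples = defaultdict(dict)

    def add_fact(self, subject, predicate, obj):
        # "Fusing new facts with old": a newer value for the same
        # subject/predicate pair overwrites the out-of-date one.
        self.triples[subject][predicate] = obj

    def query(self, subject, predicate):
        # Look up a single fact; returns None if it is unknown.
        return self.triples.get(subject, {}).get(predicate)

kg = KnowledgeGraph()
kg.add_fact("Will Douglas Heaven", "occupation", "journalist")
kg.add_fact("Will Douglas Heaven", "works_at", "MIT Technology Review")
kg.add_fact("MIT Technology Review", "type", "media company")

print(kg.query("Will Douglas Heaven", "works_at"))  # prints: MIT Technology Review
```

Because facts are keyed by subject and predicate, re-adding a fact with a new object naturally models the overwrite-on-update behavior the article describes; a production graph would instead keep provenance and timestamps for each triple.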