Topic Recognition

Where other companies recognize only named entities (cities, countries, people, companies and the like), Lingospot’s natural language processing technology identifies all topics on a page, including topics such as “hybrid vehicles”, “retirement planning”, “college applications” and “exotic vacations”. These topics are much harder to extract automatically than named entities and require a deep semantic understanding of the page’s content. Most importantly, they also tend to be more interesting to readers than named entities, and better suited to advertisers promoting their products.

Lingospot’s page analysis is performed by our patented adaptive natural language processing algorithm, a machine-learning-based system that resolves the ambiguities common in human languages. Specifically, our engine resolves four main kinds of ambiguity:

  1. Part of speech, such as recognizing the difference between “ship” the noun and “ship” the verb.
  2. Phrasal boundaries, such as deciding whether a topic is “bird”, “bird flu” or “bird flu vaccine”.
  3. Word senses, such as distinguishing, based on context, whether “bat” refers to a baseball bat or the flying mammal.
  4. Parsing, whereby our system breaks sentences down into their constituent parts to resolve long-distance dependencies.
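
To make items 2 and 3 more concrete, here is a minimal Python sketch. It is not Lingospot’s algorithm, which is a trained machine-learning system. It only illustrates the two problems with simple stand-in techniques: a greedy longest-match lookup for phrasal boundaries, and a simplified Lesk-style cue-word overlap for word senses. The lexicon, the cue words and the function names are all hypothetical.

```python
# Hypothetical topic lexicon; a real system would learn candidates from data.
TOPICS = {"bird", "bird flu", "bird flu vaccine"}
MAX_WORDS = 3  # length, in words, of the longest topic in the lexicon

# Hypothetical cue words for each sense of an ambiguous word.
SENSES = {
    "bat": {
        "baseball bat": {"baseball", "swing", "hit", "pitch", "player"},
        "flying mammal": {"cave", "wings", "nocturnal", "fly", "mammal"},
    }
}


def longest_topics(text):
    """Greedy longest match: prefer 'bird flu vaccine' over 'bird flu' over 'bird'."""
    words = text.lower().split()
    found = []
    i = 0
    while i < len(words):
        # Try the longest window first, then shrink it one word at a time.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in TOPICS:
                found.append(candidate)
                i += n
                break
        else:
            i += 1  # no topic starts at this word
    return found


def guess_sense(word, sentence):
    """Pick the sense whose cue words overlap most with the sentence."""
    context = set(sentence.lower().split())
    scores = {sense: len(cues & context) for sense, cues in SENSES[word].items()}
    return max(scores, key=scores.get)


print(longest_topics("the new bird flu vaccine arrived"))
print(guess_sense("bat", "a bat flew out of the cave"))
```

With these toy inputs the first call picks the three-word topic rather than “bird” or “bird flu”, and the second call chooses the flying mammal because “cave” appears in the sentence. A lookup table like this breaks down quickly on real text, which is why such decisions are typically left to a trained model.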

This deep analysis, as opposed to the more common shallow keyword-based analysis, improves our understanding of content and allows us to search and organize it more precisely for readers. For example, if a reader were interested in hybrid engines, a shallow keyword analysis would first look for “hybrid engines” and, finding nothing, would then suggest articles about search engines simply because the word “engine” appears in them. Our system instead distinguishes concepts and gauges how closely related they are to each other, so we would present content that is semantically closer to hybrid engines, such as articles on renewable energy and fuel cells, even if the word “engine” does not appear on those pages.
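
The contrast between keyword matching and concept relatedness can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of our engine: the three-dimensional “concept vectors” are made-up numbers over hypothetical dimensions (automotive, energy, web), and cosine similarity stands in for whatever relatedness measure a production system would use.

```python
import math

# Hypothetical concept vectors: (automotive, energy, web). Values are invented.
CONCEPTS = {
    "hybrid engines":   (0.9, 0.8, 0.0),
    "search engines":   (0.0, 0.0, 1.0),
    "renewable energy": (0.3, 1.0, 0.0),
    "fuel cells":       (0.6, 0.9, 0.0),
}


def keyword_overlap(a, b):
    """Shallow measure: number of words the two phrases share."""
    return len(set(a.split()) & set(b.split()))


def cosine(u, v):
    """Cosine similarity between two concept vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def related(query):
    """Rank the other topics by concept similarity to the query, closest first."""
    others = [topic for topic in CONCEPTS if topic != query]
    return sorted(
        others,
        key=lambda topic: cosine(CONCEPTS[query], CONCEPTS[topic]),
        reverse=True,
    )


print(keyword_overlap("hybrid engines", "search engines"))
print(keyword_overlap("hybrid engines", "fuel cells"))
print(related("hybrid engines"))
```

Under these toy vectors, keyword overlap favors “search engines” because of the shared word, while the concept ranking places “fuel cells” and “renewable energy” ahead of it even though neither phrase contains the word “engines”.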

Next Steps