August 2, 2018

NEW: Multi-language AI Event Detection

by Sean Solbak

Hello, Hola, Bonjour, Hallo, Здравствуйте, Olá!

Today we’re proud to announce that SAM’s AI is now multilingual! We’re excited to release a major update to our core AI event detection systems (our natural language processing (NLP) and named entity recognition (NER) stack) that lets them process and understand tweets across multiple languages. Our Real-Time Knowledge Engine now understands Twitter in Spanish, Portuguese, French, German and Russian!

Early results confirm that SAM is detecting crisis events and breaking news before news outlets. We’re excited to roll this functionality out to our regional clients, enabling them to better monitor their local areas in their native languages. Likewise, this means our multinational and global clients will be alerted to events even faster than before without needing to translate chatter and media reports.

Solving event detection in numerous languages presented some unique technical challenges. We knew from our work in English-language NLP that off-the-shelf systems wouldn’t work for our purposes: they fail to understand tweets that contain only a few words or are riddled with slang and colloquialisms, and those are often the most valuable tweets in breaking news. The language of a tweet also has many implications for NER, particularly for location extraction. A French tweet that mentions Abbeville is much more likely to be talking about Abbeville, France than about the Abbevilles in South Carolina, Alabama, or Georgia.

Likewise, the way an English speaker describes a breaking news situation does not necessarily translate one-to-one into other languages. For example, we knew that “the roof is on fire” in English might be a metaphor, but with currently available solutions it is nearly impossible to capture the regional slang and word-for-word translations needed to provide the context our system relies on. How does a Spanish speaker describe a building fire? How does a French speaker describe an active shooter situation? Using machine learning, our AI breaks down each tweet to understand its exact context and subject matter.
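The Abbeville example is a case of toponym disambiguation: several places share one name, and the tweet’s language is a useful hint about which one is meant. Here is a minimal sketch of how a tweet’s language could act as a prior when choosing among same-named places. It is an illustration under stated assumptions, not SAM’s actual method: the `GAZETTEER` entries, the `LANGUAGE_COUNTRIES` mapping, and the `resolve_location` function are all hypothetical, and a real system would likely weigh this prior alongside other signals such as surrounding entities or population.

```python
from typing import Optional

# Hypothetical gazetteer: lowercase toponym -> list of (place name, ISO 3166-1 country code)
GAZETTEER = {
    "abbeville": [
        ("Abbeville, South Carolina", "US"),
        ("Abbeville, Alabama", "US"),
        ("Abbeville, Georgia", "US"),
        ("Abbeville, France", "FR"),
    ],
}

# Hypothetical, simplified prior: countries where each tweet language is widely spoken
LANGUAGE_COUNTRIES = {
    "en": {"US", "GB", "CA", "AU"},
    "fr": {"FR", "BE", "CA", "CH"},
    "es": {"ES", "MX", "AR", "CO"},
    "pt": {"PT", "BR"},
    "de": {"DE", "AT", "CH"},
    "ru": {"RU", "BY", "KZ"},
}


def resolve_location(toponym: str, tweet_lang: str) -> Optional[str]:
    """Pick the candidate place whose country matches the tweet's language.

    Returns None for unknown toponyms; if no candidate matches the
    language prior, falls back to the first gazetteer entry.
    """
    candidates = GAZETTEER.get(toponym.lower(), [])
    if not candidates:
        return None
    preferred = LANGUAGE_COUNTRIES.get(tweet_lang, set())
    for name, country in candidates:
        if country in preferred:
            return name
    return candidates[0][0]
```

With this sketch, `resolve_location("Abbeville", "fr")` selects Abbeville, France, while an English tweet resolves to the first US candidate in the list.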

The major advantage of this approach is that once we understand these regional contexts and language-based differences, we can detect non-English events faster and with higher accuracy. As in the examples above, SAM can now look beyond any single language to recognize when tweets in different languages refer to the same event.

Improving our language coverage is something we will continue to work on closely with our clients, as our models strengthen every day and each event they encounter brings new learnings and depth to our systems. We are extremely proud of this milestone because it allows us to rapidly scale our proficiency to additional languages, and there are many more languages in the pipeline!

If you are a SAM user (and you’re logged into the system), I’m pleased to highlight some live examples of how our knowledge engine can inform you of global events. Here goes: fire in Paris, plane crash in Mexico, and an accidental explosion at a school in Brazil.

One last bit of news from me: we’re excited to announce our language partnership program. We’re teaming up with local newsrooms around the world to tackle their local languages, and we’re providing them the entire SAM platform at a discounted rate! If you would like to learn more about our language partnership program, or about SAM in general, we would love to hear from you.
