Wednesday, February 29, 2012

Intelligent Intelligence Gathering by Mining Data on the Internet

Recorded Future can predict at least some aspects of the future by monitoring the Internet. Lots of the Internet.
The two-year-old Massachusetts-based firm, which is partly funded by Google Ventures and the CIA’s VC arm (In-Q-Tel, which makes investments to benefit the United States intelligence community), thinks this aggregated and analyzed Internet information will be especially useful in three areas: government, finance and competitive intelligence.
But will it tell a trader which stocks to buy? Actually, yes. By collecting information and sentiment, it can pick the 50 most popular stocks and the 50 least popular, said Christopher Ahlberg, the company’s founder and CEO. By going long on the former and short on the latter, he said, traders can make money.
“Organizing the news and companies by decile, you can prove the top decile significantly outperforms the bottom. This can turn out to be a very nice trading strategy, either as a standalone factor or incorporating the data in an overlay to an existing model.” From May to August the Recorded Future strategy had a 10 percent gain against a 10 percent loss for the S&P 500.
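To make the decile idea concrete, here is a minimal sketch in Python. It is not Recorded Future's model: the function name, the tickers and the scores are all hypothetical, and the "sentiment score" stands in for whatever popularity measure a trader actually uses. It ranks stocks by score, goes long the top tenth and short the bottom tenth, and reports the difference in average return.

```python
from statistics import mean


def decile_long_short(scores, returns):
    """Rank tickers by score and compare the top and bottom deciles.

    scores:  dict of ticker -> sentiment score (hypothetical input)
    returns: dict of ticker -> return over the holding period
    Returns (spread, longs, shorts), where spread is the mean return of
    the top decile minus the mean return of the bottom decile.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    size = max(1, len(ranked) // 10)  # one decile, at least one ticker
    longs, shorts = ranked[:size], ranked[-size:]
    long_return = mean(returns[t] for t in longs)
    short_return = mean(returns[t] for t in shorts)
    return long_return - short_return, longs, shorts


if __name__ == "__main__":
    # Made-up data: 20 tickers whose returns happen to rise with the score.
    tickers = [f"T{i:02d}" for i in range(20)]
    scores = {t: i for i, t in enumerate(tickers)}
    returns = {t: (i - 10) / 100 for i, t in enumerate(tickers)}
    spread, longs, shorts = decile_long_short(scores, returns)
    print("long:", longs, "short:", shorts, "spread:", round(spread, 4))
```

A positive spread means the high-score group beat the low-score group over the period. Whether that holds on real data is exactly what a back-test would have to show.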
Ahlberg founded Spotfire, a company which turned complex data into analytic dashboards for fast action and collaboration. He sold it to Tibco. Recorded Future is a step further into the depths of information. It takes 300,000 Web pages per hour from 40,000 to 50,000 Internet sources and digests the information to create a database that can answer questions like “Which heads of state visited Libya in 2010?” or “What pharma companies are releasing new products in the first quarter of 2012?”
The company repeatedly says it is not a search engine, although I think it may supplement or replace search engines for time-pressed users who want specific information without spending an hour or two wading through the results from Google or Bing. At $149 a month for individual users, it could pay for itself pretty quickly in consulting and research.
As Staffan Truve, co-founder and chief scientist at the company, said in a virtual users group conference recently, the company faces questions about what it offers in social media, media and government reports that a user can’t get from Google.
Try asking Google what is happening in Stockholm next week, he suggested. Google does not have a semantic understanding of information. Recorded Future works with structure such as people, places, products and companies; events such as meetings, travels, acquisitions, earnings calls and natural disasters; and ontologies or hierarchies that explain groupings such as world leaders, corporations or technology areas. And the company’s analysis treats time in several ways: when an event was reported, and when something occurred or is expected to occur. It also measures momentum or media buzz.
An example for competitive analysis: “What companies are working on fuel cell products expected between 2012 and 2015?”
The query brings up a moving chart of companies, places and products over a timeline extending to the end of December 2014. Sliders for time and size of the graph give you control over the view, and the user can explore Toyota, see mentions of its fuel cell plans out to 2015, go back and see mentions of the Prius, each with the source indicated. A network view shows which companies are connected, so it displays the links between Ballard Power Systems and Honda Motor, Shanghai Automotive and Roewe 550 and information on the Chosun Ilbo fuel cell technology. A source view produces blocks showing major sources of the news (Fast Company, the New York Times, Engadget, TechCrunch, auto blogs and PR Newswire) and where the sources are located, mostly in the U.S. but also the UK, Spain, Romania, Canada and China. Another view shows momentum by recording which firms are getting the most mentions going out to midway through 2015.
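The two notions of time are easier to see in a small data structure. The sketch below is only an illustration: the `Event` record, its field names and the sample entries are assumptions of mine, not the company's schema. Each event carries both the date it was reported and the date it occurred or is expected to occur, so a question like "what is expected next week?" becomes a filter on the second date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    kind: str          # e.g. "product_release", "meeting" (hypothetical labels)
    entities: tuple    # people, places, products or companies involved
    reported_on: date  # when a source published the statement
    occurs_on: date    # when the event happened or is expected to happen
    source: str


def expected_between(events, as_of, start, end):
    """Events already reported by `as_of` whose occurrence date falls in [start, end]."""
    return [
        e for e in events
        if e.reported_on <= as_of and start <= e.occurs_on <= end
    ]


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    events = [
        Event("product_release", ("ExampleCo",), date(2012, 1, 10), date(2014, 6, 1), "example-blog"),
        Event("meeting", ("Stockholm",), date(2012, 2, 20), date(2012, 3, 5), "example-wire"),
        Event("product_release", ("OtherCo",), date(2012, 3, 15), date(2013, 1, 1), "example-news"),
    ]
    upcoming = expected_between(events, date(2012, 2, 29), date(2012, 3, 1), date(2012, 3, 7))
    for e in upcoming:
        print(e.kind, e.entities, e.occurs_on)
```

Keeping the two dates separate is what lets a query look forward: a keyword search can only sort by when a page was published, not by when the thing it describes is due to happen.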
“A new report finds that Daimler and Honda are the automakers best positioned for the commercialization of fuel cell vehicles (FCVs) beginning in the 2014/2015 timeframe.” That report is from Pike Research.
It’s easy to see how useful this would be to government analysts. For a recent New York Times story, Quentin Hardy sat with Ahlberg for a demo. Among the topics they looked at: relationships among individuals in the Chinese Communist Party who seem to have unusually strong ties, and an Iranian barter company that developed relations in Belarus as sanctions increased.
Hardy wrote that Recorded Future plans to move further into Chinese and Arabic news sources.
And this is just using public information. If the company is backed by the intelligence community, one would assume (or, from a taxpayer’s point of view, hope) that the NSA is using this on the tons of material it gathers from private email, land lines and mobile phones. Expansion of Recorded Future into Chinese and Arabic would suggest an intelligence interest. Ahlberg told the New York Times that one client wants to see Chinese coverage of events that is longer in the Chinese version than in the English publication. That reminds me of Thomas Friedman, who has often said that the West should pay attention to what Arab leaders say in Arabic, which is often quite different from what they say in English. (One can only speculate about what Chinese and Arab intelligence agencies are reporting to their leaders about the contenders for the Republican presidential nomination. Just because it is public doesn’t make it, or them, intelligible.)
To think how an intelligence analyst would approach this sort of work without a broad quantitative gathering of information suggests what a big step this is. Remember Kremlinologists who tried to divine the hierarchy by who was standing next to whom in reviewing parades? Or think of a China watcher trying to peer into the opaque organization of the Communist Party from a perch in the American embassy in Beijing or CIA offices in Langley. What snippets of information would she have, and how would she hope to stitch them together, barring an inside source who could be trusted? The record indicates the U.S. has developed very few of those in the history of the USSR and China.
Longer term, Mr. Ahlberg says, “Why not process the world’s 100 million blogs as well?” These could be graded and ranked for reliability, so interested people would know which experts to read.
The software can find signs of Nokia relationships with Microsoft in the months before it entered a legal agreement to use the Microsoft Windows mobile operating system. (Although, to be fair, some people predicted this when Nokia hired CEO Stephen Elop, who had been a Microsoft executive.) Ahlberg said that the company can look at business relationships like Microsoft and Nokia and surround the key players with products in its reporting.
“The primary applications we are in are intelligence, government, defense and trading but we have early users in competitive intelligence.”
Truve said Recorded Future relies on large numbers, which helps achieve a good prediction even from quite noisy data, and it also assigns a confidence level, which can change over time, to the sources it uses.
Steve Holden, Recorded Future’s community manager, said that in an area like competitive analysis, a user could learn about Zynga’s business relationships over the last two years in a few minutes. Or Recorded Future can map the evolving relationships of Chinese oil companies to see when investments have taken place.
Steve Schohn, federal project engineer at Recorded Future, said it can provide government users with some situational awareness, such as what is going to happen in Baghdad next week. Many of the analysts use the company’s API to plug the data into their own applications (Google Earth maps are one favorite) to get a view of what is happening or likely to happen. That way reports of protests or riots can be turned into a visualization over time. (See the Occupy Wall Street timeline on the company web site for an example.)
Recorded Future also keeps notes on sources. An interesting one was TV of Iran, which apparently found the protests in the U.S. of great interest for its Arab audience.
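How might many noisy sources and a per-source confidence level combine? The company has not published its method, so the following is only a sketch of one common approach, with hypothetical function names, source names and numbers: weight each source's report by its current confidence, then nudge that confidence up or down once the outcome is known.

```python
def weighted_estimate(reports, confidence):
    """Confidence-weighted average of source reports.

    reports:    list of (source, value) pairs, value in [0, 1]
                (1.0 = "the event will happen", 0.0 = "it will not")
    confidence: dict of source -> weight in [0, 1]
    """
    total = sum(confidence[source] for source, _ in reports)
    if total == 0:
        raise ValueError("no source has any confidence weight")
    return sum(confidence[source] * value for source, value in reports) / total


def update_confidence(old, was_correct, rate=0.1):
    """Move a source's confidence a small step toward 1 if it was right, toward 0 if wrong."""
    target = 1.0 if was_correct else 0.0
    return old + rate * (target - old)


if __name__ == "__main__":
    # Invented sources and weights, for illustration only.
    confidence = {"wire": 0.9, "blog": 0.3, "forum": 0.6}
    reports = [("wire", 1.0), ("blog", 0.0), ("forum", 1.0)]
    print("estimate:", round(weighted_estimate(reports, confidence), 3))
    confidence["blog"] = update_confidence(confidence["blog"], was_correct=False)
    print("blog confidence after a miss:", round(confidence["blog"], 3))
```

With enough reports, individual errors tend to average out, which is the "large numbers" point. The update step is what lets a source's weight change over time.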
Ahlberg said about 10 companies are doing interesting work in this area. He keeps track of them with Recorded Future.
