The world has seen an enormous surge in the amount of available data, particularly the human-generated digital exhaust of our Internet and communications habits. But only a small portion of this data is actually useful to security professionals. The U.S. Intelligence Community is hard-pressed to find new tools for navigating this expansive sea of data, cutting through the noise to find valuable insight into national security events and, in turn, policy-making. The Cipher Brief spoke with Kristen Jordan, a Program Manager at the Intelligence Advanced Research Projects Activity (IARPA), about the Mercury program, which uses data analytics to forecast potential security events.
The Cipher Brief: Could you explain the Mercury program at IARPA? How did it come about and what does it seek to accomplish?
Kristen Jordan: The Mercury program seeks to develop methods for continuous, automated analysis of diverse, existing foreign signals intelligence (SIGINT) data to anticipate and/or detect events such as terrorist activity, civil unrest, and disease outbreaks abroad.
The concept was developed to evaluate the efficacy of SIGINT data in forecasting group- and societal-level events. In addition, the Mercury program is researching the utility of SIGINT data in a completely different way from traditional mission approaches. It was a natural follow-on to the Open Source Indicators (OSI) program, which forecast events using several thousand sources of publicly available data. The Intelligence Community uses quite a bit of data to keep an eye on the global landscape, and we are investigating novel approaches to analyzing the data available to the Community.
TCB: How can computer algorithms turn large amounts of data into reliable forecasts of future security events?
KJ: The focus in Mercury is to process and analyze streaming data, developing extraction techniques that favor volume over depth and identify shallow features of the data that correlate with events. Most importantly, we expect innovation in the models the researchers will use to generate probabilistic forecasts of future events from this extremely noisy and heterogeneous data. Doing any of this successfully takes a significant amount of ground-truth event data to train the models, so that the algorithms are built upon real indicators of real events. That gives us confidence that the models can reliably generate forecasts with a given precision.
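The idea of training a model on ground-truth event labels so that it emits probabilities rather than yes/no calls can be illustrated with a toy example. The sketch below is purely illustrative and has nothing to do with Mercury's actual methods or data: it fits a simple logistic-regression classifier (pure Python, no external libraries) to synthetic "shallow feature" vectors labeled with made-up event outcomes, then produces a probability for a new observation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, lr=0.1, epochs=2000):
    """Fit logistic-regression weights by stochastic gradient descent.

    features: list of feature vectors (shallow, noisy indicators)
    labels:   ground-truth outcomes (1 = event occurred, 0 = no event)
    """
    n = len(features[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def forecast_probability(w, b, x):
    """Probability that an event occurs, given feature vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Entirely synthetic training set: two invented shallow features
# (e.g. communication-volume change, geolocation-pattern change),
# paired with made-up ground-truth event labels.
train_x = [[1.0, 0.9], [0.8, 1.1], [0.1, 0.2], [0.0, 0.1], [0.9, 1.0], [0.2, 0.0]]
train_y = [1, 1, 0, 0, 1, 0]

w, b = train_logistic(train_x, train_y)
p = forecast_probability(w, b, [0.85, 0.95])
print(f"forecast probability: {p:.2f}")
```

The point of the toy is the shape of the output: a calibrated-looking probability that can be compared against a precision target, rather than a bare alert.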
TCB: What are some examples of indicators that could potentially lead to future security events?
KJ: An individual indicator may not be obvious, but several data features aggregated in time and space may reveal a population's reaction to an event for which a warning could be issued. For example, communication geolocation patterns in a given population becoming static over time, combined with increased activity at hospitals, could indicate a disease outbreak.
Ultimately, we are looking for combinations of indirect signals that when taken together can provide the basis for a forecast, with an assigned probability and lead time.
TCB: What does the introduction of predictive analytics mean for intelligence analysts?
KJ: We should consider predictive analytics as another tool for intelligence analysts. Forecasts or warnings of events can be used to cue or focus other modes of information gathering on the ground so the analysts can take a deeper look at the forecasted situation in a given geographical area. Here, significant lead time in a forecast benefits the analyst. These tools will not and should not replace human analysts who provide contextual qualitative evaluation of a forecasted event based on their deep knowledge of the country and its population.
TCB: One of the major concerns over predictive analytics is transparency in how these tools actually come to their conclusions. What kind of technical safeguards or oversight is being considered to make sure these predictive tools do not misguide decision-making?
KJ: IARPA does not want to develop and hand over a “black box” to our community partners. A significant component of the Mercury program is what we call an audit trail. For every forecast that is generated, we can go back and see what features the system recognized as indicators for that warning and understand exactly what type of data or compilation of records it was derived from. Further, the algorithms and code need to be understood and shared with the end user so that the models can maintain accuracy and be adjusted as data streams or communication technologies evolve.
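The audit-trail idea can be sketched in a few lines: each forecast carries the features that fired and the records each feature was derived from, so an analyst can trace a warning back to its inputs. This is a hypothetical data structure of my own devising, not Mercury's actual design; every name and record ID below is invented.

```python
def make_forecast(event_type, probability, fired_features):
    """Build a forecast together with its audit trail.

    fired_features: list of (feature_name, source_record_ids) pairs
    describing which features triggered the warning and which data
    records each feature was derived from.
    """
    return {
        "event_type": event_type,
        "probability": probability,
        "audit_trail": [
            {"feature": name, "derived_from": sorted(record_ids)}
            for name, record_ids in fired_features
        ],
    }

def explain(forecast):
    """Render the audit trail so a reviewer can see why a warning fired."""
    lines = [f"{forecast['event_type']} (p={forecast['probability']})"]
    for entry in forecast["audit_trail"]:
        lines.append(f"  {entry['feature']} <- records {entry['derived_from']}")
    return "\n".join(lines)

fc = make_forecast(
    "disease_outbreak", 0.72,
    [("hospital_activity_spike", {"rec-104", "rec-117"}),
     ("geolocation_pattern_static", {"rec-090"})],
)
print(explain(fc))
```

Keeping provenance attached to each forecast is what makes the "no black box" promise auditable: the same structure that generated the warning can answer the question of where it came from.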
TCB: What kinds of questions do predictive analytics hope to answer? Would they ideally go beyond the what, when, and where to identify who exactly could be a potential terrorist?
KJ: Predictive analytics will help the community stay ahead of threats, and we hope to identify global hot spots, regardless of event type, in their nascent stages. Automated analytics can alert us to these hot spots so we can focus other community assets with significant lead time. These tools have the potential to alert us to a group event, even identifying the tipping point of a group's radicalization over time. However, anticipating the activity of a lone-wolf actor is a much harder problem, one that still requires a large amount of training data for optimizing models.