INTRODUCTION
Automation, innovation, reaction and expansion (AIRE) are the foundation of the next generation of analysis techniques and tools – Network Analysis 2.0. The importance of data network analysis is often overlooked, yet it touches many areas, including cyber defense, cyber intelligence, law enforcement and investigative analysis, and financial and critical infrastructure. Cyber attacks are conducted daily by organized groups around the world, and network analysis is essential for maintaining total cyber situational awareness. AIRE enables analysts to make huge strides in data analysis, maintain a competitive advantage, and stay one step ahead of attackers.
THE EXAFLOOD
Internet traffic is poised to double every year for the foreseeable future. [1] With the Internet already moving an exabyte of data every hour, that kind of growth means either new network analysis methods have to be employed or we're going to need a lot more analysts to manage it all.
To put an exabyte into perspective, one exabyte is the equivalent of 50,000 years of DVD-quality video. [2] The print collection of the U.S. Library of Congress is 10 terabytes, and all of the hard disk capacity produced in 1995 totaled 20 petabytes. For scale: 1,000 gigabytes = 1 terabyte, 1,000 terabytes = 1 petabyte, and 1,000 petabytes = 1 exabyte.
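As a rough back-of-the-envelope check of what "an exabyte every hour" implies, the short sketch below converts that rate into sustained throughput, using the decimal units above. The one-exabyte-per-hour figure is taken from the text; Python is simply an illustrative choice here.

    # Back-of-the-envelope conversion using decimal units (1 EB = 10**18 bytes).
    # The "one exabyte per hour" rate comes from the article; the math only
    # shows what that implies as a sustained data rate.

    EXABYTE_BYTES = 10**18
    SECONDS_PER_HOUR = 3600

    bytes_per_second = EXABYTE_BYTES / SECONDS_PER_HOUR
    bits_per_second = bytes_per_second * 8

    print(f"{bytes_per_second:.2e} bytes per second")                 # ~2.78e+14
    print(f"{bits_per_second / 10**15:.1f} petabits per second")      # ~2.2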
Most discussions of Internet traffic focus on whether the infrastructure can handle the increased volume and if and when it might collapse. This huge increase in Internet traffic has been described as an Exaflood. [2] Network analysis will have to evolve to keep up with this traffic explosion: doubling Internet traffic every year means that, in a short time frame, what we see today will look minuscule tomorrow. Without the means to keep up with the data volume, useful information will continue to be lost.
THE NETWORK ANALYSIS REVOLUTION
Rarely is there discussion of how analysts are going to investigate these huge volumes of data for defensive and offensive operations. If the infrastructure is barely able to handle the data, how can anyone be expected to manually process it in a timely manner? A revolution in data analysis is coming, and it will be driven by four factors: Automation, Innovation, Reaction and Expansion (AIRE).
The motivators that are moving analysis to the next level are:
• Automation: In order to keep up with the volume and velocity of network data, analysis processes will have to be automated. Human involvement will move to high-level tasks, process refinement, developing new techniques and reporting. Machine to machine communication will be required for effective analysis.
• Innovation: Cutting edge analysis isn’t achieved with old methods and limited tools. Innovative ideas will be required to compete with emerging threats. Reaching the next level will require multiple, small analysis teams working in parallel with rapid cycles of development. The path to innovation is paved with failure and these overlapping teams will increase the chances of success. To gain an advantage over attackers, defensive analysis must match offensive development.
• Reaction: Even today, real-time analysis is difficult. Because every event must be processed before an alert can be raised, "instant" analysis is really post-event analysis. As the amount of data being processed grows, the gap between gathering the information and producing analysis results widens. New methods and systems will close this gap, resulting in the ability to react faster while processing more events.
• Expansion: Attackers are adept at expanding into new areas – today it’s hosts, tomorrow it’s network infrastructure. Analysis must be equally agile, and to reach the next level it has to expand into areas that were previously thought to be unrelated. Blazing trails will be the standard, and together with innovative thinking new forms of analysis will emerge.
These factors signal a rise to the next level and define the challenges in making the leap.
AUTOMATION – THE DATA CONUNDRUM
The avalanche of data available to analysts is both a blessing and a curse. With more data, more events can be correlated, which helps uncover covert activity. But manually processing all of this data is nearly impossible for a human.
As an analyst, I can maintain a decent number of relationships in my head, but as the relationships grow in number and granularity, I'm forced to document the linkages. Documentation is fine for my own work, but it's becoming more and more necessary to work in teams that are often geographically dispersed. A static spreadsheet with all my information becomes a hindrance to large-scale collaboration.
The next level of analysis will hinge on machine-to-machine communication for alerts, event correlation, and data source integration. Most data collection tools are stovepipes, with limited ability to import data from other sources or export data in usable formats. For successful automation, analysis systems will need to consume and distribute a variety of data formats. Designing for the unknown will be a key factor in cutting-edge data analysis tools.
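As a rough illustration of what "consume and distribute various data formats" might look like, the sketch below normalizes records from two hypothetical sources (an IDS emitting JSON-style records and a flow collector emitting CSV rows) into one internal structure, then exports that structure in either format. The field names and source labels are assumptions made for the example, not an existing standard.

    # A minimal sketch of the consume/distribute idea: source-specific records
    # are mapped into one common event structure, then exported in whatever
    # format a downstream system expects. All names here are illustrative.

    import csv
    import io
    import json

    def normalize(record, source):
        """Map a source-specific record into a common event structure."""
        if source == "ids_json":      # hypothetical IDS that emits JSON objects
            return {"ts": record["timestamp"], "src": record["src_ip"],
                    "dst": record["dest_ip"], "event": record["signature"]}
        if source == "flow_csv":      # hypothetical flow collector emitting CSV rows
            ts, src, dst, event = record
            return {"ts": ts, "src": src, "dst": dst, "event": event}
        raise ValueError(f"unknown source: {source}")

    def export_json(events):
        return json.dumps(events, indent=2)

    def export_csv(events):
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=["ts", "src", "dst", "event"])
        writer.writeheader()
        writer.writerows(events)
        return buf.getvalue()

    events = [
        normalize({"timestamp": "2009-01-01T00:00:00", "src_ip": "10.0.0.1",
                   "dest_ip": "10.0.0.2", "signature": "port scan"}, "ids_json"),
        normalize(["2009-01-01T00:00:05", "10.0.0.3", "10.0.0.2", "beacon"],
                  "flow_csv"),
    ]
    print(export_json(events))
    print(export_csv(events))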
Great strides are being made in traffic collection, but performing packet capture on 10-gigabit links creates a huge data storage problem. Storage is not the only issue: humans can't realistically process that amount of data, and speed and accuracy are only possible through automation. Analysis 2.0 will rely heavily on automation to make analysts more efficient and effective. The end goal is to extract as much information as possible from the available data while simultaneously keeping networks secure. Automated metadata exploitation is the only way to efficiently utilize the vast amounts of digital data. Sifting through large datasets is tough. Updating those datasets is even more difficult. Correlating those datasets by hand is impossible. Process automation is the engine powering data analysis to the next level.
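One minimal illustration of automated metadata exploitation is collapsing raw packets into per-flow summaries, so that only the metadata, not the full capture, has to be stored and correlated. The sketch below assumes packets have already been parsed into simple dictionaries by some capture front end; the field names are illustrative assumptions.

    # A sketch of the metadata idea: instead of storing every packet from a
    # high-speed link, keep only per-flow summaries (the 5-tuple plus counters).
    # The packet records are stand-ins for whatever a real capture tool produces.

    from collections import defaultdict

    def aggregate_flows(packets):
        """Collapse individual packets into per-flow packet/byte counts."""
        flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
        for pkt in packets:
            key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
            flows[key]["packets"] += 1
            flows[key]["bytes"] += pkt["length"]
        return flows

    packets = [
        {"src": "10.0.0.1", "dst": "192.0.2.10", "sport": 53211, "dport": 80,
         "proto": "tcp", "length": 1500},
        {"src": "10.0.0.1", "dst": "192.0.2.10", "sport": 53211, "dport": 80,
         "proto": "tcp", "length": 40},
    ]

    for key, stats in aggregate_flows(packets).items():
        print(key, stats)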
INNOVATION – THE DRIVING FORCE
The need for new ideas in the field of network analysis is driving the push towards the next level. To generate new ideas, analysis shops will have to take a new approach. Multiple independent teams working on the same problems will produce more failures, but they will also produce more successes. A system that allows these teams to share their missteps will actually result in more innovation in shorter time periods. The ability of these small teams to share their failures and benefit from each other's successes will show in how quickly they can solve problems.
Classic analysis tools will be forced to adapt to these smaller teams working in rapid cycles in order to survive. Change is coming not only for analysis teams but for the tools they rely on for storing, sharing, researching and reporting. These tools will combine many capabilities and be built in a way that allows new ideas to be quickly mocked up, tested and ultimately moved into the analysis process.
Imagine that we are currently blind to a new form of analysis that could turn the tide against network attacks. This new analysis is hidden because it involves data relationships that have yet to be made. Being able to make these concealed connections will have a tremendous impact on analysis production. Systems that enable the easy addition of new data sources will encourage the discovery of new forms of analysis.
The cornerstone of successful analysis is being able to rapidly analyze a problem. The ability to discover new techniques and incorporate them back into the system is going to be essential for success. A system that does this creates an environment that continuously adapts to the problems it encounters. Bad guys are in constant development cycles to stay ahead of the people trying to stop them, so the systems analysis teams use must support equally rapid analysis. The ability to improve network analysis in short cycles allows an entity to dominate its adversaries. The benefit of a system that can quickly incorporate new techniques is that everyone using the system can immediately take advantage of the innovations. This ability to share technological leaps allows teams to work overlapping problems and creates a competitive environment that is constantly building new ideas. Analysis 2.0 will be led by people, but the foundation will be built with extensible systems that promote innovation.
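A minimal sketch of what such an extensible system might look like: new analysis techniques are registered as plugins, and everyone using the system can run them as soon as they are added. The registry and the example "top talkers" technique are assumptions for illustration, not a reference to any particular product.

    # A sketch of the extensibility idea: techniques register themselves in a
    # shared registry, and run_all() picks up new plugins automatically.

    ANALYSIS_PLUGINS = {}

    def register(name):
        """Decorator that adds an analysis function to the shared registry."""
        def wrapper(func):
            ANALYSIS_PLUGINS[name] = func
            return func
        return wrapper

    @register("top_talkers")
    def top_talkers(events, n=3):
        """Rank source addresses by how many events they generated."""
        counts = {}
        for event in events:
            counts[event["src"]] = counts.get(event["src"], 0) + 1
        return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]

    def run_all(events):
        """Run every registered technique; new plugins need no other changes."""
        return {name: func(events) for name, func in ANALYSIS_PLUGINS.items()}

    events = [{"src": "10.0.0.1"}, {"src": "10.0.0.1"}, {"src": "10.0.0.9"}]
    print(run_all(events))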
REACTION – REAL-TIME ANALYSIS
Real-time is relative. An event has to be processed before a related alert can be sent. Real-time can be interpreted as the time it takes to process an event and send the alert, which varies based on what kind of analysis is being done and who or what is performing it. A human doing real-time analysis is going to be significantly slower than a system designed to correlate events and send notifications, yet both have been called "real-time analysis." The next step for analysis is machine-to-machine communication replacing human analysts when comparing multiple data sets. Faster reaction to events means that counter-measures can be deployed faster and malicious activity is detected sooner. Being able to react faster also means that the amount of data that can be analyzed is significantly larger, and faster response times will result in the detection of more malicious or abnormal activity. Next-generation tools will help define Analysis 2.0 by being able to both consume and produce data sets that improve reaction time, resulting in a giant leap forward for analysis.
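To make the reaction gap concrete, the sketch below tags each event with the time it was observed and records how long an automated correlation rule took to raise an alert. The threshold and field names are assumptions made for the example, not a prescribed detection rule.

    # A sketch of measuring the gap between observing an event and alerting on it.
    # The threshold ("five events from one source") is an arbitrary example rule.

    import time

    ALERT_THRESHOLD = 5  # assumed number of events from one source before alerting

    def process(events):
        counts = {}
        alerts = []
        for event in events:
            counts[event["src"]] = counts.get(event["src"], 0) + 1
            if counts[event["src"]] == ALERT_THRESHOLD:
                lag = time.time() - event["observed_at"]
                alerts.append({"src": event["src"],
                               "reason": "event threshold reached",
                               "detection_lag_seconds": round(lag, 3)})
        return alerts

    now = time.time()
    events = [{"src": "10.0.0.1", "observed_at": now - 2} for _ in range(5)]
    print(process(events))   # one alert, with roughly a 2-second detection lag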
EXPANSION – DEFINING DATA RELATIONSHIPS
Like innovation, expansion involves new ideas. The difference is that expansion involves combining previously unrelated data and forging new ground in areas that may not be directly related to network analysis. Attackers are using expansion to move from end-point takeover for botnets to controlling network infrastructure. The end goal is still the same, but by controlling network devices they are able to manipulate more hosts and make their operations more covert. This movement to network infrastructure signals their application of the AIRE motivators. Compromising network equipment is certainly more difficult, but the possibilities for abuse are more rewarding. The only way to combat these methods is by adopting similar tactics and exploiting all available metadata.
Another example of expansion is the fusion of data network analysis with social networks. Two or more people communicating form a social network. When the communication method crosses an IP network, the link between the people and the devices they are using can be made. Network analysis can help determine all the devices that the communication passes through between these people. Social networks and data network analysis have an obvious relationship – it’s the hidden relationships that define expansion.
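A simple way to picture this fusion is a graph that links people to the devices they use and devices to the network nodes their traffic traverses; walking the graph then exposes the chain connecting two people. The names and topology below are made up for illustration.

    # A sketch of fusing social and data network relationships: person-device
    # links sit in the same graph as device-network links, so a traversal can
    # recover the chain between two people. Everything here is a toy example.

    from collections import defaultdict, deque

    edges = [
        ("alice", "laptop-a"), ("bob", "phone-b"),          # person <-> device
        ("laptop-a", "router-1"), ("router-1", "isp-core"),  # device <-> network
        ("isp-core", "router-2"), ("router-2", "phone-b"),
    ]

    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    def path(start, goal):
        """Breadth-first search for the chain of devices/hops linking two people."""
        queue = deque([[start]])
        seen = {start}
        while queue:
            current = queue.popleft()
            if current[-1] == goal:
                return current
            for nxt in graph[current[-1]]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(current + [nxt])
        return None

    print(path("alice", "bob"))
    # ['alice', 'laptop-a', 'router-1', 'isp-core', 'router-2', 'phone-b', 'bob']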
CONCLUSION
The network analysis landscape is barren. Threats are becoming more complex and covert. The number and variety of attacks are increasing, making it difficult to sort through the noise. Old analysis tools just don't cut it – they have limited focus and make exporting data impossible without heavy application modification. These old tools will disappear as the new generation of Web-based analysis systems proves its worth. Key features of cutting-edge tools will include:
• the ability to scale easily and quickly
• collaboration tools or hooks
• input and output APIs
• flexible authentication, accounting and authorization
• browser-based interfaces with the ability to develop new clients
• retrospective and predictive analysis techniques
• dynamic reporting and customization
Analysis 2.0 will allow analysts to focus on producing results and staying ahead of emerging threats.
Sources:
[1] Cisco’s Global IP Traffic Forecast and Methodology, 2006-2011
http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/net_implementation_white_paper0900aecd806a81aa.pdf
[2] Wikipedia, "Exaflood." http://en.wikipedia.org/wiki/Exaflood#Exaflood