Protecting Critical Systems in Unbounded Networks

Robert J. Ellison, David A. Fisher, Richard C. Linger, Howard F. Lipson, Thomas A. Longstaff, Nancy R. Mead

Society is growing increasingly dependent on large-scale, highly distributed systems that operate in unbounded network environments. Unbounded networks, such as the Internet, have no central administrative control and no unified security policy. Furthermore, the number and nature of the nodes connected to such networks cannot be fully known. Despite the best efforts of security practitioners, no amount of hardening can assure that a system that is connected to an unbounded network will be invulnerable to attack. The discipline of survivability can help ensure that such systems can deliver essential services and maintain essential properties such as integrity, confidentiality, and performance, despite the presence of intrusions.

The New Network Paradigm: Organizational Integration

From their modest beginnings some 20 years ago, computer networks have become a critical element of modern society. These networks not only have global reach; they also affect virtually every aspect of human endeavor. Networked systems are principal enabling agents in business, industry, government, and defense. Major economic sectors, including defense, energy, transportation, telecommunications, manufacturing, financial services, health care, and education, all depend on a vast array of networks operating on local, national, and global scales. This pervasive societal dependence on networks magnifies the consequences of intrusions, accidents, and failures, and amplifies the critical importance of ensuring network survivability.

A new network paradigm is emerging. Networks are being used to achieve radical new levels of organizational integration. This integration obliterates traditional organizational boundaries and integrates local operations into components of comprehensive, network-based business processes.
For example, commercial organizations are integrating operations with business units, suppliers, and customers through large-scale networks that enhance communication and services. These networks combine previously fragmented operations into coherent processes open to many organizational participants. This new paradigm represents a shift from bounded networks with central control to unbounded networks. Unbounded networks are characterized by distributed administrative control without central authority, limited visibility beyond the boundaries of local administration, and a lack of complete information about the entire network. At the same time, organizations' dependence on networks is increasing, and the risks and consequences of intrusions and compromises are amplified. The Internet is an example of an unbounded environment with many client-server network applications. A public Web server and its clients may exist within many different administrative domains on the Internet. Many business-to-business Web-based e-commerce applications depend on conventions within a specific industry segment for interoperability. Within the Internet, there is little distinction between insiders and outsiders. Everyone who chooses to connect to the Internet is an insider, whether or not they are known to a particular subsystem. This characteristic is the result of the desire, and modern necessity, for connectivity. A company cannot survive in a highly competitive industry without easy and rapid access to its customers, suppliers, and partners. More and more, a company's partners on one project are its competitors on the next, so trust definition and maintenance becomes an extremely complex concept. Trust relationships are continually changing, and in traditional terms may be highly ambiguous. Trust is especially difficult to establish in the presence of unknown users from unknown sources outside one's own administrative control. 
Legitimate users and attackers are peers in the environment and there is no method to isolate legitimate users from the attackers. In other words, there is no way to bound the environment to legitimate users using only a common administrative policy.

Expanding the Traditional View of Security

The natural escalation of offensive threats versus defensive countermeasures has demonstrated time and again that no practical systems can be built that are invulnerable to attack. Despite the industry's best efforts, there can be no assurance that systems will not be breached. Thus, the traditional view of information-systems security must be expanded to encompass the specification and design of survivability behavior that helps systems survive in spite of attacks. Only then can systems be created that are robust in the presence of attack and able to survive attacks that cannot be completely repelled. In short, the nature of contemporary system development dictates that even hardened systems can and will be broken. Survivability solutions should be incorporated into both new and existing systems to help them avoid the potentially devastating effects of compromise and failure as a result of attack.

The Definition of Survivability

We define survivability as the capability of a system to fulfill its mission, in a timely manner, in the presence of attacks, failures, or accidents. The term system is used in the broadest possible sense, to include networks and large-scale systems of systems. In particular, the focus of survivability is on unbounded networked systems where traditional security precautions are inadequate. The term mission refers to a set of very high-level requirements or goals. Missions are not limited to military settings; any successful organization or project must have a vision of its objectives, whether they are expressed implicitly or as a formal mission statement.
Judgments as to whether or not a mission has been fulfilled are typically made in the context of external conditions that may affect the achievement of that mission's goals. For example, assume that a financial system shuts down for 12 hours during a period of widespread power outages caused by a hurricane. If the system preserves the integrity and confidentiality of its data and resumes its essential services after the period of environmental stress is over, the system can reasonably be judged to have fulfilled its mission. However, if the same system shuts down unexpectedly for 12 hours under normal conditions (or under relatively minor environmental stress) and deprives its users of essential financial services, the system can reasonably be judged to have failed its mission, even if data integrity and confidentiality are preserved.

It is important to recognize that it is the mission fulfillment that must survive, not any particular subsystem or system component. Central to the notion of survivability is the capability of a system to fulfill its mission, even if significant portions of the system are damaged or destroyed. Survivable system is often used as a shorthand term for a system with the capability to fulfill a specified mission in the face of attacks, failures, or accidents. Again, it is the mission, not a particular portion of a system, that must survive.

Characteristics of Survivable Systems

As noted, essential services are defined as the functions of a system that must be maintained when the environment is hostile, or when failures or accidents occur that threaten the system. Central to the delivery of essential services is the capability of a system to maintain essential properties (i.e., specified levels of integrity, confidentiality, performance, and other quality attributes). Thus, it is important to define minimum levels of quality attributes that must be associated with essential services.
For example, a launch of a missile by a defensive system cannot be effective if the system's performance is slowed to the point that the target is out of range before the system can launch. The capability to deliver essential services (and maintain the associated essential properties) must be sustained even if a significant portion of the system is incapacitated. Furthermore, this capability should not be dependent on the survival of a specific information resource, computation, or communication link. In a military setting, essential services might be those required to maintain an overwhelming technical superiority, and essential properties may include integrity, confidentiality, and a level of performance sufficient to deliver results in less than one decision cycle of the enemy. In the public sector, a survivable financial system is one that maintains the integrity, confidentiality, and availability of essential information and financial services, even if particular nodes or communication links are incapacitated because of an intrusion or accident, and that recovers compromised information and services in a timely manner. The financial system's survivability might be judged by using a composite measure of the disruption of stock trades or bank transactions (i.e., a measure of the disruption of essential services). Key to the concept of survivability, then, is the identification of essential services, and the essential properties that support them, within an operational system. There are typically many services that can be temporarily suspended while a system deals with an attack or other extraordinary environmental condition. Such a suspension can help isolate areas that have been affected by an intrusion and can free system resources to deal with the intrusion's effects. The overall function of a system should adapt to preserve essential services. 
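
The idea of suspending nonessential services so that essential ones keep running can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Service` record, the abstract `cost` units, the `adapt_to_stress` function, and the policy of suspending the most expensive nonessential service first are hypothetical and are not taken from any CERT/CC method or tool.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical record for one service offered by a system."""
    name: str
    essential: bool
    cost: int            # abstract resource units, assumed for illustration
    running: bool = True


def adapt_to_stress(services, capacity):
    """Suspend nonessential services until the running load fits capacity.

    Essential services are never suspended. Returns True if the remaining
    load fits within capacity, False if even the essential services alone
    exceed it.
    """
    load = sum(s.cost for s in services if s.running)
    # Assumed policy: suspend the most expensive nonessential service first.
    candidates = sorted(
        (s for s in services if s.running and not s.essential),
        key=lambda s: s.cost,
        reverse=True,
    )
    for service in candidates:
        if load <= capacity:
            break
        service.running = False
        load -= service.cost
    return load <= capacity


if __name__ == "__main__":
    # Hypothetical financial system under reduced capacity.
    system = [
        Service("trade_settlement", essential=True, cost=40),
        Service("account_queries", essential=True, cost=30),
        Service("marketing_reports", essential=False, cost=25),
        Service("usage_analytics", essential=False, cost=20),
    ]
    fits = adapt_to_stress(system, capacity=80)
    print(fits, [s.name for s in system if s.running])
```

A False return value signals that suspension alone is not enough, which is the point at which the alternate or replacement services discussed in the text would have to be considered.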
The capability of a survivable system to fulfill its mission in a timely manner is thus linked to its ability to deliver essential services in the presence of an attack, accident, or failure. Ultimately, mission fulfillment must survive, not any portion or component of the system. If an essential service is lost, it could in some cases be replaced by another service that supports mission fulfillment in a different but equivalent way. However, we still believe that the identification and protection of essential services is an important part of a practical approach to building and analyzing survivable systems. As a result, we define essential services to include alternate sets of essential services (perhaps mutually exclusive) that need not be simultaneously available. For example, a set of essential services to support power delivery may include both the distribution of electricity and the operation of a natural gas pipeline.

Developing Survivability Solutions

Survivability solutions are best understood as risk-management strategies that first depend on an intimate knowledge of the mission being protected. The mission focus expands survivability solutions beyond purely independent ("one size fits all") technical solutions, even if those technical solutions are broad-based and extend beyond traditional computer security to include fault tolerance, reliability, usability, and so forth. Risk-mitigation strategies first and foremost must be created in the context of a mission's requirements (prioritized sets of normal and stress requirements), and must be based on "what-if" analyses of survival scenarios. Only then can we look toward generic software engineering solutions based on computer security, software quality attribute analyses, or other strictly technical approaches to support the risk-mitigation strategies.
Hence, survivability depends not only on the selective use of traditional computer-security solutions, but also on the development of effective risk-mitigation strategies that are based on scenario-driven "what-if" analyses and contingency planning. "Survival scenarios" positing a wide range of cyber-attacks, accidents, and failures aid in the analyses and contingency planning. However, to reduce the combinatorics inherent in creating representative sets of survival scenarios, these scenarios focus on adverse effects rather than causes. Effects are also of more immediate situational importance than causes, because an organization will likely have to deal with (and survive) an adverse effect long before a determination is made as to whether the cause was an attack, an accident, or a failure. Awaiting the outcome of a detailed post-mortem to determine the cause, before acting to mitigate the effect, is out of the question when an organization is dealing with the survival of most modern, mission-critical applications.

Developments at the CERT® Coordination Center

The CERT® Coordination Center (CERT/CC) is developing a survivable network analysis (SNA) method to evaluate the survivability of systems in the context of attack scenarios. Also under development is a survivable systems simulator that will support analysis, testing, and evaluation of survivability solutions in unbounded networks. The SNA method permits assessment of survivability strategies at the architecture level. Steps in the SNA method include

- system mission and architecture definition
- identification of essential services and corresponding essential architecture components
- generation of intrusion scenarios and corresponding compromisable architecture components
- survivability analysis of architectural softspots that are both essential and compromisable

Intrusion scenarios play a key role in the method.
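
The last step, finding softspots that are both essential and compromisable, amounts to a set intersection, which the following minimal sketch illustrates. It is not the SNA method itself: the function name `find_softspots`, the dictionary layout, and all service, scenario, and component names are hypothetical and chosen only for illustration.

```python
def find_softspots(essential_services, intrusion_scenarios):
    """Return components that are both essential and compromisable.

    essential_services:  maps a service name to the set of architecture
                         components that service depends on.
    intrusion_scenarios: maps a scenario name to the set of components
                         that scenario could compromise.
    The result maps each softspot component to the services that need it
    and the scenarios that threaten it.
    """
    essential = {}       # component -> services that depend on it
    for service, components in essential_services.items():
        for component in components:
            essential.setdefault(component, set()).add(service)

    compromisable = {}   # component -> scenarios that can compromise it
    for scenario, components in intrusion_scenarios.items():
        for component in components:
            compromisable.setdefault(component, set()).add(scenario)

    # Softspots are the intersection of the two component sets.
    return {
        component: {
            "services": essential[component],
            "scenarios": compromisable[component],
        }
        for component in essential.keys() & compromisable.keys()
    }


if __name__ == "__main__":
    # Hypothetical architecture data, assumed for illustration only.
    services = {
        "view_patient_record": {"web_server", "records_db"},
        "update_treatment_plan": {"app_server", "records_db"},
    }
    scenarios = {
        "stolen_credentials": {"web_server"},
        "database_corruption": {"records_db"},
        "defaced_portal": {"public_portal"},
    }
    for component, info in sorted(find_softspots(services, scenarios).items()):
        print(component, sorted(info["services"]), sorted(info["scenarios"]))
```

Keeping the services and scenarios attached to each softspot is a design choice of this sketch; it makes it easier to see which mission functions are at stake and which scenarios a mitigation would need to address.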
SNA results are summarized in a survivability map that links recommended survivability strategies for resistance, recognition, and recovery to the system architecture and requirements. Results of applying the SNA method to a subsystem of a large-scale, distributed healthcare system have been summarized. Future studies will involve the application of the SNA method to proposed and existing distributed systems for government, defense, and commercial organizations.

The survivable systems simulator is based on a new methodology called "emergent algorithms." Emergent algorithms produce global effects through cooperative local actions distributed throughout a system. These global effects (which "emerge" from local actions) can support system survivability by allowing a system to fulfill its mission, even though the individual nodes of the system are not survivable. Emergent algorithms can provide solutions to survivability problems that cannot be achieved by conventional means. The survivable systems simulator will allow stakeholders to visualize the effects of specific cyber-attacks, accidents, and failures on a given system or infrastructure. The goal is to enable "what-if" analyses and contingency planning based on simulated walkthroughs of survivability scenarios.

For Additional Information

This column is based on the following publications, which contain additional information about this topic.

R. J. Ellison, D. A. Fisher, R. C. Linger, H. F. Lipson, T. A. Longstaff, N. R. Mead, "Survivability: Protecting Your Critical Systems," IEEE Internet Computing, November/December 1999.

H. F. Lipson and D. A. Fisher, "Survivability--A New Technical and Business Perspective on Security," Proceedings of the 1999 New Security Paradigms Workshop, September 21-24, Association for Computing Machinery, 1999.

About the Authors

Robert J. Ellison is a member of the technical staff in the Networked Systems Survivability Program at the SEI.
He is currently involved in the study of survivable systems architectures. He has previously led SEI efforts in software development environments and CASE tools. He has a PhD in mathematics from Purdue University.

David A. Fisher is currently leading a research effort in new approaches for survivability and security in information-based infrastructures at the SEI's CERT Coordination Center. From 1973-75, Fisher served as program manager in the Advanced Technology Program (ATP) at the National Institute of Standards and Technology (NIST). He earned a PhD in computer science at Carnegie Mellon University, an MSE from Moore School of Electrical Engineering at the University of Pennsylvania, and a BS in mathematics from Carnegie Institute of Technology (now Carnegie Mellon).

Richard C. Linger is a senior member of the technical staff in the Networked Systems Survivability Program at the SEI, where he is developing methods for analysis and design of survivability for large-scale infrastructure systems. Before joining the SEI, he was a senior technical staff member at IBM, where he co-developed cleanroom software engineering technology for development of ultra-reliable software systems. He is an adjunct professor at the Carnegie Mellon Heinz School of Public Policy and Management and the Carnegie Mellon School of Computer Science.

Howard F. Lipson has been a computer security researcher at the SEI's CERT Coordination Center for more than seven years. He has played a major role in extending security research at the SEI into the new realm of survivability. Earlier, Lipson was a computer scientist at AT&T Bell Labs, where he did exploratory development work on programming environments, executive information systems, and integrated network management tools. He holds a PhD in computer science from Columbia University.

Thomas A. Longstaff is currently leading research in network security at the SEI.
As a member of the CERT Coordination Center, he has been investigating topics related to information survivability and critical national infrastructure protection. Before coming to the SEI, he was the technical director at the Computer Incident Advisory Capability (CIAC) at Lawrence Livermore National Laboratory in Livermore, California. He completed a PhD at the University of California, Davis in software environments. He received a BA in physics and mathematics from Boston University, and an MS in computer science from the University of California, Davis.

Nancy R. Mead is currently leading a research effort in Survivable Network Architectures at the SEI, and is an adjunct professor in the Master of Software Engineering program, Carnegie Mellon University. She is involved in the study of survivable systems requirements and architectures and the development of professional infrastructure for software engineers. Before joining the SEI, Mead was a senior technical staff member at IBM Federal Systems, where she spent most of her career in development and management of large real-time systems. She has a BA and MS in mathematics from New York University and a PhD in mathematics from Polytechnic Institute of New York.