More Situational Awareness for Industrial Control Systems (MOSAICS): Engineering and Development of a Critical Infrastructure Cyber Defense Capability for Highly Context-Sensitive Dynamic Classes: Part 2 – Development

More Situational Awareness for Industrial Control Systems (MOSAICS): Engineering and Development of a Critical Infrastructure Cyber Defense Capability for Highly Context-Sensitive Dynamic Classes: Part 1 – Engineering

Posted: June 15, 2020 | By: Aleksandra Scalco, Steven Simske, Ph.D.

There is a Department of Defense (DOD) operational need for cyber defense capabilities to defend critical infrastructure from cyber attack. Critical infrastructure systems, such as power, water and wastewater, and safety controls, affect the physical environment.

“MOSAICS,” or More Situational Awareness for Industrial Control Systems, is a DOD response to an operational need to defend mission-critical infrastructure. The MOSAICS capability concept was to automate selected procedures to detect, mitigate, and recover from a cyber attack, and to combine that automation with best-of-breed technologies for analytics, visualization, decision support, and information sharing [1].

Systems engineering principles were applied during the concept development phase to convert operational needs into an engineering-oriented view in several modes leading into the engineering phase. Implementing Model-based Systems Engineering (MBSE) is valuable during this phase. The Navy is moving from document-based systems engineering to a standard, enterprise-wide architecture model. The objective is to support the Fleet with warfighting capabilities more effectively. A significant benefit is the ability to manage requirements and system baselines that remain in place for many years before replacement systems are deployed. An MBSE approach enables engineers to integrate upgrades and to better integrate the system into systems of systems. “A new system that is to be developed to replace a current obsolescent system will inevitably have performance requirements well beyond those of its predecessor” (Kossiakoff, 2011) [2].

Among the challenges of transitioning to MBSE are an ingrained culture that resists change and the cost of MBSE software. A more practical challenge is that MBSE must be injected at the start of a program. MBSE can, however, help to reduce risk through requirements validation. Capturing the requirements statements in the model itself allows stakeholders to validate the subject system’s functional requirements in an understandable language. Risk analysis can also be integrated into the MBSE process. The disadvantages of MBSE are the initial investment, the need for employee training, and increasing complexity. In contrast, the advantages include cost reduction, cost-effectiveness, and risk reduction during production. Production is the part of a program’s lifecycle where a failure can have the most significant impact.

External system interface requirements are particularly important in development because of the large, integrated extension of smart sensors, instruments, and other devices networked together with computer applications. Most operational technology (OT) systems were engineered before today’s interconnected, highly computer-networked environments. They faced static causal relationships of accidents and human factors. HICVs change that dynamic. Regardless of its amount and level, cybersecurity training does not defend against cyber exploitation attack vectors such as phishing and spear phishing. The most poignant observations made during data collection for a DOD Joint Test known as Joint Base Architecture for Secure Industrial Control Systems (J-BASICS) were that users operating at the traditional IT enterprise levels of ICS did not behave any differently, when confronted with phishing attacks, than ICS operators who had not been exposed to the same cybersecurity training [3]. J-BASICS showed that even cybersecurity-trained experts may not practice good cyber hygiene, even when they know the potential negative consequences. This is why an automated course of action is needed. The human is best suited for final decision making rather than near-real-time response actions.

Accidents in OT are typically attributed to the complexity of systems and scenarios. Automation addresses this complexity, which is introduced by smart sensors, instruments, and other devices networked together. Without automation, errors in more complex environments are likely to go uncorrected. If this premise is correct, then the design of systems warrants engineering more significant computer-aided correction of the Course of Action (COA), or automated systems. Interestingly, the most significant resistance to integrating automated systems is the perception that an automated course of action creates a greater potential for accidents or for failure to achieve the desired effect. However, if the J-BASICS findings are correct, then human-in-the-loop may not be the ideal system design. The ideal system design would be human out-of-the-loop, with fallback redundancy that allows a human to intervene manually when they observe an error.
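As a minimal sketch of the human out-of-the-loop design with manual fallback described above, the following Python fragment applies an automated course of action immediately and leaves an override available to the operator. The names (`Alert`, `select_coa`, `CoaController`), the severity scale, and the action labels are hypothetical illustrations and are not drawn from the MOSAICS design.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alert:
    device: str
    severity: int  # hypothetical scale: 1 (low) to 5 (critical)


def select_coa(alert: Alert) -> str:
    """Pick an automated course of action from alert severity (illustrative rule)."""
    if alert.severity >= 4:
        return "isolate"
    if alert.severity >= 2:
        return "monitor"
    return "log"


@dataclass
class CoaController:
    """Applies the automated COA at once; a human may override it afterward."""
    actions: Dict[str, str] = field(default_factory=dict)
    audit_log: List[str] = field(default_factory=list)

    def handle(self, alert: Alert) -> str:
        # Automated path: no human approval is awaited before acting.
        coa = select_coa(alert)
        self.actions[alert.device] = coa
        self.audit_log.append(f"auto:{alert.device}:{coa}")
        return coa

    def override(self, device: str, coa: str, operator: str) -> None:
        # Manual fallback: the operator replaces the automated decision.
        self.actions[device] = coa
        self.audit_log.append(f"manual:{operator}:{device}:{coa}")


if __name__ == "__main__":
    controller = CoaController()
    controller.handle(Alert("pump-7", 5))
    controller.override("pump-7", "monitor", "operator-1")
    print(controller.audit_log)
```

Keeping every automated and manual action in one audit log is a design choice for this sketch; it lets the operator see what the automation did before deciding whether to intervene.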

Functional analysis emphasizes a modular configuration, software design in a modular architecture, and effective human interactions of user interfaces. “Among the most critical elements in complex systems are those concerned with the control of the system by the user — analogous to the steering wheel, accelerator, shift lever, and brakes in an automobile” (Kossiakoff, 2011) [4].

Systems engineering principles support the integration of cyber defense capabilities into these context-sensitive critical infrastructure dynamic classes. OT systems are vulnerable to cyber attacks such as ransomware directed at the OT hardware and software that monitor and control physical devices, processes, and events in critical infrastructure. Cyber attacks bring an element of physical risk that OT operators traditionally did not consider for OT systems. In many cases, these systems are protected by anomaly detection algorithms and process violation reporting. “While both IT and OT [operators] may be equally susceptible to phishing attacks, more nuanced evaluation of the user’s respective context domains would reveal that IT and OT operators are exposed to different context variables. These could create very different outcomes relative to a user’s response to phishing attacks” [5]. “[T]he objective of [risk management] is to minimize the total cost of managing each significant risk area” (Kossiakoff, 2011) [6]. The functional design must provide test points for fault isolation, maintenance, environmental provisions, and opportunity for future growth [7]. Prototypes of actual hardware and software are integrated into the system for laboratory functional technical validation and verification. A second field demonstration includes operators to validate and verify the MOSAICS system design.
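The anomaly detection and process violation reporting mentioned above can be illustrated with a small Python sketch. It assumes a simple z-score test against a known-good baseline and a fixed process limit check; the function names, the threshold, and the sample values are hypothetical, and the source does not state which algorithms MOSAICS uses.

```python
from statistics import mean, stdev
from typing import List, Sequence


def zscore_anomalies(readings: Sequence[float],
                     baseline: Sequence[float],
                     threshold: float = 3.0) -> List[int]:
    """Return indexes of readings that deviate from the baseline mean
    by more than `threshold` sample standard deviations."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is anomalous.
        return [i for i, x in enumerate(readings) if x != mu]
    return [i for i, x in enumerate(readings) if abs(x - mu) / sigma > threshold]


def process_violations(readings: Sequence[float],
                       low: float, high: float) -> List[int]:
    """Return indexes of readings outside the allowed process range."""
    return [i for i, x in enumerate(readings) if not (low <= x <= high)]


if __name__ == "__main__":
    baseline = [50.0, 50.5, 49.5, 50.2, 49.8]   # hypothetical normal pressure values
    readings = [50.1, 49.9, 58.0, 50.3]
    print(zscore_anomalies(readings, baseline))       # statistical outliers
    print(process_violations(readings, 45.0, 55.0))   # hard limit violations
```

The two checks are complementary: the statistical test flags unusual behavior that is still inside the process limits, while the limit check reports any reading that breaks a fixed operating rule.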

Transition of MOSAICS to commercial industry will help to ensure continued viability for the various classes of OT systems and components across sectors. Component design becomes a commoditized industry. Modern electronic component production has dramatically reduced production costs by standardizing components, whereas customizing components increases cost. This standardization contributes to transforming the design, development, production, and delivery of electronic components. It also affects cost, reliability, and Design for Manufacture (DFM). Typical activities of design validation include “conducting test and evaluation of engineered components concerning function, interfaces, reliability, and producibility, correcting deficiencies and documenting product design” (Kossiakoff, 2011) [8].

Configuration Management (CM) contributes to the integrity of the system design. It maintains vital system development baselines, which include the functional baseline, the allocated baseline, and the product baseline, all essential elements throughout the system lifecycle. “Formal change control of system-level changes is usually exercised by a designated group composed of senior engineers with recognized technical and management expertise capable of making judgments among performance, cost, and schedule” (Kossiakoff, 2011) [9]. The goal of integration is to engineer the new system into a compelling operating whole.
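The three baselines and the formal change control described above can be sketched as a small data structure. This is an illustration only: the class names, the `approved` flag standing in for a change control group's decision, and the sample items are hypothetical and do not describe the MOSAICS CM tooling.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Baseline(Enum):
    FUNCTIONAL = "functional"
    ALLOCATED = "allocated"
    PRODUCT = "product"


@dataclass
class ChangeRequest:
    baseline: Baseline
    item: str
    new_value: str
    approved: bool = False  # set by the change control group (assumed workflow)


@dataclass
class ConfigurationRecord:
    baselines: Dict[Baseline, Dict[str, str]] = field(
        default_factory=lambda: {b: {} for b in Baseline})
    history: List[ChangeRequest] = field(default_factory=list)

    def apply(self, request: ChangeRequest) -> bool:
        """Apply a change to a baseline only if it has been approved."""
        if not request.approved:
            return False
        self.baselines[request.baseline][request.item] = request.new_value
        self.history.append(request)
        return True


if __name__ == "__main__":
    record = ConfigurationRecord()
    rejected = ChangeRequest(Baseline.PRODUCT, "sensor-firmware", "v2")
    accepted = ChangeRequest(Baseline.PRODUCT, "sensor-firmware", "v2", approved=True)
    print(record.apply(rejected), record.apply(accepted))
```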

During test planning and preparation, the MOSAICS prototype becomes real, and interface issues are resolved. Deviations from expected test results can be due to deficiencies in the equipment, procedures, execution, analysis, or the system under test, or to excessively stringent requirements. A test failure must be traced to its cause so that corrective action can be taken. Steps taken before, during, and after a test contribute to the diagnosis of a test failure. Before Trident Warrior 2020, a final prototype baseline will be locked down, and no further late injection of technologies will be introduced. “A typical test configuration consists of the system element (component or subsystem) under test, a physical or computer model of the component or subsystem, an input generator that provides test stimuli, an output analyzer that measures element test responses, and control and performance analysis units” (Kossiakoff, 2011) [10]. The system test configuration subjects the system to the operational and environmental conditions in which it will perform. Some critical systems, however, require continuous operations and cannot be stopped or paused for test [11].
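The test configuration quoted above (element under test, computer model, input generator, output analyzer) can be sketched in a few lines of Python. Everything here is a hypothetical stand-in chosen for illustration: the ramp stimuli, the gain-of-two model, and the saturating component are assumptions, not MOSAICS test assets.

```python
from typing import Callable, List, Sequence, Tuple


def input_generator(n: int) -> List[float]:
    """Produce deterministic test stimuli: 0.0, 1.0, ..., n - 1."""
    return [float(i) for i in range(n)]


def reference_model(x: float) -> float:
    """Computer model of the expected behavior (illustrative gain of 2)."""
    return 2.0 * x


def element_under_test(x: float) -> float:
    """Stand-in for the component under test; its output saturates at 10.0."""
    return min(2.0 * x, 10.0)


def output_analyzer(stimuli: Sequence[float],
                    element: Callable[[float], float],
                    model: Callable[[float], float],
                    tolerance: float = 1e-6) -> List[Tuple[float, float, float]]:
    """Return (stimulus, actual, expected) for every response that
    deviates from the model by more than `tolerance`."""
    deviations = []
    for x in stimuli:
        actual, expected = element(x), model(x)
        if abs(actual - expected) > tolerance:
            deviations.append((x, actual, expected))
    return deviations


if __name__ == "__main__":
    for stimulus, actual, expected in output_analyzer(
            input_generator(8), element_under_test, reference_model):
        print(f"stimulus={stimulus} actual={actual} expected={expected}")
```

Recording the stimulus alongside each deviation supports the tracing step in the text: the analyst can see that failures begin only above a certain input and then ask whether the component, the model, or the requirement is at fault.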

A model is a useful tool in systems engineering. It helps developers think about and understand complications that are difficult to observe independent of context. Human factors from behavioral science can add to the complexity of an observed system. “A complex system may have multiple stable states (meaning each state is metastable), transient states, or even no lasting stable states, exhibiting continuous evolution. Perturbations in the system may result in recovery to the former state but may also lead to transitions to another state and consequent radical changes of properties. Besides, details seen at the fine scales can influence large-scale behavior” (INCOSE, 2015) [12]. The use of phishing by Advanced Persistent Threats (APTs) to attack critical infrastructure assets through OT demonstrates this point. “Today phishing, a human-focused exploit, constitutes 91% of successful attack vectors against Federal assets. This means HICV’s are the weakest cyber link. The success of these attacks also suggests HICV’s are not well understood nor mitigated” (Merz, 2019) [13].
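The INCOSE description of multiple stable states and perturbations can be made concrete with a toy model. The sketch below assumes a one-dimensional system dx/dt = x - x^3, which has stable states at x = +1 and x = -1 and an unstable point at x = 0. This equation is a textbook illustration chosen for this example and does not model any OT system.

```python
def settle(x0: float, steps: int = 2000, dt: float = 0.01) -> float:
    """Integrate dx/dt = x - x**3 with forward Euler steps and
    return the state the system settles into from x0."""
    x = x0
    for _ in range(steps):
        x += dt * (x - x ** 3)
    return x


if __name__ == "__main__":
    stable_state = 1.0
    small = settle(stable_state - 0.5)  # small perturbation
    large = settle(stable_state - 1.5)  # perturbation past the unstable point at 0
    print(round(small, 3), round(large, 3))
```

A small perturbation recovers to the former state, while a larger one pushes the system across the unstable point and it settles into the other stable state. This mirrors the recovery and transition behavior in the quoted passage.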

Test planning can ensure that MOSAICS is substantially better positioned for testing. Preparing the test environment and constraints and using small-scale tests to collect information all contribute to test planning. Verification is the evaluation of a system or component to determine whether it was built correctly to satisfy the conditions imposed at the start. Validation is the evaluation of a system or component to determine whether the right product was built to meet user operational requirements. Verification is performed during Developmental Test (DT). DTs are one-on-one tests performed in controlled environments, testing to specifications for precise performance objectives. The operational test is the evaluation of a real production item by an independent agency, in as realistic an environment as practical, with normalized operators performing activities for validation. Personnel training and knowledge transfer to the user responsible for operations are vital for adequate preparation of the transition to a new system. Human error is often less a factor in the failure of a system than an error triggered by poor design, or violation of use and maintenance [14]. “Among the most critical elements in complex systems are those concerned with the control of the system by the user” (Kossiakoff, 2011) [15]. Human factors have to be taken into consideration as a potential reliability issue, because components may present operating hazards if not used as designed and intended. Scenario brainstorming is important for determining the best way to deter malicious efforts to gain privileged access from privileged access holders. Psychology is key in these tests of deterrence.

Development Stage

Among production operations, establishing an active Information System (IS) is critical to success. In production operation systems, the engineering organization coordinates with users, developed component engineering, production, assembly, integration and acceptance test, and subcontractor engineering [16]. Manufacturing a new complex system without an effective IS can hinder production operations. An IS supports organizations in integrating hardware, software, data, people, and processes. Several factors contribute to the complexity of a system production phase, including: 1) advancing technology; 2) the requirement to ensure compatibility of new processes with workforce organization and training; 3) design of communications among distributed production facilities; 4) acceptance test equipment; 5) manufacturing information management; and 6) provisions for change. The complexity of acquiring services under contracts to support operations is comparable to that of designing the actual system itself. As in the concept development phase, planning, design, and implementation occur in production.

Concurrent engineering involves engineering analysis, design, simulation, and testing to examine components for producibility and transition. Installing, maintaining, and upgrading the MOSAICS system requires systems engineering principles and expertise throughout the operational lifecycle. Integrated Product Teams (IPTs) assemble expertise from various organizational units and external interfaces. Members of the IPT perform specialist activities such as mission assurance, or science and technology research. Concurrent engineering carries risks as well. “The problem of making concurrent engineering effective is that design specialists, as the name implies, have a deep understanding of their disciplines but typically have only a limited knowledge of other disciplines, and hence a lack of common vocabulary (and frequently interest) for communicating with specialists in other disciplines” (Kossiakoff, 2011) [17]. Concurrent engineering brings together the appropriate functional disciplines throughout the systems engineering “Vee” [18]. Systems engineers lead the process of orchestrating specialty engineers. Systems engineers are to “serve as coordinators, interpreters, and, where necessary, as mentors” (Kossiakoff, 2011) [19]. Experienced operators and users bring system knowledge. Critical systems engineering principles are: 1) concurrent engineering takes place throughout system development; 2) the transition process of a new system from development to production can be particularly tricky; and 3) commercial development and production may be a dedicated separate phase in the system life cycle. This includes a preproduction prototype and selection of manufacturing procedures and equipment [20].

Conclusion

MOSAICS is the first prototype to address the operational need for cyber defense capabilities to defend mission-critical infrastructure from cyber attacks. Eventually, this prototype will be shared with commercial industry through DOD Industry Days for further research and development. This approach can lead to an innovative, game-changing capability. These planned Industry Days are good opportunities for industry to better understand the MOSAICS Joint Capability Technology Demonstration (JCTD) Transition Management (XM) plans and needs. They also allow industry to ask questions and provide feedback to the MOSAICS JCTD Integrated Management Team (IMT), and they give the JCTD Technical Management (TM) team a valuable feedback mechanism early in the engineering and development life cycle.

Few professionals possess the skills to traverse both IT and OT systems. Finding the right personnel and quantifying cybersecurity risk are also challenges. A more significant challenge is calculating the reliability of components for cyber resiliency and developing methods to test for resiliency when thresholds are almost impossible to define in today’s lexicon. Estimating how much more testing, red team/blue team activity, and scenario work is needed to size the remaining problem set is key to risk mitigation strategies. It is important to understand red team offensive techniques in order to engineer effective, threat-based defensive countermeasures prioritized by potential impact severity. Systems engineering principles can provide a mechanism to integrate contextual information from cyber-physical systems into context-sensitive critical infrastructure dynamic classes. This will improve cyber resilience in OT and help MOSAICS transition successfully to operations [21].

References

  1. Aleksandra Scalco, M. J., Steve Simske (2019). “More Situational Awareness for Industrial Control Systems (MOSAICS) Joint Capability Technology Demonstration (JCTD): A Concept Development for the Defense of Mission Critical Infrastructure.” Homeland Defense & Security Information Analysis Center.
  2. Ibid.
  3. Merz, T. (2019).
  4. Kossiakoff, 2011.
  5. Ibid.
  6. Ibid.
  7. Ibid.
  8. Ibid.
  9. Ibid.
  10. Ibid.
  11. Ibid.
  12. INCOSE, 17.
  13. Merz, 2019.
  14. Miller, E. (2019). Human Factors in the Design of Complex Systems. E. 501. November 6, 2019, Colorado State University (CSU).
  15. Kossiakoff, 2011.
  16. Ibid.
  17. Ibid.
  18. INCOSE. 2012. Systems Engineering Handbook: A Guide for System Life Cycle Processes and Activities, version 3.2.2. San Diego, CA, USA: International Council on Systems Engineering (INCOSE), INCOSE-TP-2003-002-03.2.2.
  19. Kossiakoff, 2011.
  20. Ibid.
  21. Scalco, 2019.