Detailed Class Agenda

The following is a detailed list of topics covered by the course.

Modern System Safety Fundamentals

Basic Concepts from System Engineering
The Hidden Cost of Defects Over Time
Intro to System Safety
History of Safety Engineering
Safety: Old View vs. New View (aka Component View vs. Systems View)
Traditional Safety Engineering
- Losses Caused by Failures
- Methods to Analyze Failures
- Limitations of Failure-based Analysis
- Redundancy & Independence
- Unintended Consequences of Redundancy
- Component View of Safety
- System Safety Order of Precedence
- Learning from Real Accidents: Patterns that Defeat Traditional Safety Efforts
The Next Generation of System Safety
- Losses Caused by Interactions, Requirements Errors, and Other Non-failure Causes
- Anticipating Interactions and Errors
- Modeling Control Loops
- Failure Accidents vs. Interaction Accidents
- Learning from Real Accidents: Recognizing New Patterns
Probability in Safety
- Common Uses of Probability
- Common Probability Restrictions
- Learning from Real Accidents: Probability Insights and Oversights
- Probability-based View vs. Control-based View
Automation and Software: Safety Principles and Lessons Learned
- The Primary Cause of Unsafe Software Behavior
- Failure and Error Concepts Applied to Software
- Rules and Limitations for Applying Probability to Software
- Standard Risk Matrix vs. Software Risk Matrix
- Software Reuse: Case Study
- Software Redundancy: Case Study
- Software Diversity: Case Study
- Software Independence and N-Version Programming (NVP)
- The "Single Hardest Part" of Building a Software System
- Learning from Real Accidents: Independent Systems Controlled by Software
Safety and Reliability
- Common Confusions and Critical Differences
- Tradeoffs
Summary

Human/Automation Interactions

Terminology: Control Actions, Unsafe Control Actions, Decision Making, Process Model, Feedback, Controlled Process
Learning from Real Accidents: Poor Automation Defeating Human Operators
Statistics: Automation's Counter-intuitive Impact on Safety
Counter-intuitive Impact of Tasks on Human Performance
Learning from Real Accidents: Does Automation Simplify or Complicate Human Tasks?
The Automation Dilemma: Human-Centered vs. Engineering-Centered Design
Learning from Real Accidents: Poor Engineering Decisions That Appeared Reasonable
Modeling Both Humans and Automation Together
Safety Conclusions: Component View vs. Systems View
Hindsight Bias
Restrictions on Using "Failure" in a Human Context
Human Factors: Old View vs. Systems View
Case Study: Human Factors Engineering
Human Factors: Natural Mapping
Examples: Designs that Cause (or Prevent) Human Error
Learning from Accidents: Catching Poor Engineering Assumptions about Human Operators
Human Factors: Management by Exception
Limitations of Management by Exception
Human Factors Experts: Recommendations to Designers
Summary

Systems Thinking, Accident Models, and STAMP

Three Modes of Thought: Strengths and Weaknesses
- Analytic Reduction (aka Decomposition, or Divide & Conquer)
- Statistics and Probability
- Systems Theory (for Complex Systems)
Underlying Assumptions Under Which Each Mode Is Effective
Pillars of Systems Theory
- Hierarchy
- Emergence
- Communication
- Control / Feedback
Hazard Analysis as a Search Method
- Forward Search Methods
- Backward Search Methods
- Bottom-up Search Methods
- Top-down Search Methods
Introduction to Accident Models
Traditional Accident Models in System Safety
- Chain of Events Model
- Failure Propagation Model
- Swiss Cheese Model
- Functional Failure Propagation Model
Brief Overview of Common Traditional Methods
- Failure Modes and Effects Analysis (FMEA)
- Fault Tree Analysis (FTA)
- Functional Hazard Analysis (FHA)
Systems-Theoretic Accident Model and Processes (STAMP)
- Principles from Control Theory
- Classifying Causal Factors Using a Control Loop
- The Control Structure
- Common Types of Control Loops in Safety
System-Theoretic Process Analysis (STPA)
- Overview
- Step 1: Define the Purpose of the Analysis
- Step 2: Model the Control Structure
- Step 3: Identify Unsafe Control Actions
- Step 4: Identify Loss Scenarios

System-Theoretic Process Analysis (STPA)

STPA Step 1 Artifacts: Stakeholders, Losses, Hazards, Constraints
STPA Step 2 Artifacts: Control Structure, Controllers, Controlled Processes, Control Actions, Feedback, Abstraction
STPA Step 3 Artifacts: Unsafe Control Actions, UCA Bounding, UCA Syntax, Generating Constraints and Requirements, Operator Procedures
STPA Step 4 Artifacts: Loss Scenarios, Causes of UCAs, Control Actions Not Executed or Followed Properly, Human Interactions, Solutions
Real-world Examples From Each Step

STPA Examples and Exercises

Short Interactive STPA Exercises
Short STPA Examples

In-Depth STPA Exercise

Review STPA examples produced by past students
Review instructor feedback and comments provided to past students
STPA Lessons Learned
- Identifying the "system" in the hazard specification
- Deciding what to model as a controller in the control structure
- Determining level of authority / control
- Distinguishing the physical and control layers
- UCA Context: specifying actual state vs. feedback mechanism
- Overlapping UCAs
- Using UCAs to identify new Hazards
- Building Scenarios with Process Model Flaws
- Building Scenarios with Decision Making Flaws
Evaluating the value of STPA: How do the STPA results compare to previous findings?

Lessons Learned in Practice

Identifying Common Mistakes in STPA
Planning and Implementing STPA Projects
- Team Roles
- The STPA Facilitator Role
- Getting leadership buy-in to apply STPA
- STPA Rollout / Roadmap
- STPA Levels of Certification
- Lessons from Successful and Unsuccessful Projects
- STPA Strengths and Weaknesses
- Two-phase Approach to Developing Solutions from STPA Results
- Planning Vector Checks
- Strategies to Accelerate STPA Learning and Rollout
- Time Data from Past STPA Projects
- Breakdown of activities that take the most / least amount of time
- Time Reductions as Experience Accumulates
Results and Conclusions from Past STPA Efforts
Summary
