ContiMech
Robotics & Automation Engineering
Survivability Engineering · For systems that can't just stop

When shutting down is the most expensive thing your system can do

Fail-safe is the right answer almost everywhere. We engineer the systems where it isn't — the ones that have to take a hit and keep doing the job that matters.

2026-06-15 · Kyiv · Capability brief · 6 min read

Most reliability engineering ends at a clean shutdown. For some systems, that shutdown is the failure: a scrapped test article, an aborted mission, a safety event, a stop with a lot of zeros on it. We design the alternative: systems that lose a part and keep the mission, on purpose and under control. It's called survivability, and it's a line of engineering we've published on and ship.

7 peer-reviewed papers · Proven on ARM Cortex-M7 · ROS 2 / DDS reconfiguration · ISO 26262 / ISO/SAE 21434 ready

The failure you didn't budget for

Functional safety gives your system one honest answer when something breaks: stop. For most products that's exactly right, and we'd never weaken it. But you already know the parts of your system where it isn't — where "reach the safe state" means the autonomous platform drops mid-task, the moving vehicle loses a function it can't afford to lose, the rig sacrifices the very thing it was protecting.

The safe state exists. Reaching it just costs you the mission. That gap rarely makes it onto a requirements sheet — it shows up later, in the field, in an incident report, in a number nobody wanted to write down.

Mission lost: A clean stop that ends the task you were actually there to do.
Asset scrapped: An expensive article under test, abandoned the moment a fault trips a shutdown.
Safety event: An uncontrolled stop that is itself the hazard, not the cure for it.

What if a fault didn't end the mission?

Survivability flips the goal from fail safely to survive usefully. Instead of treating every function as on-or-lost, we engineer it to degrade — gracefully, predictably, above a floor you define — while the system re-wires around the damage and the controller re-tunes itself to whatever is left.

The mission doesn't fall off a cliff. It walks down a staircase you designed, to an exit it chooses. And nothing here trades away safety: the floor protects the mission, while the safety goal remains the limit that nothing is allowed to cross.

The one promise we engineer to

Strip away the jargon and every super-critical design we build holds to one simple contract. Each mission-critical function $i$ in the set $\mathcal{C}$ keeps its delivered performance $f_i(t)$ at or above its floor $\phi_i$ for the whole window after the fault:

The survivability promise

$$ S(t)\ =\ \min_{i\,\in\,\mathcal{C}}\ \frac{f_i(t)}{\phi_i}\ \ge\ 1 $$

One number to watch. $S(t)\ge 1$ means every floor is holding. The instant it's threatened, the system reconfigures harder or begins a controlled exit. Everything else we do exists to keep that inequality true — and to prove, before you ship, that it stays true.
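As a minimal sketch of how that one number could be monitored, the snippet below computes $S(t)$ from a list of mission-critical functions and turns it into a decision. The class name, the function names, the example values, and the `margin` threshold that stands in for "threatened" are all illustrative assumptions, not part of any shipped ContiMech interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticalFunction:
    """One mission-critical function i in the set C (hypothetical container)."""
    name: str
    level: float  # f_i(t): performance currently delivered
    floor: float  # phi_i: minimum acceptable performance, must be > 0


def survivability_index(functions):
    """S(t) = min over all critical functions of f_i(t) / phi_i."""
    if not functions:
        raise ValueError("need at least one mission-critical function")
    return min(f.level / f.floor for f in functions)


def decide(functions, margin=1.2):
    """Map S(t) to an action.

    Assumption: S < 1 means a floor is broken, so exit under control;
    1 <= S < margin means a floor is threatened, so reconfigure;
    otherwise hold the current mode.
    """
    s = survivability_index(functions)
    if s < 1.0:
        return "controlled_exit"
    if s < margin:
        return "reconfigure"
    return "hold"


# Example values are made up for illustration.
snapshot = [
    CriticalFunction("power", level=60.0, floor=40.0),    # ratio 1.5
    CriticalFunction("tracking", level=0.55, floor=0.5),  # ratio 1.1
]
print(survivability_index(snapshot), decide(snapshot))
```

Taking the minimum rather than an average is the point of the formula: one function falling below its floor cannot be hidden by the others doing well.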

Under the hood — yes, the math is real

This isn't a slide. When the plant changes under a fault — an actuator at half strength, a degree of freedom gone — the controller re-solves its own optimal-control problem against the broken system in real time, a discrete-time Riccati update that re-tunes the feedback gain on the fly:

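Adaptive re-tuning, in one line

$$ K_\theta\ =\ \left(R + B_\theta^{\top}P_\theta B_\theta\right)^{-1}B_\theta^{\top}P_\theta A $$

Here $\theta$ indexes the fault condition, $B_\theta$ is the input matrix of the degraded plant, and $P_\theta$ is the solution of the discrete-time Riccati equation for that plant.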
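To make the gain formula concrete, here is a small sketch in plain Python that iterates the discrete-time Riccati recursion for a single-input plant and re-solves the gain after an actuator loses half its strength. The double-integrator model, the sample time `DT`, the weights `Q` and `R`, and every function name are assumptions chosen for illustration. This is not the embedded implementation described in the text, and a production solver would use a numerical library rather than hand-written matrix helpers.

```python
import cmath


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def gain_from(P, A, B, R):
    """K = (R + B^T P B)^-1 B^T P A for an n x 1 input matrix B and scalar R."""
    BtP = matmul(transpose(B), P)      # 1 x n
    s = R + matmul(BtP, B)[0][0]       # scalar R + B^T P B
    return [[v / s for v in matmul(BtP, A)[0]]]


def lqr_gain(A, B, Q, R, iters=2000, tol=1e-12):
    """Iterate P <- Q + A^T P A - A^T P B K until P stops changing."""
    n = len(A)
    At = transpose(A)
    P = [row[:] for row in Q]
    for _ in range(iters):
        K = gain_from(P, A, B, R)
        AtP = matmul(At, P)
        AtPA = matmul(AtP, A)
        AtPBK = matmul(matmul(AtP, B), K)
        P_next = [[Q[i][j] + AtPA[i][j] - AtPBK[i][j] for j in range(n)]
                  for i in range(n)]
        change = max(abs(P_next[i][j] - P[i][j]) for i in range(n) for j in range(n))
        P = P_next
        if change < tol:
            break
    return gain_from(P, A, B, R), P


def closed_loop(A, B, K):
    """A - B K, the dynamics under the feedback u = -K x."""
    BK = matmul(B, K)
    return [[A[i][j] - BK[i][j] for j in range(len(A))] for i in range(len(A))]


def spectral_radius_2x2(M):
    """Largest eigenvalue magnitude of a 2 x 2 matrix, from trace and determinant."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return max(abs((tr + root) / 2.0), abs((tr - root) / 2.0))


DT = 0.1                                 # assumed sample time in seconds
A = [[1.0, DT], [0.0, 1.0]]              # double integrator: position, velocity
Q = [[1.0, 0.0], [0.0, 1.0]]             # assumed state weight
R = 1.0                                  # assumed input weight


def faulted_B(theta):
    """Input matrix with actuator effectiveness theta (1.0 = healthy)."""
    return [[0.5 * DT * DT * theta], [DT * theta]]


K_nominal, _ = lqr_gain(A, faulted_B(1.0), Q, R)
K_faulted, _ = lqr_gain(A, faulted_B(0.5), Q, R)   # actuator at half strength
print("nominal gain:", K_nominal)
print("faulted gain:", K_faulted)
print("faulted closed-loop spectral radius:",
      spectral_radius_2x2(closed_loop(A, faulted_B(0.5), K_faulted)))
```

The same routine is called twice with a different $B_\theta$, which is all "re-solving against the broken system" means in this sketch. A spectral radius below 1 for the faulted closed loop is what indicates the re-tuned gain still stabilises the degraded plant.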

And every reconfigured mode ships with a Lyapunov stability proof, so "keep operating" never quietly becomes "operate into divergence." Then we prove it where it counts — in simulation under fault injection, and on real silicon: an ARM Cortex-M7 closing the loop inside budget, with reconfiguration demonstrated over ROS 2 / DDS. A survivability story that lives only in a deck is a liability; one that runs on the target is something a safety case and a buyer can both stand on.
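One way to picture a per-mode stability certificate is a numerical Lyapunov check. The sketch below builds a candidate $P$ for a closed-loop matrix and tests that $V(x) = x^{\top}Px$ is positive definite and strictly decreasing along $x_{k+1} = A_{cl}x_k$. The matrices are made-up 2 x 2 examples and the function names are hypothetical. A check like this is supporting evidence for one mode, not a replacement for the formal proof the text refers to.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def lyapunov_matrix(A_cl, Q, terms=500):
    """Partial sum of P = sum_k (A_cl^T)^k Q A_cl^k.

    For a stable A_cl the sum converges to the solution of
    A_cl^T P A_cl - P = -Q. For an unstable A_cl it grows without bound,
    and the certificate check below rejects the result.
    """
    n = len(A_cl)
    At = transpose(A_cl)
    P = [[0.0] * n for _ in range(n)]
    term = [row[:] for row in Q]
    for _ in range(terms):
        P = [[P[i][j] + term[i][j] for j in range(n)] for i in range(n)]
        term = matmul(matmul(At, term), A_cl)
    return P


def is_positive_definite_2x2(M):
    """Sylvester's criterion applied to the symmetric part of a 2 x 2 matrix."""
    off = 0.5 * (M[0][1] + M[1][0])
    return M[0][0] > 0.0 and M[0][0] * M[1][1] - off * off > 0.0


def certifies_stability(A_cl, P):
    """True if P > 0 and P - A_cl^T P A_cl > 0, so V(x) = x^T P x strictly decreases."""
    APA = matmul(matmul(transpose(A_cl), P), A_cl)
    decrease = [[P[i][j] - APA[i][j] for j in range(2)] for i in range(2)]
    return is_positive_definite_2x2(P) and is_positive_definite_2x2(decrease)


Q = [[1.0, 0.0], [0.0, 1.0]]
A_degraded = [[1.0, 0.1], [-0.05, 0.9]]   # hypothetical stable degraded mode
A_diverging = [[1.1, 0.0], [0.0, 0.5]]    # hypothetical mode that must be rejected

print(certifies_stability(A_degraded, lyapunov_matrix(A_degraded, Q)))
print(certifies_stability(A_diverging, lyapunov_matrix(A_diverging, Q)))
```

The useful property is the rejection path: a mode whose closed loop diverges fails the check instead of quietly passing, which is the "operate into divergence" case the certificate exists to catch.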

[Figure: adaptive survivability loop diagram]
Drawing · adaptive loop: Detect the loss, estimate what remains, switch mode, retune the controller, and prove the result is still stable.
[Figure: robotic manipulator in a lab, illustrating reconfiguration and control research]
Platform demonstration: Mission hardware, not abstract boxes. Robotics and embedded platforms are where degraded-mode logic becomes visible and measurable.

Watch a fault chain survive

One example platform, one fault chain, walked down the staircase. Each row: a fault, the reconfiguration it fires, the function that lives because of it.

| Event | What breaks | Reconfiguration | Outcome |
| --- | --- | --- | --- |
| Nominal | (nothing) | Baseline controller | Full performance |
| Actuator fault | One control authority | Re-allocate effort, re-solve the gain | Trajectory held |
| Sensor drift | Trusted state estimate | Fall back to model-based estimation | Control continues |
| Power cut | Actuation energy | Shed non-essentials, defend the core | Above floor, mission kept |
| Floor reached | Further loss unsafe | Controlled exit to a safe state | Graceful hand-over |

Notice what never happens: a jump from "fine" to "gone."
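The fault chain above can be read as a lookup policy: each event maps to one reconfiguration and one defended outcome, and only the last row ends the mission. A minimal sketch follows. The event keys, the `Response` type, and the rule that an unlisted event falls back to the controlled exit are assumptions made for illustration.

```python
from typing import NamedTuple


class Response(NamedTuple):
    reconfiguration: str
    outcome: str
    terminal: bool  # True only for the controlled exit


# Rows taken from the fault-chain table; the event keys are hypothetical names.
POLICY = {
    "actuator_fault": Response("re-allocate effort, re-solve the gain",
                               "trajectory held", False),
    "sensor_drift": Response("fall back to model-based estimation",
                             "control continues", False),
    "power_cut": Response("shed non-essentials, defend the core",
                          "above floor, mission kept", False),
    "floor_reached": Response("controlled exit to a safe state",
                              "graceful hand-over", True),
}

# Assumption: an event with no policy row is answered with the controlled exit.
DEFAULT = POLICY["floor_reached"]


def walk(events):
    """Apply the policy to each event in order and stop at the first terminal response."""
    log = []
    for event in events:
        response = POLICY.get(event, DEFAULT)
        log.append((event, response))
        if response.terminal:
            break
    return log


chain = ["actuator_fault", "sensor_drift", "power_cut", "floor_reached"]
for event, response in walk(chain):
    print(f"{event}: {response.reconfiguration} -> {response.outcome}")
```

Because every event resolves to some row, there is no path through `walk` that ends without either a defended outcome or the controlled exit.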

What ContiMech brings to the table

What you get from us

Mode definition: We turn a vague "what if it breaks?" into specified super-critical requirements you can sign off on.
Survivability architecture: Abstraction layers plus a reconfiguration policy that maps every loss to a defended function.
Adaptive control + proof: Performance you can defend in a safety case, because each mode carries a stability certificate.
Validation on your target: Cortex-M7-class embedded implementation, fault injection in simulation, ROS 2 / DDS reconfiguration.
Standards bridge: Wired into ISO 26262 functional safety and ISO/SAE 21434 cybersecurity where the system needs it.
Honest scoping first: If fail-safe is enough for you, we'll say so. Survivability is only worth it where stopping costs you.

Is this you?

Most systems should stop at fail-safe — and that's the right call. Survivability earns its budget where a clean stop is itself expensive, dangerous, or mission-ending:

Autonomous & mobile robotics that can't drop a task mid-motion
Automotive functions with no benign off-state while moving
Processes where an uncontrolled stop is the hazard
Critical infrastructure that must ride through faults
Aerospace & defence platforms far from recovery
Test rigs protecting an expensive article under test

Start a conversation

Does your system have a regime where shutting down is the failure?

Send us the system context — what it is, what it must keep doing after a fault, and why a clean stop isn't acceptable. We'll tell you plainly whether survivability is worth it for you, and what the first step looks like. The earlier in the architecture it starts, the cheaper it is.

System type: What it is and what it must keep doing
Criticality: What the mission is and what a loss costs
Constraints: Target hardware, standards, timeline
Backed by peer-reviewed research. The papers:

1. The Super-Critical Operational Modes in Robotic Systems. Підводні технології / Pidvodni tehnologii [Underwater Technologies] (peer-reviewed), 2023.
2. Established Definitions of Super-Critical Operational Modes as Automotive System Requirements. Підводні технології / Pidvodni tehnologii [Underwater Technologies] (peer-reviewed), 2023.
3. Emphasis on Super-Critical Operational Modes in Robotic Systems. Int. Symposium "Intelligent Solutions — S", 2023.
4. Cybernetic Approaches to Adaptive Control of Super-Critical Systems. Адаптивні системи автоматичного управління [Adaptive Systems of Automatic Control] (peer-reviewed), № 1 (46), 2025.
5. Post-Failure Reconfiguration of a Super-Critical System via Cyclone DDS and ROS 2 Middleware. XV Int. Conf., Chernihiv, 2025.
6. Ensuring Survivability of Complex Super-Critical Systems Based on a Hierarchical Abstraction Model and Adaptive Reconfiguration. Smart Technologies: Industrial and Civil Engineering, 4 (17), 2025, pp. 75–82.
7. Practical Implementation and Simulation of Adaptive Control for Super-Critical Cyber-Physical Systems. Preprint, 2026.

Full list on the publications page.

Fail-safe asks how to stop without harm. Survivability asks how to keep going without harm. If your system needs the second answer, that's what we build.

Survivability · Super-Critical Systems · Adaptive Control · Dynamic Reconfiguration · Functional Safety · ROS 2 / DDS · Robotics