Most reliability engineering ends at a clean shutdown. For some systems, that shutdown is the failure: a scrapped test article, an aborted mission, a safety event, a stop with a lot of zeros on it. We design the alternative: systems that lose a part and keep the mission, on purpose and under control. It's called survivability, and it's a line of engineering we've published on and ship.
The failure you didn't budget for
Functional safety gives your system one honest answer when something breaks: stop. For most products that's exactly right, and we'd never weaken it. But you already know the parts of your system where it isn't — where "reach the safe state" means the autonomous platform drops mid-task, the moving vehicle loses a function it can't afford to lose, the rig sacrifices the very thing it was protecting.
The safe state exists. Reaching it just costs you the mission. That gap rarely makes it onto a requirements sheet — it shows up later, in the field, in an incident report, in a number nobody wanted to write down.
What if a fault didn't end the mission?
Survivability flips the goal from "fail safely" to "survive usefully." Instead of treating every function as on-or-lost, we engineer it to degrade gracefully, predictably, and above a floor you define, while the system re-wires around the damage and the controller re-tunes itself to whatever is left.
The mission doesn't fall off a cliff. It walks down a staircase you designed, to an exit it chooses. And nothing here trades away safety: the floor protects the mission, and the ceiling is always the safety goal.
The one promise we engineer to
Strip away the jargon and every super-critical design we build holds a single, simple contract — each mission-critical function $i$ stays above its floor $\phi_i$ for the whole window after the fault:
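One way to write that contract, in assumed notation: let $q_i(t)$ be the delivered performance of function $i$, $t_f$ the fault time, and $T$ the length of the survival window. These three symbols are illustrative assumptions; only $\phi_i$ and $S(t)$ appear in the text.

$$
q_i(t) \ge \phi_i \quad \text{for all } i \text{ and all } t \in [t_f,\ t_f + T],
\qquad
S(t) = \min_i \frac{q_i(t)}{\phi_i} \ge 1 .
$$

With positive floors the two conditions say the same thing: the smallest performance-to-floor ratio is at least one exactly when every function is at or above its floor.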
One number to watch. $S(t)\ge 1$ means every floor is holding. The instant that margin is threatened, the system reconfigures more aggressively or begins a controlled exit. Everything else we do exists to keep that inequality true, and to prove, before you ship, that it stays true.
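As a rough sketch of how such a number could be monitored, the snippet below assumes $S(t)$ is the smallest ratio of delivered performance to floor across all functions. The class, function names, sample values, and the `margin` early-warning band are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionStatus:
    """One mission-critical function: delivered performance and its floor."""
    name: str
    performance: float  # delivered performance, same units as the floor
    floor: float        # minimum acceptable performance, must be positive


def survivability_index(functions):
    """Return the smallest performance-to-floor ratio across all functions."""
    if not functions:
        raise ValueError("at least one function is required")
    for f in functions:
        if f.floor <= 0:
            raise ValueError(f"floor for {f.name!r} must be positive")
    return min(f.performance / f.floor for f in functions)


def decide(index, margin=0.2):
    """Map the index to an action; `margin` is an assumed early-warning band."""
    if index >= 1.0 + margin:
        return "hold"             # every floor holds with room to spare
    if index >= 1.0:
        return "reconfigure"      # floors hold, but the margin is threatened
    return "controlled_exit"      # at least one floor is breached


# Hypothetical snapshot after a fault.
status = [
    FunctionStatus("trajectory_tracking", performance=0.9, floor=0.6),
    FunctionStatus("state_estimation", performance=0.55, floor=0.5),
]
index = survivability_index(status)
action = decide(index)
print(f"S = {index:.2f}, action = {action}")
```

Taking the minimum means the weakest function sets the index, so one function near its floor cannot be masked by others that are comfortably above theirs.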
Under the hood — yes, the math is real
This isn't a slide. When the plant changes under a fault — an actuator at half strength, a degree of freedom gone — the controller re-solves its own optimal-control problem against the broken system in real time, a discrete-time Riccati update that re-tunes the feedback gain on the fly:
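A minimal scalar sketch of that kind of update, not the shipped controller: it assumes a one-state plant $x_{k+1} = a x_k + b u_k$ with cost $\sum_k (q x_k^2 + r u_k^2)$ and feedback $u_k = -k x_k$, and it models the fault as the input gain $b$ dropping to half. The plant values, weights, and function name are assumptions made for illustration. A multi-state plant uses the same recursion with matrices.

```python
def riccati_gain(a, b, q=1.0, r=1.0, tol=1e-10, max_iter=10_000):
    """Iterate the scalar discrete-time Riccati recursion to a fixed point.

    Returns (k, p): the feedback gain for u = -k*x and the cost-to-go weight.
    """
    p = q
    for _ in range(max_iter):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            return a * b * p / (r + b * b * p), p
    raise RuntimeError("Riccati iteration did not converge")


a = 2.0            # open-loop unstable plant (hypothetical value)
b_nominal = 1.0    # full actuator authority
b_faulty = 0.5     # actuator at half strength

k_nominal, _ = riccati_gain(a, b_nominal)
k_retuned, _ = riccati_gain(a, b_faulty)

# Closed-loop pole a - b*k; the loop is stable when its magnitude is below 1.
nominal_pole = a - b_nominal * k_nominal
stale_pole = a - b_faulty * k_nominal      # old gain on the broken plant
retuned_pole = a - b_faulty * k_retuned    # gain re-solved after the fault

print(f"nominal pole: {nominal_pole:.3f}")
print(f"stale gain on faulty plant: {stale_pole:.3f}")
print(f"re-solved gain on faulty plant: {retuned_pole:.3f}")
```

In this example the gain designed for the healthy plant leaves the faulty plant's pole outside the unit circle, while the re-solved gain brings it back inside. An embedded implementation would likely bound the number of iterations per control cycle rather than loop to convergence.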
And every reconfigured mode ships with a Lyapunov stability proof, so "keep operating" never quietly becomes "operate into divergence." Then we prove it where it counts — in simulation under fault injection, and on real silicon: an ARM Cortex-M7 closing the loop inside budget, with reconfiguration demonstrated over ROS 2 / DDS. A survivability story that lives only in a deck is a liability; one that runs on the target is something a safety case and a buyer can both stand on.
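The Lyapunov condition mentioned above can be spot-checked numerically. The sketch below is a sanity check under assumptions, not a proof: it takes a hypothetical closed-loop matrix for one reconfigured mode, solves the discrete Lyapunov equation $A^\top P A - P = -Q$ by fixed-point iteration, and confirms that $V(x) = x^\top P x$ is positive definite and decreases along a simulated trajectory. The matrix values and helper names are invented for illustration.

```python
def matmul(x, y):
    """Multiply two small matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    """Transpose a matrix given as nested lists."""
    return [list(row) for row in zip(*x)]


def solve_discrete_lyapunov(a_cl, q, iters=2000):
    """Iterate P <- Q + A^T P A, which converges when A is stable."""
    n = len(q)
    p = [row[:] for row in q]
    a_t = transpose(a_cl)
    for _ in range(iters):
        apa = matmul(matmul(a_t, p), a_cl)
        p = [[q[i][j] + apa[i][j] for j in range(n)] for i in range(n)]
    return p


def is_positive_definite_2x2(p):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return p[0][0] > 0 and p[0][0] * p[1][1] - p[0][1] * p[1][0] > 0


def quad(p, x):
    """Evaluate V(x) = x^T P x."""
    n = len(x)
    return sum(x[i] * p[i][j] * x[j] for i in range(n) for j in range(n))


# Hypothetical closed-loop matrix (A - B K) of one reconfigured mode.
a_cl = [[0.9, 0.2], [-0.1, 0.7]]
q = [[1.0, 0.0], [0.0, 1.0]]
p = solve_discrete_lyapunov(a_cl, q)

x = [1.0, -2.0]
decreasing = True
for _ in range(50):
    x_next = [a_cl[0][0] * x[0] + a_cl[0][1] * x[1],
              a_cl[1][0] * x[0] + a_cl[1][1] * x[1]]
    if quad(p, x_next) >= quad(p, x):
        decreasing = False
    x = x_next

print("P positive definite:", is_positive_definite_2x2(p))
print("V decreases along the trajectory:", decreasing)
```

A check like this only samples one trajectory of one mode; a stability argument for a safety case would need to cover every reconfigured mode and the switching between them.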
Watch a fault chain survive
One example platform, one fault chain, walked down the staircase. Each row: a fault, the reconfiguration it fires, the function that lives because of it.
| Event | What breaks | Reconfiguration | Outcome |
|---|---|---|---|
| Nominal | — | Baseline controller | Full performance |
| Actuator fault | One control authority | Re-allocate effort, re-solve the gain | Trajectory held |
| Sensor drift | Trusted state estimate | Fall back to model-based estimation | Control continues |
| Power cut | Actuation energy | Shed non-essentials, defend the core | Above floor — mission kept |
| Floor reached | Further loss unsafe | Controlled exit to a safe state | Graceful hand-over |
Notice what never happens: a jump from "fine" to "gone."
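The staircase in the table can be sketched as a simple dispatch table, where each event maps to a designed reconfiguration and the mission stays active until the floor is reached. The event keys, the `walk` function, and the tuple layout are hypothetical; the reconfigurations and outcomes are taken from the table.

```python
# Event -> (reconfiguration, outcome), following the table rows.
RESPONSES = {
    "nominal": ("Baseline controller", "Full performance"),
    "actuator_fault": ("Re-allocate effort, re-solve the gain", "Trajectory held"),
    "sensor_drift": ("Fall back to model-based estimation", "Control continues"),
    "power_cut": ("Shed non-essentials, defend the core", "Above floor, mission kept"),
    "floor_reached": ("Controlled exit to a safe state", "Graceful hand-over"),
}


def walk(events):
    """Return (event, reconfiguration, outcome, mission_active) for each event.

    The walk ends at the first "floor_reached" event, which is the only
    event that hands the mission over.
    """
    steps = []
    for event in events:
        if event not in RESPONSES:
            # This sketch rejects events with no designed response; a real
            # system would need a defined fallback for them.
            raise KeyError(f"no designed response for event {event!r}")
        reconfiguration, outcome = RESPONSES[event]
        mission_active = event != "floor_reached"
        steps.append((event, reconfiguration, outcome, mission_active))
        if not mission_active:
            break
    return steps


chain = ["nominal", "actuator_fault", "sensor_drift", "power_cut", "floor_reached"]
steps = walk(chain)
for event, reconfiguration, outcome, active in steps:
    state = "mission active" if active else "controlled exit"
    print(f"{event:15s} | {reconfiguration:40s} | {outcome:26s} | {state}")
```

Every fault in the chain has a row of its own, and the mission only ends through the controlled exit in the last row.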
What you get from us
Is this you?
Most systems should stop at fail-safe — and that's the right call. Survivability earns its budget where a clean stop is itself expensive, dangerous, or mission-ending:
Does your system have a regime where shutting down is the failure?
Send us the system context — what it is, what it must keep doing after a fault, and why a clean stop isn't acceptable. We'll tell you plainly whether survivability is worth it for you, and what the first step looks like. The earlier in the architecture it starts, the cheaper it is.
Backed by peer-reviewed research — see the papers
- The Super-Critical Operational Modes in Robotic Systems. Підводні технології / Pidvodni tehnologii (peer-reviewed), 2023.
- Established Definitions of Super-Critical Operational Modes as Automotive System Requirements. Підводні технології / Pidvodni tehnologii (peer-reviewed), 2023.
- Emphasis on Super-Critical Operational Modes in Robotic Systems. Int. Symposium "Intelligent Solutions — S", 2023.
- Cybernetic Approaches to Adaptive Control of Super-Critical Systems. Адаптивні системи автоматичного управління (peer-reviewed), № 1 (46), 2025.
- Post-Failure Reconfiguration of a Super-Critical System via Cyclone DDS and ROS 2 Middleware. XV Int. Conf., Chernihiv, 2025.
- Ensuring Survivability of Complex Super-Critical Systems Based on a Hierarchical Abstraction Model and Adaptive Reconfiguration. Smart Technologies: Industrial and Civil Engineering, 4 (17), 2025, pp. 75–82.
- Practical Implementation and Simulation of Adaptive Control for Super-Critical Cyber-Physical Systems. Preprint, 2026.
Full list on the publications page.
Fail-safe asks how to stop without harm. Survivability asks how to keep going without harm. If your system needs the second answer, that's what we build.