Implementing End-to-End Safety Features in Automotive SoCs
[The following is an excerpt from NetSpeed’s new white paper, "Intelligent Interconnect Blends Performance, Sensing and Safety in Automotive SoCs." To read the entire paper, download the 'Automotive Solutions' white paper from the NetSpeed website.]
A functionally safe system implementation is able to detect, correct, and withstand errors with uninterrupted service. Early approaches to safety protected the memory subsystem from data errors. As systems have moved from CPU-centric to heterogeneous architectures, the need for a more end-to-end protection strategy emerged.
An end-to-end safety approach views the system holistically, protecting data as it moves throughout the system instead of just in and around the memory subsystem. Several types of protection are designed into the NetSpeed technology to help produce a functional safety (FuSa) implementation. At a high level, protection falls into three broad categories: transport protection; logic protection and redundancy; and timeouts.
Transport ECC and Parity
Protecting the SoC data transport is critical to end-to-end safety. Errors occurring in a NoC can propagate quickly to multiple consumers across a heterogeneous system, and if undetected they can rapidly lead to system failure. The solution for data errors is well-established: error correcting codes (ECC) can correct single-bit errors and detect double-bit errors.
The NetSpeed NoC IP implements a customized error correcting algorithm based on a Hamming code with an additional parity bit. Extra bits are added to the NoC datapath and directory RAM arrays to carry ECC information. At various points in the NoC, ECC values are generated and checked to verify data integrity. Should an error arise, single-bit errors are corrected transparently in logic, while double-bit errors are marked as detected but uncorrected.
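As an illustration of the general technique, here is a minimal Python sketch of a textbook extended Hamming code (Hamming plus one overall parity bit), which corrects single-bit errors and detects double-bit errors. It is not NetSpeed's customized algorithm, which the excerpt does not describe. The function names `encode_secded` and `decode_secded`, the bit layout, and the status strings are assumptions made for this example.

```python
from typing import List, Tuple


def encode_secded(data_bits: List[int]) -> List[int]:
    """Encode data bits into an extended Hamming codeword.

    Assumed layout: index 0 holds the overall parity bit, indices that are
    powers of two hold Hamming parity bits, all other indices hold data.
    """
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:  # number of Hamming parity bits needed
        r += 1
    n = m + r
    code = [0] * (n + 1)
    data_iter = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two: data position
            code[pos] = next(data_iter)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:
                parity ^= code[pos]
        code[p] = parity  # makes every parity group even
    code[0] = sum(code) % 2  # overall parity makes the whole word even
    return code


def decode_secded(codeword: List[int]) -> Tuple[List[int], str]:
    """Return (data_bits, status), where status is 'ok', 'corrected',
    or 'detected_uncorrected'."""
    code = list(codeword)
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid word
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd and syndrome <= n:
        # Odd parity: treat as a single-bit error at index `syndrome`
        # (index 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity with a nonzero syndrome (or an out-of-range syndrome):
        # more than one bit flipped, so flag it without correcting.
        status = "detected_uncorrected"
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return data, status
```

In this sketch an 8-bit payload becomes a 13-bit codeword: 8 data bits, 4 Hamming parity bits, and 1 overall parity bit. Wider datapaths amortize the overhead better, which is one reason ECC granularity is a meaningful area trade-off.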
Recognizing that a full ECC implementation costs power and area, NetSpeed NoC IP allows granular configuration of ECC. By default, ECC is generated at the NoC ingress point and checked at egress points. For the highest level of integrity, at the cost of additional area, ECC can be added on a per-hop basis, providing hop-to-hop (h2h) protection. For lower levels of integrity with area savings, users can fall back to parity protection instead of ECC, with similar granularity and coverage in a simpler implementation.
Hop-to-hop protection checks the integrity of complete packets flowing on the NoC, with a checksum over the entire payload, plus bit-interleaved parity and flit identifiers for further protection. Control fields such as routing information are parity protected on a hop-to-hop basis. Both data and sideband signals can be fully protected, and user configuration allows trade-offs among protection, area, and performance.
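The per-hop checks described above can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the `Flit` fields, the use of BIP-8 (a byte-wide XOR) as the bit-interleaved parity, and the function names are hypothetical and do not reflect the actual NetSpeed packet format.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Sequence, Tuple


def parity(value: int) -> int:
    """Even-parity bit for an integer field (1 if the popcount is odd)."""
    return bin(value).count("1") & 1


def bip8(data: bytes) -> int:
    """Bit-interleaved parity over 8 lanes: XOR of all payload bytes."""
    return reduce(lambda acc, b: acc ^ b, data, 0)


@dataclass
class Flit:
    flit_id: int       # position of this flit within the packet
    route: int         # routing/control field (hypothetical encoding)
    route_parity: int  # parity bit protecting the routing field
    payload: bytes


def build_packet(route: int, chunks: Sequence[bytes]) -> Tuple[List[Flit], int]:
    """Split payload chunks into flits and compute the packet-level BIP-8."""
    flits = [Flit(i, route, parity(route), c) for i, c in enumerate(chunks)]
    return flits, bip8(b"".join(chunks))


def check_at_hop(flits: Sequence[Flit], packet_bip: int) -> List[str]:
    """Run the checks a single hop might perform; return fault descriptions."""
    faults = []
    for expected_id, flit in enumerate(flits):
        if flit.flit_id != expected_id:
            faults.append(f"flit {expected_id}: unexpected id {flit.flit_id}")
        if parity(flit.route) != flit.route_parity:
            faults.append(f"flit {expected_id}: route parity error")
    if bip8(b"".join(f.payload for f in flits)) != packet_bip:
        faults.append("payload BIP-8 mismatch")
    return faults
```

Running `check_at_hop` at every router in this model corresponds to hop-to-hop checking, while running it only at the destination corresponds to the default ingress-to-egress configuration. Note that parity of this kind only detects errors; it does not correct them.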
Logic Protection and Redundancy
Packets are verified in the NoC transport scheme, but end-to-end resilience calls for also protecting the logic which frames transactions at the ingress and egress points.
Bridge logic structures can be optionally protected with parity at a relatively low cost in area. Buffers, registers, flip-flop arrays, and other logic with parity protection provide fault detection.
For higher integrity, bridge logic can be duplicated, with separate clock and reset inputs and a one-clock delay on the redundant unit to avoid glitches. Routes between endpoints can also be duplicated, with software able to swap routing if one route is compromised. Compound bridges can also be added with equivalence checking for full lock-step operation.
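The duplicated-logic idea can be modeled behaviorally in a few lines of Python. This is a software sketch of delayed lock-step comparison, not RTL and not NetSpeed's design: the class name `DelayedLockstep`, the single-cycle combinational units, and the comparison policy are assumptions for illustration.

```python
from typing import Callable, Optional, Tuple


class DelayedLockstep:
    """Primary unit runs on the current input; the redundant unit runs one
    clock later on the same input, and a checker compares the two results."""

    def __init__(self, primary: Callable[[int], int],
                 redundant: Callable[[int], int]) -> None:
        self.primary = primary
        self.redundant = redundant
        self._held_input: Optional[int] = None   # input delayed by one clock
        self._held_output: Optional[int] = None  # primary result, delayed
        self._primed = False

    def clock(self, value: int) -> Tuple[int, bool]:
        """Advance one clock. Returns (primary_output, mismatch_flag).

        The mismatch flag refers to the input from the previous clock,
        because the redundant unit lags by one cycle.
        """
        out_primary = self.primary(value)
        mismatch = False
        if self._primed:
            out_redundant = self.redundant(self._held_input)
            mismatch = out_redundant != self._held_output
        self._held_input = value
        self._held_output = out_primary
        self._primed = True
        return out_primary, mismatch
```

Because the redundant copy evaluates each input one cycle after the primary, a transient disturbance that hits both copies in the same cycle affects different transactions in each, so the comparison can still expose it. In this model a fault injected for a given input shows up as a mismatch on the following clock.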
Timeouts
In a real-time system, data can be intact but significantly delayed, which is just as much of a problem as an outright data error. High-resolution counters, programmable timestamps, and maskable interrupts help track and report on three types of timeouts.
TARGET TIMEOUTS monitor unresponsive slave devices. When an expected response takes too long, a dummy error response can be auto-generated back to the initiator allowing release of reserved resources and recovery of the NoC.
INITIATOR TIMEOUTS focus on requests that may be dropped or stuck in the NoC. Intervals are fully programmable.
NOC TIMEOUTS are a system safeguard, detecting congestion and backup in the NoC that may be blocking traffic. Responses to these events can be configured several ways, such as dropping requests or waiving destination responses, and raising fatal interrupts for software intervention.
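The target-timeout behavior described above can be sketched as a small Python model. It assumes a cycle counter supplied by the caller and a hypothetical response format; `TargetTimeoutMonitor`, its methods, and the `"synthetic"` flag are invented for this example and are not part of any NetSpeed interface.

```python
from typing import Dict, List


class TargetTimeoutMonitor:
    """Tracks outstanding requests and synthesizes an error response when a
    target has not answered within a programmable number of cycles."""

    def __init__(self, limit_cycles: int) -> None:
        self.limit_cycles = limit_cycles
        self.outstanding: Dict[int, int] = {}  # txn_id -> issue cycle

    def issue(self, txn_id: int, now: int) -> None:
        """Record the cycle at which a request was sent to the target."""
        self.outstanding[txn_id] = now

    def complete(self, txn_id: int) -> bool:
        """Retire a transaction on a real response.

        Returns False if the transaction is unknown, for example because
        it already timed out and the response arrived too late.
        """
        return self.outstanding.pop(txn_id, None) is not None

    def tick(self, now: int) -> List[dict]:
        """Expire overdue transactions and return dummy error responses
        so the initiator can release the resources it reserved."""
        expired = [txn for txn, start in self.outstanding.items()
                   if now - start >= self.limit_cycles]
        responses = []
        for txn in expired:
            del self.outstanding[txn]
            responses.append({"txn_id": txn, "status": "ERROR",
                              "synthetic": True})
        return responses
```

An initiator timeout could be modeled the same way with the monitor placed at the requesting side, and a NoC timeout would apply a similar counter to traffic that has stopped making progress inside the fabric.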
Logging and Reporting
A centralized fault controller completes the end-to-end FuSa implementation. All interrupts, error reports, and alarms from anywhere in the NoC are routed to the fault controller, giving software visibility into all anomalies.
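A software-side view of such a controller might look like the following Python sketch. Everything here is hypothetical: the `FaultController` class, the record fields, and the rule that fatal faults bypass masking are assumptions chosen to illustrate centralized logging with maskable interrupts, not a description of the NetSpeed hardware.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class FaultRecord:
    source: str  # reporting block, e.g. a router or bridge name
    kind: str    # fault category, e.g. "ecc_corrected" or "target_timeout"
    fatal: bool


class FaultController:
    """Collects every fault report in one log and decides whether to
    raise an interrupt to software."""

    def __init__(self) -> None:
        self.log: List[FaultRecord] = []
        self.masked_kinds: Set[str] = set()

    def mask(self, kind: str) -> None:
        """Suppress interrupts for a fault category (it is still logged)."""
        self.masked_kinds.add(kind)

    def report(self, source: str, kind: str, fatal: bool = False) -> bool:
        """Log a fault. Returns True if an interrupt should be raised.

        Assumption for this sketch: fatal faults cannot be masked.
        """
        self.log.append(FaultRecord(source, kind, fatal))
        return fatal or kind not in self.masked_kinds
```

Keeping the log separate from the interrupt decision means masked events, such as corrected single-bit errors, remain visible to diagnostic software even when they do not interrupt it.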
Download NetSpeed’s Automotive Solutions White Paper
Learn more about how NetSpeed has brought performance, sensing, and safety together for ISO 26262 applications with a fully automated, configurable, correct-by-construction approach where a NoC is optimized with its FuSa features. Download the full white paper at http://netspeedsystems.com/white-paper/ (click on the 'Automotive Solutions' download).