Predictable Software Integration on Heterogeneous Multiprocessor Systems-on-Chip
The widespread adoption of artificial-intelligence (AI) algorithms poses new challenges for computing environments that require both high performance and real-time guarantees. Autonomous vehicles, intelligent robots, and drones must reduce weight and power consumption while improving both the performance and the real-time behavior of their workloads.
These AI-augmented cyber-physical systems (CPS) generate growing volumes of real-time data flows (e.g., imaging). As a result, the hardware memory hierarchy (DRAM and the caches, especially the last-level cache shared among multiple cores) becomes a bottleneck resource and a source of temporal unpredictability.
Furthermore, new AI algorithms require tasks to execute on highly specialized accelerators such as GPUs or FPGAs. This has led to the rapid adoption of heterogeneous multiprocessor system-on-chip (MPSoC) platforms (e.g., Xilinx UltraScale+, NVIDIA Xavier) that combine different types of multicore CPU clusters with GPUs and/or FPGAs.
From the safety and real-time perspectives, these platforms pose unprecedented challenges.
Much of the real-time scheduling theory of the past two decades rests on the assumption that the Worst-Case Execution Time (WCET) of each task can be computed while the task executes in isolation. When tasks execute together, scheduling theory then derives each task's worst-case response time as a function of these run-alone WCETs. On MPSoCs, however, this fundamental assumption does not hold even approximately: contention for shared hardware resources can stretch a task's execution time well beyond its run-alone WCET. The result is costly and suboptimal integration phases for complex real-time embedded software.
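To make the run-alone assumption concrete, here is a minimal sketch of textbook response-time analysis for fixed-priority preemptive scheduling. Each task's worst-case response time R is found by iterating R = C_i + sum over higher-priority tasks j of ceil(R / T_j) * C_j until R stops changing or exceeds the deadline. The sketch assumes independent periodic tasks with implicit deadlines (D = T), tasks listed highest priority first, integer time units, and no blocking. The task parameters and the one-unit WCET inflation used for the co-run case are hypothetical. They illustrate how the analysis changes once run-alone WCETs no longer bound actual execution times.

```python
from typing import List, Optional, Tuple

# A task is (C, T): run-alone WCET and period, in integer time units.
# Deadlines are assumed implicit (D = T). The list is ordered by priority, highest first.
Task = Tuple[int, int]


def ceil_div(a: int, b: int) -> int:
    """Exact integer ceiling of a / b for positive integers."""
    return -(-a // b)


def response_time(tasks: List[Task], i: int) -> Optional[int]:
    """Worst-case response time of task i, or None if it exceeds its deadline."""
    c_i, t_i = tasks[i]
    r = c_i
    while True:
        # Preemption from every higher-priority task released within the window r.
        interference = sum(ceil_div(r, t_j) * c_j for c_j, t_j in tasks[:i])
        r_next = c_i + interference
        if r_next > t_i:
            return None  # deadline miss: task i is unschedulable
        if r_next == r:
            return r  # fixed point reached
        r = r_next


def analyze(tasks: List[Task]) -> List[Optional[int]]:
    """Response times for all tasks in priority order."""
    return [response_time(tasks, i) for i in range(len(tasks))]


if __name__ == "__main__":
    run_alone = [(1, 4), (2, 6), (3, 13)]
    # Hypothetical co-run scenario: shared-resource contention adds one
    # time unit to every task's execution time.
    co_run = [(c + 1, t) for c, t in run_alone]
    print(analyze(run_alone))  # [1, 3, 10]
    print(analyze(co_run))     # [2, None, None]
```

With the run-alone WCETs, all three tasks meet their deadlines. Adding a single unit of hypothetical interference to each WCET makes the two lower-priority tasks miss their deadlines. This illustrates why the analysis is only as trustworthy as the execution-time bounds fed into it.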
Moreover, MPSoC platforms complicate the management of the memory hierarchy. Real-time tasks not only compete with each other for cache, interconnect, and DRAM resources, but may also suffer memory interference from parallel workloads executing on hardware accelerators (e.g., GPUs or TPUs).
On these complex platforms, state-of-the-art practice in the safety-critical embedded industry is to disable all but one core. While some ad-hoc isolation solutions have been proposed to use more cores of a chip when running safety-critical software, there is no publicly available, validated procedure to assess the quality of these isolation methods or to evaluate the safety and real-time performance of the integrated hardware-software system. Transformational research by the academic community, in collaboration with certification authorities and industry, is urgently needed to overcome the challenges currently faced by the safety-critical embedded industry.
For example, certifiable resource-isolation technology for MPSoCs will be essential to the successful integration, verification, and testing of fully autonomous driving systems, AI-driven smart manufacturing plants, and autonomous drones in smart cities.
Our current research focuses on restoring isolation and temporal predictability for safety-critical software running on latest-generation MPSoCs. We target hardware-software co-design techniques that can be deployed at the lowest levels of the platform (hardware, hypervisor, OS), as well as novel task execution models and optimization techniques that maximize the performance of these platforms without sacrificing the predictability of real-time workloads.
We aim to work closely with leaders in the manufacturing, automotive, and avionics industries, bringing together the real-time academic community, the embedded industry, and the certification authorities for electronic hardware and software systems to tackle an industry-wide challenge: how to design, integrate, analyze, and certify safety-critical real-time software that runs on heterogeneous multiprocessor systems-on-chip.