L4 performance

The tables below show message-passing IPC costs (half of one round trip) for various CPUs and L4 configurations. Times are clock cycles spent in the kernel, including the overhead of entering and exiting kernel mode. They are given for several payload sizes, measured as the number of message registers (MRs) used, each one machine word.

These results are from our L4 version, NICTA::Pistachio-embedded, and our para-virtualized Linux, Wombat. Neither system is supported any longer: both have been superseded by OKL4 and OK Linux from Open Kernel Labs, which have since been developed further, including additional performance improvements.

NICTA::Pistachio-embedded IPC

ARM (5 physical message registers)

These numbers are one-way IPC costs, in cycles, between separate address spaces (including the context switch).

Processor                     Clock  0..4 MRs   8 MRs  16 MRs
XScale PXA255 (ARMv5)       400 MHz       151     188     228
StrongARM SA1100 (ARMv4)    206 MHz       131     141     161

Fast address-space switching (FASS, aka FCSE) is enabled.
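To put the cycle counts in wall-clock terms, divide by the clock rate: at f MHz a core executes f cycles per microsecond. The minimal Python sketch below does this for the 0..4 MR column, assuming the nominal clock rates listed in the table; the helper name `cycles_to_ns` is hypothetical, chosen for illustration.

```python
# Convert one-way IPC cycle counts into wall-clock time.
# At f MHz the core runs f cycles per microsecond, so ns = cycles * 1000 / f.
# Assumption: the cores run at the nominal clock rates given in the table.

def cycles_to_ns(cycles: float, clock_mhz: float) -> float:
    """Return the duration of `cycles` clock cycles at `clock_mhz`, in nanoseconds."""
    return cycles * 1000.0 / clock_mhz

# One-way IPC cost (cycles) for 0..4 MRs, from the table above.
measurements = {
    "XScale PXA255 @ 400 MHz": (151, 400),
    "StrongARM SA1100 @ 206 MHz": (131, 206),
}

if __name__ == "__main__":
    for name, (cycles, mhz) in measurements.items():
        print(f"{name}: {cycles} cycles = {cycles_to_ns(cycles, mhz):.1f} ns")
```

The StrongARM needs fewer cycles per IPC, but at roughly half the clock rate its IPC still takes longer in absolute time (about 636 ns versus about 378 ns).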


Wombat: para-virtualised Linux

Wombat, our architecture-independent para-virtualised Linux for L4-embedded, runs on ARM, x86 and MIPS. On ARMv4 or ARMv5 processors, such as ARM9 cores or the XScale, Wombat benefits from the fast address-space switching (FASS) technology implemented in L4-embedded, which native Linux distributions do not support.
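FASS builds on the Fast Context Switch Extension (FCSE) of ARMv4/ARMv5 cores. The FCSE relocates any virtual address in the lowest 32 MB by a 7-bit process ID before the cache and TLB see it, so small address spaces can live in distinct 32 MB slots of the modified virtual address space. The minimal Python sketch below models only that hardware relocation rule; it is not L4-embedded's FASS implementation (which must also handle protection and slot allocation), and the name `modified_va` is hypothetical.

```python
# Model of the ARMv4/v5 FCSE relocation rule that FASS relies on.
SLOT_SHIFT = 25          # each slot is 2**25 bytes = 32 MB
MAX_PID = 127            # the FCSE process ID is 7 bits wide

def modified_va(va: int, fcse_pid: int) -> int:
    """Return the modified virtual address (MVA) used for cache and TLB lookup."""
    if not 0 <= fcse_pid <= MAX_PID:
        raise ValueError("FCSE PID must fit in 7 bits")
    va &= 0xFFFFFFFF                     # 32-bit address space
    if va >> SLOT_SHIFT == 0:            # VA[31:25] == 0: address is in the low 32 MB
        return va | (fcse_pid << SLOT_SHIFT)
    return va                            # addresses at or above 32 MB are not relocated

if __name__ == "__main__":
    # Two processes using the same low virtual address end up in different
    # slots, so their cache lines do not alias across a context switch.
    a = modified_va(0x8000, fcse_pid=1)
    b = modified_va(0x8000, fcse_pid=2)
    print(hex(a), hex(b))  # 0x2008000 0x4008000
```

Because the relocated addresses differ, virtually tagged cache contents from one slot remain valid while another slot's process runs, which is what allows a context switch to avoid a cache flush.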

Wombat ARM: XScale PXA255 @ 200 MHz

These numbers are LMBench results for native ARM Linux and Wombat on the PLEB2 reference platform.



LMBench latency test results, in microseconds (lower is better). Rel. perf. is the Linux time divided by the Wombat time, so values above 1 mean Wombat is faster.

Benchmark                 Linux   Wombat  Rel. perf.  Comment
lat_ctx -s 0 2            190.8     6.48       29.44  Context switch latencies
lat_ctx -s 0 3            197.1    18.82       10.47
lat_ctx -s 0 4            199.5    19.78       10.09
lat_ctx -s 0 10           215.7    44.07        4.89
lat_ctx -s 4 2            257.7     7.15       36.04
lat_ctx -s 4 3            259.3    23.26       11.15
lat_ctx -s 4 4            293.4    40.28        7.28
lat_ctx -s 4 10           285.1   141.96        2.01
lat_fifo                  377.0    80.07        4.71  Hot potato
lat_pipe                  378.4    81.56        4.64
lat_unix                  764.5   107.48        7.11
lat_syscall null           0.82     4.73        0.17
lat_proc procedure         0.21     0.21        1.00  Procedure call
lat_proc fork              4334     5706        0.76  Process creation
lat_proc exec              4600     6400        0.72
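The lat_fifo, lat_pipe and lat_unix rows are "hot potato" tests: two processes pass a small token back and forth over a FIFO, a pipe or a Unix-domain socket, so on a uniprocessor such as the PXA255 every round trip involves two IPC operations and two context switches. The Python sketch below illustrates the pipe variant on a POSIX system. It is not LMBench's code, the function name `pipe_round_trip_us` is hypothetical, and interpreter overhead means its numbers are not comparable with the table.

```python
import os
import time

def pipe_round_trip_us(iterations: int = 10_000) -> float:
    """Return the mean round-trip time, in microseconds, for passing a
    one-byte token between two processes over a pair of pipes."""
    p2c_r, p2c_w = os.pipe()          # parent -> child
    c2p_r, c2p_w = os.pipe()          # child -> parent
    pid = os.fork()
    if pid == 0:                      # child: echo every token straight back
        os.close(p2c_w)
        os.close(c2p_r)
        for _ in range(iterations):
            os.write(c2p_w, os.read(p2c_r, 1))
        os._exit(0)
    os.close(p2c_r)                   # parent keeps only its own ends
    os.close(c2p_w)
    start = time.perf_counter()
    for _ in range(iterations):
        os.write(p2c_w, b"x")         # throw the hot potato...
        os.read(c2p_r, 1)             # ...and wait for it to come back
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    os.close(p2c_w)
    os.close(c2p_r)
    return elapsed / iterations * 1e6

if __name__ == "__main__":
    print(f"pipe round trip: {pipe_round_trip_us():.1f} us")
```

Swapping the pipes for a named FIFO or a `socketpair` gives the FIFO and Unix-socket variants of the same pattern.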


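LMBench bandwidth test results, in MB/s (higher is better). Relative performance is the Wombat bandwidth divided by the Linux bandwidth, so values above 1 mean Wombat is faster.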

Benchmark                    Linux   Wombat  Rel. perf.
bw_file_rd 1024 io_only      39.38    12.43        0.32
bw_mmap_rd 1024              106.7    106.1        0.99
bw_mem 1024                  416.0    416.0        1.00
bw_mem_wr                    229.9    229.0        1.00
bw_pipe                      10.15    15.31        1.51
bw_unix                      24.23    11.32        0.47
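The latency and bandwidth tables compute relative performance in opposite directions so that in both a value above 1 favours Wombat: latency ratios are Linux/Wombat (less time is better), bandwidth ratios are Wombat/Linux (more data per second is better). The short Python sketch below reproduces a few table entries; the helper names are hypothetical.

```python
def latency_ratio(linux: float, wombat: float) -> float:
    """Lower latency is better, so Linux/Wombat > 1 means Wombat is faster."""
    return linux / wombat

def throughput_ratio(linux: float, wombat: float) -> float:
    """Higher bandwidth is better, so Wombat/Linux > 1 means Wombat is faster."""
    return wombat / linux

if __name__ == "__main__":
    print(round(latency_ratio(190.8, 6.48), 2))      # lat_ctx -s 0 2: 29.44
    print(round(latency_ratio(0.82, 4.73), 2))       # lat_syscall null: 0.17
    print(round(throughput_ratio(10.15, 15.31), 2))  # bw_pipe: 1.51
    print(round(throughput_ratio(24.23, 11.32), 2))  # bw_unix: 0.47
```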


The following numbers are AIM7 benchmark results for native Linux and Wombat, in jobs/min/task (higher is better). Relative performance is the Wombat result divided by the Linux result.

Tasks      Linux   Wombat  Rel. perf.
1          47.52    46.32        0.97
2          24.77    24.12        0.97
3          16.74    16.31        0.97

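In addition to the security and isolation benefits of virtualizing Linux, Wombat on ARM shows clear performance gains over native Linux in context-switch and local IPC latencies (lat_ctx, lat_fifo, lat_pipe, lat_unix) and in pipe bandwidth, largely due to its use of fast address-space switching. System calls, process creation, file reads and Unix-socket bandwidth remain slower than on native Linux, while memory bandwidth matches native Linux and AIM7 throughput is within 3%.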