Benchmarks
IRIS implements programs from the Computer Language Benchmarks Game (CLBG). All IRIS times were measured on 2026-04-17 on x86-64 Linux. CLBG reference times are published single-core fastest submissions. IRIS currently has two execution tiers: a tree-walking interpreter (iris-stage0) and a self-hosted bytecode compiler (iris-native). Both are measured below.
IRIS Tree-Walker Results (iris-stage0)#
All times measured with iris-stage0 run. Every row is a real measurement – no extrapolations.
| Benchmark | Input | Time | Output |
|---|---|---|---|
| binary-trees | depth=10 | 0.014s | 2,096,128 |
| binary-trees | depth=15 | 0.030s | 2,147,450,880 |
| binary-trees | depth=18 | 0.150s | 137,438,691,328 |
| binary-trees | depth=21 | 1.114s | 8,796,090,925,056 |
| spectral-norm | N=100 | 2.6s | 1.274… |
| spectral-norm | N=500 | 63.8s | 1.274… |
| n-body | N=1,000 | 2.1s | -0.142… |
| n-body | N=10,000 | 21.1s | -0.142… |
| fannkuch | N=7 | 1.7s | (16, -502) |
| fannkuch | N=8 | 17.2s | (22, -4720) |
| fasta | N=1,000 | 0.025s | - |
| fasta | N=10,000 | 0.167s | - |
| thread-ring | N=1,000 | 0.015s | - |
| thread-ring | N=100,000 | 0.070s | - |
| thread-ring | N=1,000,000 | 0.600s | - |
IRIS Bytecode VM Results (iris-native)#
iris-native is the self-hosted compiled tier. Times include compilation + execution.
| Test | Input | Time |
|---|---|---|
| factorial | 20 | 0.258s |
| fibonacci | 30 | 0.340s |
| sum (fold) | 10,000,000 | 0.444s |
The bytecode VM is still early – it currently handles recursive functions and folds but does not yet cover the full CLBG suite. These numbers include compilation time; steady-state execution is faster.
Comparison with CLBG Reference Times#
The only benchmark where IRIS and CLBG share the same input size is binary-trees at depth=21. That comparison is apples-to-apples. For other benchmarks, CLBG standard inputs are much larger than what we measured, so direct comparison is not possible without extrapolation, which we do not do here.
binary-trees, depth=21#
| Language | Time |
|---|---|
| IRIS (tree-walker) | 1.114s |
| OCaml | 7.78s |
| Haskell (GHC) | 12.58s |
| Racket | 15.19s |
| C (gcc) | 21.21s |
| CPython 3 | 100.49s |
IRIS is 7x faster than OCaml and 19x faster than C on this benchmark. binary-trees tests allocation-heavy workloads (millions of small tree nodes allocated and checksummed). IRIS’s SemanticGraph representation and garbage collector handle this well.
CLBG reference times (standard inputs)#
For context, here are the CLBG standard input sizes and reference times. IRIS was not measured at these inputs.
| Benchmark | CLBG Input | C (gcc) | OCaml | Haskell | CPython 3 |
|---|---|---|---|---|---|
| binary-trees | depth=21 | 21.21s | 7.78s | 12.58s | 100.49s |
| spectral-norm | N=5,500 | 1.43s | 5.34s | 15.99s | 349.68s |
| n-body | N=50,000,000 | 2.10s | 6.95s | 6.41s | 372.41s |
| fannkuch | N=12 | 14.05s | 45.79s | 40.21s | 943.88s |
| fasta | N=25,000,000 | 0.79s | 3.37s | 5.46s | 39.03s |
Where IRIS Excels#
Allocation-heavy workloads. binary-trees at depth=21 runs in 1.1s, faster than every CLBG reference language including C. The SemanticGraph runtime is optimized for creating and traversing tree structures – this is what IRIS programs do all day, so the evaluator is tuned for it.
Message passing. thread-ring passes 1 million tokens in 0.6s (600ns/token). The cooperative scheduling model keeps overhead low.
Sequence generation. fasta generates 10,000 nucleotides in 167ms. String building through the evaluator is reasonable for moderate sizes.
Where IRIS Needs Work#
Numerical computation. n-body takes 2.1s for 1,000 steps and 21.1s for 10,000 steps. At the CLBG standard of 50 million steps, this would be extremely slow. The tree-walker evaluates every floating-point operation by walking an AST node – there is no register allocation, no SIMD, no loop unrolling. Each step costs roughly 2ms through the interpreter versus ~42ns in compiled C.
Dense matrix operations. spectral-norm at N=500 takes 63.8s. The O(N^2) inner loops go through the same tree-walking overhead as n-body. At the CLBG standard N=5,500 this would be prohibitive.
Permutation-heavy algorithms. fannkuch at N=8 takes 17.2s. The O(N!) permutation enumeration compounds the per-operation overhead of tree-walking.
The common thread: any benchmark dominated by tight arithmetic loops is slow under the tree-walker. This is expected – the tree-walker was built for correctness and self-hosting, not numerical throughput. The bytecode compiler (iris-native) is the path to closing this gap.
Execution Tiers#
IRIS has three execution tiers at different stages of maturity:
| Tier | Engine | Status | Best for |
|---|---|---|---|
| Tree-walker | iris-stage0 | Complete (runs all 243 .iris files) | Allocation, message passing, general programs |
| Bytecode VM | iris-native | Partial (recursion, folds, basic arithmetic) | Compute-bound tasks, tight loops |
| Native AOT | aot_compile.iris | Experimental (fold + arithmetic only) | Sub-nanosecond inner loops |
The tree-walker is the production tier – it runs everything. The bytecode VM compiles IRIS programs to a stack-based bytecode and executes them in a virtual machine, eliminating AST traversal overhead. Native AOT generates x86-64 machine code directly and achieves C-class performance on supported patterns (sub-nanosecond fold iterations), but currently covers only a narrow subset of the language.
The goal is for iris-native to cover the full language, at which point the numerical benchmarks will improve by orders of magnitude.
Running Benchmarks#
# Single benchmark (tree-walker)
bootstrap/iris-stage0 run benchmark/binary-trees/binary-trees.iris 21
# Thread-ring
bootstrap/iris-stage0 run benchmark/thread-ring/thread-ring.iris 1000000
# iris-native (bytecode VM)
bootstrap/iris-native run benchmark/factorial.iris 20
# Full suite
./benchmark/run_all.sh