# Benchmarks

IRIS implements programs from the Computer Language Benchmarks Game (CLBG). All IRIS times were measured on 2026-04-17 on x86-64 Linux. CLBG reference times are the fastest published single-core submissions. Two IRIS execution tiers are benchmarked here: a tree-walking interpreter (`iris-stage0`) and a self-hosted bytecode compiler (`iris-native`).


## IRIS Tree-Walker Results (iris-stage0)

All times were measured with `iris-stage0 run`. Every row is a direct measurement; nothing is extrapolated.

| Benchmark | Input | Time | Output |
|---|---|---|---|
| binary-trees | depth=10 | 0.014s | 2,096,128 |
| binary-trees | depth=15 | 0.030s | 2,147,450,880 |
| binary-trees | depth=18 | 0.150s | 137,438,691,328 |
| binary-trees | depth=21 | 1.114s | 8,796,090,925,056 |
| spectral-norm | N=100 | 2.6s | 1.274… |
| spectral-norm | N=500 | 63.8s | 1.274… |
| n-body | N=1,000 | 2.1s | -0.142… |
| n-body | N=10,000 | 21.1s | -0.142… |
| fannkuch | N=7 | 1.7s | (16, -502) |
| fannkuch | N=8 | 17.2s | (22, -4720) |
| fasta | N=1,000 | 0.025s | - |
| fasta | N=10,000 | 0.167s | - |
| thread-ring | N=1,000 | 0.015s | - |
| thread-ring | N=100,000 | 0.070s | - |
| thread-ring | N=1,000,000 | 0.600s | - |

## IRIS Bytecode VM Results (iris-native)

`iris-native` is the self-hosted bytecode tier. Times include both compilation and execution.

| Test | Input | Time |
|---|---|---|
| factorial | 20 | 0.258s |
| fibonacci | 30 | 0.340s |
| sum (fold) | 10,000,000 | 0.444s |

The bytecode VM is still early: it handles recursive functions and folds but does not yet cover the full CLBG suite. These numbers include compilation time; steady-state execution is faster.


## Comparison with CLBG Reference Times

The only benchmark where IRIS and CLBG share the same input size is binary-trees at depth=21. That comparison is apples-to-apples. For other benchmarks, CLBG standard inputs are much larger than what we measured, so direct comparison is not possible without extrapolation, which we do not do here.

### binary-trees, depth=21

| Language | Time |
|---|---|
| IRIS (tree-walker) | 1.114s |
| OCaml | 7.78s |
| Haskell (GHC) | 12.58s |
| Racket | 15.19s |
| C (gcc) | 21.21s |
| CPython 3 | 100.49s |

IRIS is 7x faster than OCaml and 19x faster than C on this benchmark. binary-trees tests allocation-heavy workloads (millions of small tree nodes allocated and checksummed). IRIS’s SemanticGraph representation and garbage collector handle this well.
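The shape of this workload can be sketched in a few lines. Python stands in for IRIS here purely for illustration; this mirrors the general CLBG binary-trees structure, not IRIS's actual source.

```python
# Sketch of the binary-trees workload: allocate a complete binary tree,
# then traverse it to compute a node-count checksum. Millions of
# short-lived nodes like these are what stress the allocator and GC.

def make_tree(depth):
    # Each node is a (left, right) pair; depth-0 nodes are leaves.
    if depth == 0:
        return (None, None)
    return (make_tree(depth - 1), make_tree(depth - 1))

def check(node):
    left, right = node
    if left is None:
        return 1
    return 1 + check(left) + check(right)

print(check(make_tree(10)))  # a depth-10 tree has 2**11 - 1 = 2047 nodes
```

The benchmark repeats this allocate-then-checksum cycle across many tree depths, so total time is dominated by allocation and pointer chasing rather than arithmetic.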

### CLBG reference times (standard inputs)

For context, here are the CLBG standard input sizes and reference times. IRIS was not measured at these inputs.

| Benchmark | CLBG Input | C (gcc) | OCaml | Haskell | CPython 3 |
|---|---|---|---|---|---|
| binary-trees | depth=21 | 21.21s | 7.78s | 12.58s | 100.49s |
| spectral-norm | N=5,500 | 1.43s | 5.34s | 15.99s | 349.68s |
| n-body | N=50,000,000 | 2.10s | 6.95s | 6.41s | 372.41s |
| fannkuch | N=12 | 14.05s | 45.79s | 40.21s | 943.88s |
| fasta | N=25,000,000 | 0.79s | 3.37s | 5.46s | 39.03s |

## Where IRIS Excels

**Allocation-heavy workloads.** binary-trees at depth=21 runs in 1.1s, faster than every CLBG reference language, including C. The SemanticGraph runtime is optimized for creating and traversing tree structures; this is what IRIS programs do all day, so the evaluator is tuned for it.

**Message passing.** thread-ring passes 1 million tokens in 0.6s (600 ns/token). The cooperative scheduling model keeps overhead low.
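The message-passing pattern can be modeled minimally in Python: plain functions driven by a trampoline loop stand in for IRIS's cooperative processes, and ring size 503 is the CLBG standard.

```python
# thread-ring model: agents 1..503 form a ring; a token carrying a hop
# count circulates, and the agent that receives count 0 is reported.

def make_agent(ident, ring_size):
    def step(count):
        if count == 0:
            return ('done', ident)              # this agent keeps the token
        return ('pass', ident % ring_size + 1, count - 1)
    return step

def thread_ring(passes, ring_size=503):
    agents = {i: make_agent(i, ring_size) for i in range(1, ring_size + 1)}
    msg = ('pass', 1, passes)                   # agent 1 starts with the token
    while msg[0] == 'pass':
        _, ident, count = msg
        msg = agents[ident](count)
    return msg[1]

print(thread_ring(1000))  # 498, the CLBG expected answer for N=1000
```

Each hop is one function call and one tuple allocation; the per-token cost quoted above is the IRIS analogue of this dispatch step.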

**Sequence generation.** fasta generates 10,000 nucleotides in 167 ms. String building through the evaluator is reasonable at moderate sizes.
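fasta's core is a benchmark-specified linear congruential generator driving a weighted choice of nucleotides. A Python sketch follows; the LCG constants are the ones the CLBG spec prescribes, but the cumulative weights in `TABLE` are illustrative placeholders, not the benchmark's actual frequency tables.

```python
# fasta-style generation: a deterministic LCG picks nucleotides by
# walking a cumulative-probability table.
IM, IA, IC = 139968, 3877, 29573   # LCG constants from the CLBG spec

def make_random(seed=42):
    state = seed
    def gen(maximum):
        nonlocal state
        state = (state * IA + IC) % IM
        return maximum * state / IM
    return gen

# Illustrative cumulative weights (not the benchmark's real tables).
TABLE = [('a', 0.25), ('c', 0.50), ('g', 0.75), ('t', 1.0)]

def select(gen, table):
    r = gen(1.0)
    for ch, cum in table:
        if r < cum:
            return ch
    return table[-1][0]

gen = make_random()
print(''.join(select(gen, TABLE) for _ in range(20)))
```

Because the generator is deterministic, CLBG can validate output byte-for-byte; the cost per nucleotide is one LCG step plus a short table scan.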

## Where IRIS Needs Work

**Numerical computation.** n-body takes 2.1s for 1,000 steps and 21.1s for 10,000 steps. At the CLBG standard of 50 million steps this would be extremely slow. The tree-walker evaluates every floating-point operation by walking an AST node: there is no register allocation, no SIMD, no loop unrolling. Each step costs roughly 2 ms through the interpreter versus ~42 ns in compiled C.
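A back-of-envelope from the measured figures makes "extremely slow" concrete. This is an estimate only; no measurement was taken at CLBG scale.

```python
# Back-of-envelope: per-step interpreter cost and the implied time at
# the CLBG standard input (a projection, not a measurement).
measured_s, measured_steps = 2.1, 1_000      # n-body under the tree-walker
per_step_s = measured_s / measured_steps     # ~2.1 ms per step
clbg_steps = 50_000_000
projected_h = per_step_s * clbg_steps / 3600
print(f"{per_step_s * 1e3:.1f} ms/step -> ~{projected_h:.0f} h at N=50,000,000")
```

At roughly 2.1 ms per step, the standard input would take on the order of a day, which is why the CLBG-scale run was not attempted.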

**Dense matrix operations.** spectral-norm at N=500 takes 63.8s. The O(N^2) inner loops go through the same tree-walking overhead as n-body. At the CLBG standard N=5,500 this would be prohibitive.
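The structure of that inner loop, sketched in Python after the CLBG spectral-norm definition (A[i][j] = 1/((i+j)(i+j+1)/2 + i + 1), estimated by power iteration):

```python
# spectral-norm: approximate the largest singular value of the implicit
# infinite matrix A via power iteration on A^T A. Every matrix-vector
# product below is an O(N^2) loop of floating-point work.
from math import sqrt

def eval_A(i, j):
    return 1.0 / ((i + j) * (i + j + 1) // 2 + i + 1)

def mul_Av(v):
    n = len(v)
    return [sum(eval_A(i, j) * v[j] for j in range(n)) for i in range(n)]

def mul_Atv(v):
    n = len(v)
    return [sum(eval_A(j, i) * v[j] for j in range(n)) for i in range(n)]

def spectral_norm(n):
    u = [1.0] * n
    for _ in range(10):               # ten rounds of power iteration
        v = mul_Atv(mul_Av(u))
        u = mul_Atv(mul_Av(v))
    vBv = sum(x * y for x, y in zip(u, v))
    vv = sum(y * y for y in v)
    return sqrt(vBv / vv)

print(f"{spectral_norm(100):.3f}")
```

Every one of those N*N `eval_A` calls is a handful of arithmetic operations, so any per-operation interpreter overhead is multiplied by the full quadratic loop count.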

**Permutation-heavy algorithms.** fannkuch at N=8 takes 17.2s. The O(N!) permutation enumeration compounds the per-operation overhead of tree-walking.
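The flip-counting core can be sketched in Python (max-flips only; the checksum half of the benchmark's output pair is omitted here):

```python
# fannkuch: for every permutation of 1..n, repeatedly reverse the first
# p[0] elements ("pancake flips") until a 1 is on top; report the
# maximum number of flips seen across all n! permutations.
from itertools import permutations

def fannkuch(n):
    max_flips = 0
    for perm in permutations(range(1, n + 1)):
        p, flips = list(perm), 0
        while p[0] != 1:
            p[:p[0]] = reversed(p[:p[0]])
            flips += 1
        max_flips = max(max_flips, flips)
    return max_flips

print(fannkuch(7))  # 16, matching the first element of the N=7 output above
```

The O(N!) outer loop means each extra unit of N multiplies the total work, which is why N=8 is roughly 10x slower than N=7 in the table above.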

The common thread: any benchmark dominated by tight arithmetic loops is slow under the tree-walker. This is expected; the tree-walker was built for correctness and self-hosting, not numerical throughput. The bytecode compiler (iris-native) is the path to closing this gap.


## Execution Tiers

IRIS has three execution tiers at different stages of maturity:

| Tier | Engine | Status | Best for |
|---|---|---|---|
| Tree-walker | iris-stage0 | Complete (runs all 243 .iris files) | Allocation, message passing, general programs |
| Bytecode VM | iris-native | Partial (recursion, folds, basic arithmetic) | Compute-bound tasks, tight loops |
| Native AOT | aot_compile.iris | Experimental (fold + arithmetic only) | Sub-nanosecond inner loops |

The tree-walker is the production tier: it runs everything. The bytecode VM compiles IRIS programs to a stack-based bytecode and executes them in a virtual machine, eliminating AST-traversal overhead. Native AOT generates x86-64 machine code directly and achieves C-class performance on supported patterns (sub-nanosecond fold iterations), but currently covers only a narrow subset of the language.
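The difference between the first two tiers can be illustrated with a toy expression evaluator. This is a hypothetical mini-language in Python, not IRIS's actual AST or instruction set.

```python
# Two ways to evaluate (1 + 2) * 4: walk an AST, or run flat bytecode.

def walk(node):
    # Tree-walker: re-dispatch on the node tag at every evaluation,
    # following child pointers through the tree.
    tag = node[0]
    if tag == 'lit':
        return node[1]
    if tag == 'add':
        return walk(node[1]) + walk(node[2])
    if tag == 'mul':
        return walk(node[1]) * walk(node[2])

def run(code):
    # Bytecode VM: a linear instruction list over an operand stack;
    # no tree structure is consulted during execution.
    stack = []
    for op, arg in code:
        if op == 'PUSH':
            stack.append(arg)
        elif op == 'ADD':
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == 'MUL':
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack[0]

ast = ('mul', ('add', ('lit', 1), ('lit', 2)), ('lit', 4))
code = [('PUSH', 1), ('PUSH', 2), ('ADD', None), ('PUSH', 4), ('MUL', None)]
print(walk(ast), run(code))  # both evaluate to 12
```

The VM pays compilation once, then executes a flat loop; the walker pays the dispatch-and-pointer-chase cost on every evaluation, which is exactly the overhead the numerical benchmarks expose.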

The goal is for iris-native to cover the full language, at which point the numerical benchmarks will improve by orders of magnitude.


## Running Benchmarks

```sh
# Single benchmark (tree-walker)
bootstrap/iris-stage0 run benchmark/binary-trees/binary-trees.iris 21

# Thread-ring
bootstrap/iris-stage0 run benchmark/thread-ring/thread-ring.iris 1000000

# iris-native (bytecode VM)
bootstrap/iris-native run benchmark/factorial.iris 20

# Full suite
./benchmark/run_all.sh
```