Architecture

IRIS programs are compiled from source to bytecodes and executed by a native x86-64 VM. The entire compilation pipeline is written in IRIS and compiles itself – a verified fixed point.

Compilation Pipeline

source.iris
    |
    v
tokenizer.iris      Lexes source into tokens
    |
    v
iris_parser.iris    Parses tokens into an AST
    |
    v
ast_compile.iris    Compiles AST to flat bytecodes
    |
    v
native_vm.iris      Hand-assembled x86-64 VM executes bytecodes

Each stage is a .iris program. The tokenizer, parser, and AST compiler are compiled to bytecodes by the same AST compiler, then executed by the same native VM. This is how the compiler compiles itself.

Tokenizer

src/iris-programs/syntax/tokenizer.iris – Lexes source text into a token stream. Recognizes keywords (let, if, then, else, match, with, import, as, type, rec), operators, identifiers, integers, strings, and comments (--).
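
As a rough illustration of the token classes listed above, here is a minimal regex-based lexer in Python. It is a sketch, not the logic of tokenizer.iris: the keyword list and the `--` comment marker come from the text, while the operator set, the string escape rules, and the (kind, text) token shape are assumptions.

```python
import re

# Keyword list taken from the text above; everything else here is an assumption.
KEYWORDS = {"let", "if", "then", "else", "match", "with", "import", "as", "type", "rec"}

# Order matters: comments come before operators so "--" is not read as two minus signs.
TOKEN_RE = re.compile(r"""
    (?P<comment>--[^\n]*)
  | (?P<string>"(?:[^"\\]|\\.)*")
  | (?P<int>\d+)
  | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<op>==|!=|<=|>=|->|[-+*/%<>=(),|\\])
  | (?P<ws>\s+)
""", re.VERBOSE)


def tokenize(src):
    """Return a list of (kind, text) pairs, skipping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind in ("ws", "comment"):
            continue
        if kind == "ident" and text in KEYWORDS:
            kind = "keyword"  # identifiers that match the keyword list are reclassified
        tokens.append((kind, text))
    return tokens
```

With these assumptions, `tokenize('let x = 42 -- answer')` yields a keyword, an identifier, an operator, and an integer; the trailing comment is dropped.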

Parser

src/iris-programs/syntax/iris_parser.iris – Recursive-descent parser that produces an AST. Handles let/let rec bindings, lambda expressions, if/then/else, match/with pattern matching, function application, infix operators, tuples, imports, and ADT definitions.
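
To make the recursive-descent idea concrete, the following Python sketch parses a tiny subset (integers, variables, parentheses, the four arithmetic infix operators, and if/then/else). The grammar, the operator precedence, and the nested-tuple AST shape are all assumptions made for illustration; the real parser covers far more and may structure its AST differently.

```python
RESERVED = {"if", "then", "else"}


def parse(tokens):
    """Parse a list of token strings into a nested-tuple AST (hypothetical shape)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def expr():
        # expr := "if" expr "then" expr "else" expr | sum
        if peek() == "if":
            expect("if")
            cond = expr()
            expect("then")
            yes = expr()
            expect("else")
            no = expr()
            return ("if", cond, yes, no)
        return sum_()

    def sum_():
        # sum := product (("+" | "-") product)*
        nonlocal pos
        left = product()
        while peek() in ("+", "-"):
            op = peek()
            pos += 1
            left = (op, left, product())
        return left

    def product():
        # product := atom (("*" | "/") atom)*
        nonlocal pos
        left = atom()
        while peek() in ("*", "/"):
            op = peek()
            pos += 1
            left = (op, left, atom())
        return left

    def atom():
        # atom := "(" expr ")" | integer | identifier
        nonlocal pos
        tok = peek()
        if tok == "(":
            expect("(")
            inner = expr()
            expect(")")
            return inner
        if tok is not None and tok.isdigit():
            pos += 1
            return ("int", int(tok))
        if tok is not None and tok.isidentifier() and tok not in RESERVED:
            pos += 1
            return ("var", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"trailing token {peek()!r}")
    return tree
```

Each grammar rule maps to one function, and precedence falls out of which function calls which: `parse(["1", "+", "2", "*", "3"])` groups the multiplication under the addition.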

AST Compiler

src/iris-programs/compiler/ast_compile_single.iris – Compiles an AST to a flat tuple of bytecodes. Handles multi-binding modules by wrapping prior declarations as Let/in bindings. Supports recursive functions, lambda inlining, multi-parameter functions, and scope tracking.
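
The sketch below shows, in Python, what "compiling an AST to a flat tuple of bytecodes" with scope tracking can look like for a small expression language. The opcode numbers are the ones listed in the Opcodes table of this document; the AST shape, the inline-operand encoding (PUSH and LOAD/STORE followed by one operand), and the slot-allocation strategy are assumptions, not a description of ast_compile_single.iris.

```python
# Opcode numbers are taken from the opcode table in this document.
HALT, PUSH, ADD, SUB, MUL, LOAD, STORE = 0, 1, 2, 3, 4, 14, 15
BINOPS = {"add": ADD, "sub": SUB, "mul": MUL}


def compile_expr(node, scope, out):
    """Append bytecode for `node` to `out`. `scope` maps names to local slots."""
    kind = node[0]
    if kind == "lit":
        out += [PUSH, node[1]]            # assumed encoding: opcode, then immediate
    elif kind == "var":
        out += [LOAD, scope[node[1]]]     # assumed encoding: opcode, then slot index
    elif kind in BINOPS:
        compile_expr(node[1], scope, out)  # left operand first
        compile_expr(node[2], scope, out)  # then right operand
        out.append(BINOPS[kind])
    elif kind == "let":
        _, name, value, body = node
        compile_expr(value, scope, out)
        # Next slot above every slot currently in scope (safe under shadowing).
        slot = max(scope.values(), default=-1) + 1
        out += [STORE, slot]
        compile_expr(body, {**scope, name: slot}, out)
    else:
        raise ValueError(f"unknown node kind: {kind}")


def compile_program(ast):
    """Compile one expression and terminate it with HALT."""
    out = []
    compile_expr(ast, {}, out)
    out.append(HALT)
    return tuple(out)


# let x = 2 in x * (x + 3)
example = ("let", "x", ("lit", 2),
           ("mul", ("var", "x"), ("add", ("var", "x"), ("lit", 3))))
bytecode = compile_program(example)
# bytecode == (1, 2, 15, 0, 14, 0, 14, 0, 1, 3, 2, 4, 0)
```

The output is a single flat tuple with no nested structure, which is the property the VM's dispatch loop relies on.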

Native VM

src/iris-programs/compiler/native_vm.iris (~960 lines) – A bytecode interpreter written as hand-assembled x86-64 machine code, emitted as byte literals from IRIS code.

Registers

Register  Role
r12       Program counter (index into bytecode)
r13       Value stack pointer (grows downward)
r14       Bytecode tuple pointer
r15       Heap bump allocator
rbx       Locals array pointer

Opcodes

The VM implements 30+ opcodes:

Code  Name     Code  Name
0     HALT     16    JMP
1     PUSH     17    JZ
2     ADD      18    MAKE_TUPLE
3     SUB      19    TUPLE_GET
4     MUL      21    TUPLE_LEN
5     DIV      22    LIST_APPEND
6     MOD      23    BITAND
7     NEG      24    SHR
8     EQ       25    FOLD_BEGIN
9     LT       26    FOLD_END
10    GT       27    LIST_RANGE
11    NE       29    PUSH_STR_PTR
12    LE       30    STR_LEN
13    GE       31    CHAR_AT
14    LOAD     32    STR_CONCAT
15    STORE    33    STR_SLICE
               34    LIST_CONCAT
               39    FILE_READ
               40    DEBUG_PRINT

The dispatch loop loads an opcode from bytecode[r12], walks a compare-and-jump chain, executes the handler, and jumps back to the loop top.
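
The loop described above can be modeled in a few lines of Python. This is a sketch for intuition only: the real VM is x86-64 machine code, and the operand encoding (one inline operand after PUSH, LOAD, STORE, JMP, and JZ), the absolute jump targets, and the 1/0 result of LT are assumptions. Opcode numbers are the ones from the table.

```python
HALT, PUSH, ADD, SUB, MUL = 0, 1, 2, 3, 4
LT, LOAD, STORE, JMP, JZ = 9, 14, 15, 16, 17


def run(bytecode):
    """Interpret a flat bytecode tuple and return the top of the stack at HALT."""
    pc = 0                   # plays the role of r12
    stack = []               # value stack (addressed through r13 in the real VM)
    local_slots = [0] * 32   # locals array (addressed through rbx in the real VM)
    while True:
        op = bytecode[pc]    # load opcode from bytecode[pc]
        pc += 1
        # The if/elif chain mirrors the compare-and-jump chain of the native VM.
        if op == HALT:
            return stack.pop() if stack else None
        elif op == PUSH:
            stack.append(bytecode[pc])
            pc += 1
        elif op in (ADD, SUB, MUL, LT):
            b = stack.pop()
            a = stack.pop()
            if op == ADD:
                stack.append(a + b)
            elif op == SUB:
                stack.append(a - b)
            elif op == MUL:
                stack.append(a * b)
            else:
                stack.append(1 if a < b else 0)
        elif op == LOAD:
            stack.append(local_slots[bytecode[pc]])
            pc += 1
        elif op == STORE:
            local_slots[bytecode[pc]] = stack.pop()
            pc += 1
        elif op == JMP:
            pc = bytecode[pc]            # assumed: absolute target index
        elif op == JZ:
            target = bytecode[pc]
            pc += 1
            if stack.pop() == 0:
                pc = target
        else:
            raise ValueError(f"unimplemented opcode {op} at index {pc - 1}")
```

For example, `run((1, 2, 15, 0, 14, 0, 14, 0, 1, 3, 2, 4, 0))` stores 2 in local slot 0, computes 2 * (2 + 3), and returns 10.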

Memory Layout

The VM uses a fixed stack frame:

  • Value stack ([rbp-256..rbp-512]) – operand stack, grows downward
  • Locals ([rbp-768..rbp-512]) – 32 local variable slots (8 bytes each)
  • Scratch slots ([rbp-136..rbp-176]) – temporary storage for string/file ops
  • Heap – bump-allocated via r15, used for tuples and strings

Tuples and strings share a tagged-pointer format. Strings use tag 1 with bytes packed after an 8-byte header.
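
The following Python model illustrates bump allocation and a tagged string object with an 8-byte header. Only two facts come from the text: strings use tag 1, and the bytes follow an 8-byte header. The header bit layout (tag in the low byte, byte length in the upper bits), the 8-byte alignment, and the heap size are hypothetical choices made for this sketch.

```python
import struct

STRING_TAG = 1  # from the text; the header layout below is an assumption


class BumpHeap:
    def __init__(self, size=4096):
        self.mem = bytearray(size)
        self.top = 0  # plays the role of r15: next free byte

    def alloc(self, nbytes):
        """Reserve nbytes (rounded up to a multiple of 8) and return the start offset."""
        nbytes = (nbytes + 7) & ~7
        if self.top + nbytes > len(self.mem):
            raise MemoryError("heap exhausted")
        addr = self.top
        self.top += nbytes  # bump: allocation is a single pointer increment
        return addr

    def alloc_string(self, text):
        data = text.encode("utf-8")
        addr = self.alloc(8 + len(data))
        # Hypothetical header: tag in the low byte, byte length in the upper bits.
        struct.pack_into("<Q", self.mem, addr, STRING_TAG | (len(data) << 8))
        self.mem[addr + 8 : addr + 8 + len(data)] = data
        return addr

    def read_string(self, addr):
        (header,) = struct.unpack_from("<Q", self.mem, addr)
        if header & 0xFF != STRING_TAG:
            raise TypeError("not a string object")
        length = header >> 8
        return bytes(self.mem[addr + 8 : addr + 8 + length]).decode("utf-8")
```

A bump allocator never frees individual objects, which keeps the allocation path short enough to hand-assemble; the trade-off is that memory is reclaimed only when the whole heap is discarded.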

Self-Hosting

The compiler compiles itself through this loop:

  1. iris-native loads the tokenizer, parser, and AST compiler as .iris source
  2. Each stage is compiled to bytecodes by ast_compile_single.iris
  3. The bytecodes are executed by native_vm.iris (which is itself compiled the same way)
  4. The output is a new iris-native binary

The bootstrap/build-native-self script automates this. The only non-IRIS dependency is the ELF stub template (frozen x86 machine code for the startup sequence). Everything else – tokenization, parsing, compilation, VM execution – is IRIS compiling IRIS.

Bootstrap Chain

iris-stage0 (frozen seed)
    |  compiles + runs
    v
iris-native (self-hosted compiler + VM)
    |  compiles itself
    v
iris-native' (reproduced binary -- fixed point)

iris-stage0 is the frozen bootstrap binary. It contains a tree-walking evaluator and is used only to bootstrap the first iris-native. After that, iris-native can reproduce itself.
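
One simple way to check a fixed point like this is to compare the binary that built the compiler with the binary it produced. The Python sketch below does so by hashing both files; the use of SHA-256 and the file paths in the usage note are assumptions, and the document does not state how the project itself performs the check.

```python
import hashlib


def sha256_of(path):
    """Hash a file in chunks so large binaries are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def is_fixed_point(stage_n, stage_n_plus_1):
    """True if the compiler binary reproduced itself byte for byte."""
    return sha256_of(stage_n) == sha256_of(stage_n_plus_1)
```

A call such as `is_fixed_point("iris-native", "iris-native-rebuilt")` (hypothetical paths) returns True only when the two binaries are byte-identical.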

SemanticGraph

The SemanticGraph is the canonical program representation used by iris-stage0. It is a typed DAG with 20 node kinds:

Tag   Kind     Tag   Kind
0x00  Prim     0x0A  Effect
0x01  Apply    0x0B  Tuple
0x02  Lambda   0x0C  Inject
0x03  Let      0x0D  Project
0x04  Match    0x0E  TypeAbst
0x05  Lit      0x0F  TypeApp
0x06  Ref      0x10  LetRec
0x07  Neural   0x11  Guard
0x08  Fold     0x12  Rewrite
0x09  Unfold   0x13  Extern

Nodes are content-addressed via BLAKE3-truncated 64-bit IDs. Identical subgraphs share the same ID automatically.
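
Content addressing can be illustrated with a small hash-consing store in Python. Note the substitution: the Python standard library has no BLAKE3, so this sketch uses BLAKE2b with an 8-byte digest as a stand-in for the BLAKE3-truncated 64-bit ID. The byte encoding of a node (tag, payload length, payload, child IDs) is also hypothetical; only the tag values come from the table above.

```python
import hashlib
import struct

LIT, TUPLE = 0x05, 0x0B  # tag values from the node-kind table


class GraphStore:
    def __init__(self):
        self.nodes = {}  # node ID -> (tag, payload, child IDs)

    def add(self, tag, payload=b"", children=()):
        """Insert a node and return its 64-bit content-derived ID."""
        # Stand-in hash: BLAKE2b with an 8-byte digest. The real system uses
        # BLAKE3 truncated to 64 bits; the encoding hashed here is an assumption.
        h = hashlib.blake2b(digest_size=8)
        h.update(struct.pack("<BI", tag, len(payload)))
        h.update(payload)
        for child in children:
            h.update(struct.pack("<Q", child))  # children contribute by ID only
        node_id = int.from_bytes(h.digest(), "little")
        self.nodes.setdefault(node_id, (tag, payload, tuple(children)))
        return node_id


store = GraphStore()
a = store.add(LIT, b"\x2a")
b = store.add(LIT, b"\x2a")          # same content, so same ID as `a`
pair1 = store.add(TUPLE, children=[a, b])
pair2 = store.add(TUPLE, children=[a, a])
# a == b and pair1 == pair2; the store holds two nodes, not four.
```

Because a node's ID depends only on its own content and its children's IDs, structurally identical subgraphs collapse to one entry without any explicit deduplication pass.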

The native compilation pipeline (iris-native) works with bytecodes directly and does not use SemanticGraph at runtime. SemanticGraph remains the format for iris-stage0 commands (compile, run, direct, interp).

Four-Layer Model

IRIS has additional capabilities organized into four layers:

L0  Evolution      Population search, mutation, selection
L1  Semantics      SemanticGraph (20 node kinds, BLAKE3 content-addressed)
L2  Verification   LCF proof kernel (20 inference rules, Lean 4)
L3  Hardware       Native x86-64 VM, iris-stage0 evaluator

L0 – Evolution. Programs can be evolved through multi-objective genetic search (NSGA-II, lexicase selection, novelty search) with 16 mutation operators. See src/iris-programs/evolution/.
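
Multi-objective methods such as NSGA-II are built on Pareto dominance. The Python sketch below shows that one building block, extracting the non-dominated front from a population of objective vectors. It assumes all objectives are minimized and uses made-up (error, size) scores; it is not the code in src/iris-programs/evolution/ and omits the rest of NSGA-II (rank sorting, crowding distance, selection).

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(population):
    """Return the members that no other member dominates."""
    return [p for p in population if not any(dominates(q, p) for q in population)]


# Hypothetical (error, program size) scores, both minimized.
scores = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
front = pareto_front(scores)
# front == [(1, 5), (2, 2), (5, 1)]
```

Here (3, 3) and (4, 4) drop out because (2, 2) is at least as good on both objectives and better on at least one; the three survivors each win on some trade-off.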

L2 – Verification. An LCF-style proof kernel implements CaCIC (Cost-aware Calculus of Inductive Constructions) with 20 inference rules, formalized in Lean 4. Runs as an IPC subprocess. See Verification.

L3 – Hardware. The execution layer: native_vm.iris for compiled programs, iris-stage0 for interpreted evaluation, and effect dispatch for I/O.