Skip to content

Chapter 3: Register-Based VM

In Chapter 2, we built persistent storage for accounts and blocks. Now we need something to actually execute smart contract code. That’s the virtual machine — the computational engine at the heart of our blockchain.

By the end of this chapter, you’ll have implemented:

ModulePurpose
opcodes.rsInstruction set definition (43 opcodes)
memory.rsRegisters and linear memory
gas.rsGas metering and cost tables
executor.rsThe fetch-decode-execute loop
tracer.rsExecution tracing for debugging

A virtual machine (VM) is a software implementation of a computer. Instead of running machine code on physical silicon, it interprets bytecode — a compact binary representation of instructions.

Physical CPUBlockchain VM
Transistors on siliconRust code
x86/ARM machine codeCustom bytecode
Registers (RAX, RBX, etc.)Virtual registers (R0-R15)
RAMLinear memory buffer
Disk I/OStorage (SLOAD/SSTORE)
Clock cyclesGas units

There are two main VM designs:

Operations pop operands from a stack and push results back:

PUSH 3 ; stack: [3]
PUSH 4 ; stack: [3, 4]
ADD ; pops 3 and 4, pushes 7 → stack: [7]

Pros: Simple to implement, compact bytecode Cons: Lots of stack manipulation, harder to optimize

Operations specify source and destination registers explicitly:

LOAD R0, 3 ; R0 = 3
LOAD R1, 4 ; R1 = 4
ADD R2, R0, R1 ; R2 = R0 + R1 = 7

Pros: Fewer instructions, easier to optimize, maps well to real CPUs Cons: Larger instruction encoding (need register numbers)

Historical Context:

Most VMs fall into two categories:

ArchitectureExamplesAdvantagesDisadvantages
Stack-BasedEVM, JVM (early), WebAssemblyCompact bytecode, simple to compile toMore instructions, harder to optimize
Register-BasedLua VM, Android ART, x86/ARM CPUsFewer instructions, easier to optimizeLarger bytecode (register operands)

Stack-Based Example (EVM):

; Compute (a + b) * c
PUSH a ; Stack: [a]
PUSH b ; Stack: [a, b]
ADD ; Stack: [a+b]
PUSH c ; Stack: [a+b, c]
MUL ; Stack: [(a+b)*c]
Total: 5 instructions

Register-Based Example (Minichain):

; Compute (a + b) * c
LOADI R0, a ; R0 = a
LOADI R1, b ; R1 = b
ADD R2, R0, R1 ; R2 = a + b
LOADI R3, c ; R3 = c
MUL R4, R2, R3 ; R4 = (a+b) * c
Total: 5 instructions (same), but clearer semantics

Why Register-Based Wins for Learning:

  1. Explicit Data Flow:

    • Stack: “Where is the value? Top? Second from top?”
    • Registers: “Value is in R2, explicitly named”
  2. Easier to Debug:

Stack VM trace: Register VM trace:
PUSH 10 LOADI R0, 10
PUSH 20 LOADI R1, 20
ADD ADD R2, R0, R1
Stack: [30] R0=10, R1=20, R2=30 ← Clear!
  1. Matches Physical Hardware:

    • CPUs have registers (x86: RAX, RBX, etc.)
    • Understanding register-based VMs → easier to understand real CPUs
  2. Optimization Opportunities:

    • Register allocation is a well-studied problem
    • Easier to implement JIT compilation
    • Less stack manipulation overhead

Why EVM Chose Stack-Based?

Performance Comparison:

OperationStack VMRegister VMWinner
Simple ADD3 instructions (PUSH, PUSH, ADD)3 instructions (LOADI, LOADI, ADD)Tie
Complex expressionMore stack juggling (DUP, SWAP)Direct register accessRegister
Function callsStack frame managementExplicit register saveRegister
DebuggingOpaque stack stateVisible register stateRegister

Real-World Example:

Expression: result = (a * b) + (c * d)
Stack VM (EVM):
PUSH a ; [a]
PUSH b ; [a, b]
MUL ; [a*b]
PUSH c ; [a*b, c]
PUSH d ; [a*b, c, d]
MUL ; [a*b, c*d]
ADD ; [(a*b)+(c*d)]
7 instructions, max stack depth: 3
Register VM (Minichain):
LOADI R0, a ; R0 = a
LOADI R1, b ; R1 = b
MUL R2, R0, R1 ; R2 = a * b
LOADI R3, c ; R3 = c
LOADI R4, d ; R4 = d
MUL R5, R3, R4 ; R5 = c * d
ADD R6, R2, R5 ; R6 = R2 + R5
7 instructions, 7 registers used (R0-R6)

Both use same number of instructions, but register version:

  • Intermediate values (ab, cd) have explicit homes (R2, R5)
  • No stack depth tracking needed
  • Easier to parallelize (R2 and R5 independent)

Why 16 Registers?

Register CountTrade-off
8 registersCompact encoding (3 bits), but frequent spills to memory
16 registers4-bit encoding (good fit in bytes), enough for most expressions
32 registersMore flexibility, but wastes encoding space

x86-64 has 16 general-purpose registers. We match that convention.


Our VM has three main state components:

┌─────────────────────────────────────────────────────────────┐
│ VM State │
├─────────────────┬──────────────────┬────────────────────────┤
│ Registers │ Execution Ctx │ Memory │
│ R0-R15 │ caller/address │ (linear buffer) │
│ 64-bit each │ value/block/ts │ │
├─────────────────┴──────────────────┴────────────────────────┤
│ Program Counter (PC) │
│ Gas Remaining │
├─────────────────────────────────────────────────────────────┤
│ Bytecode (read-only) │
└─────────────────────────────────────────────────────────────┘
│ SLOAD/SSTORE
┌─────────────────┐
│ Contract │
│ Storage │
│ (from Ch. 2) │
└─────────────────┘

We have 16 general-purpose registers (R0-R15), each holding a 64-bit unsigned integer:

pub const NUM_REGISTERS: usize = 16;
pub struct Registers {
values: [u64; NUM_REGISTERS],
}
impl Registers {
pub fn new() -> Self {
Self { values: [0; NUM_REGISTERS] }
}
pub fn get(&self, r: u8) -> u64 {
self.values[r as usize % NUM_REGISTERS]
}
pub fn set(&mut self, r: u8, value: u64) {
self.values[r as usize % NUM_REGISTERS] = value;
}
}

Linear memory is a byte-addressable buffer that grows on demand:

pub struct Memory {
data: Vec<u8>,
max_size: usize,
}
impl Memory {
pub fn new(max_size: usize) -> Self {
Self {
data: Vec::new(),
max_size,
}
}
/// Read a byte from memory.
pub fn load8(&self, offset: u32) -> u8 {
self.data.get(offset as usize).copied().unwrap_or(0)
}
/// Write a byte to memory, growing if needed.
pub fn store8(&mut self, offset: u32, value: u8) -> Result<(), VmError> {
let offset = offset as usize;
if offset >= self.max_size {
return Err(VmError::MemoryOverflow);
}
if offset >= self.data.len() {
self.data.resize(offset + 1, 0);
}
self.data[offset] = value;
Ok(())
}
}

The program counter (PC) points to the next instruction to execute:

pub struct Vm {
pub registers: Registers,
pub memory: Memory,
pub pc: usize, // Program counter
pub gas_remaining: u64, // Gas left
pub bytecode: Vec<u8>, // The contract code
pub halted: bool, // Has execution stopped?
}

Each instruction is encoded as a sequence of bytes. The first byte is the opcode, followed by operands.

We use three main formats:

FormatBytesExampleDescription
R1 + 1HALTNo operands
RR1 + 1NOT R0One register
RRR1 + 2ADD R0, R1, R2Two or three registers
RI1 + 1 + 8LOADI R0, 12345Register + immediate value
ADD instruction (3 bytes):
┌────────┬────────┬────────┐
│ opcode │ dst|s1 │ s2|pad │
│ 0x01 │ 0x01 │ 0x20 │
└────────┴────────┴────────┘
R0 R1 R2
dst = destination register (4 bits)
s1 = source register 1 (4 bits)
s2 = source register 2 (4 bits)
pad = padding (4 bits)

We organize opcodes into categories:

RangeCategoryExamples
0x00-0x0FControlHALT, NOP, JUMP, CALL, RET
0x10-0x1FArithmeticADD, SUB, MUL, DIV, MOD
0x20-0x2FBitwiseAND, OR, XOR, NOT, SHL, SHR
0x30-0x3FComparisonEQ, NE, LT, GT, LE, GE
0x40-0x4FMemoryLOAD8, LOAD64, STORE8, STORE64
0x50-0x5FStorageSLOAD, SSTORE
0x70-0x7FImmediateLOADI, MOV
0x80-0x8FContextCALLER, CALLVALUE, ADDRESS
0xF0-0xFFDebugLOG

Our VM uses a 43-instruction set organized into 9 categories: control flow, arithmetic, bitwise, comparison, memory, storage, immediate, context, and debug operations.

Our instruction set is organized by opcode ranges:

RangeCategoryPurposeExample Instructions
0x00-0x0FControl FlowProgram controlHALT, JUMP, JUMPI, CALL, RET
0x10-0x1FArithmeticMath operationsADD, SUB, MUL, DIV, MOD
0x20-0x2FBitwiseBit manipulationAND, OR, XOR, NOT, SHL, SHR
0x30-0x3FComparisonValue comparisonEQ, LT, GT, LE, GE, ISZERO
0x40-0x4FMemoryRAM operations (temp)LOAD8, LOAD64, STORE8, STORE64
0x50-0x5FStorageDisk operations (persistent)SLOAD, SSTORE
0x70-0x7FImmediateConstants & data movementLOADI, MOV
0x80-0x8FContextExecution environmentCALLER, ADDRESS, TIMESTAMP
0xF0-0xFFDebugDebugging & loggingLOG
crates/vm/src/opcodes.rs
/// All VM opcodes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Opcode {
// Control Flow (0x00-0x0F)
HALT = 0x00,
NOP = 0x01,
JUMP = 0x02,
JUMPI = 0x03,
CALL = 0x04,
RET = 0x05,
REVERT = 0x0F,
// Arithmetic (0x10-0x1F)
ADD = 0x10,
SUB = 0x11,
MUL = 0x12,
DIV = 0x13,
MOD = 0x14,
ADDI = 0x15,
// Bitwise (0x20-0x2F)
AND = 0x20,
OR = 0x21,
XOR = 0x22,
NOT = 0x23,
SHL = 0x24,
SHR = 0x25,
// Comparison (0x30-0x3F)
EQ = 0x30,
NE = 0x31,
LT = 0x32,
GT = 0x33,
LE = 0x34,
GE = 0x35,
ISZERO = 0x36,
// Memory - RAM (0x40-0x4F)
LOAD8 = 0x40,
LOAD64 = 0x41,
STORE8 = 0x42,
STORE64 = 0x43,
MSIZE = 0x44,
MCOPY = 0x45,
// Storage - Disk (0x50-0x5F)
SLOAD = 0x50,
SSTORE = 0x51,
// Immediate (0x70-0x7F)
LOADI = 0x70,
MOV = 0x71,
// Context (0x80-0x8F)
CALLER = 0x80,
CALLVALUE = 0x81,
ADDRESS = 0x82,
BLOCKNUMBER = 0x83,
TIMESTAMP = 0x84,
GAS = 0x85,
// Debug (0xF0-0xFF)
LOG = 0xF0,
}
impl Opcode {
/// Parse a byte as an opcode.
pub fn from_byte(byte: u8) -> Option<Self> {
match byte {
0x00 => Some(Opcode::HALT),
0x01 => Some(Opcode::NOP),
0x02 => Some(Opcode::JUMP),
0x03 => Some(Opcode::JUMPI),
0x04 => Some(Opcode::CALL),
0x05 => Some(Opcode::RET),
0x0F => Some(Opcode::REVERT),
0x10 => Some(Opcode::ADD),
0x11 => Some(Opcode::SUB),
0x12 => Some(Opcode::MUL),
0x13 => Some(Opcode::DIV),
0x14 => Some(Opcode::MOD),
0x15 => Some(Opcode::ADDI),
0x20 => Some(Opcode::AND),
0x21 => Some(Opcode::OR),
0x22 => Some(Opcode::XOR),
0x23 => Some(Opcode::NOT),
0x24 => Some(Opcode::SHL),
0x25 => Some(Opcode::SHR),
0x30 => Some(Opcode::EQ),
0x31 => Some(Opcode::NE),
0x32 => Some(Opcode::LT),
0x33 => Some(Opcode::GT),
0x34 => Some(Opcode::LE),
0x35 => Some(Opcode::GE),
0x36 => Some(Opcode::ISZERO),
0x40 => Some(Opcode::LOAD8),
0x41 => Some(Opcode::LOAD64),
0x42 => Some(Opcode::STORE8),
0x43 => Some(Opcode::STORE64),
0x44 => Some(Opcode::MSIZE),
0x45 => Some(Opcode::MCOPY),
0x50 => Some(Opcode::SLOAD),
0x51 => Some(Opcode::SSTORE),
0x70 => Some(Opcode::LOADI),
0x71 => Some(Opcode::MOV),
0x80 => Some(Opcode::CALLER),
0x81 => Some(Opcode::CALLVALUE),
0x82 => Some(Opcode::ADDRESS),
0x83 => Some(Opcode::BLOCKNUMBER),
0x84 => Some(Opcode::TIMESTAMP),
0x85 => Some(Opcode::GAS),
0xF0 => Some(Opcode::LOG),
_ => None,
}
}
/// Get the number of bytes this instruction consumes (including opcode).
pub fn instruction_size(&self) -> usize {
match self {
// No operands (1 byte total)
Opcode::HALT | Opcode::NOP | Opcode::RET => 1,
// One register (2 bytes: opcode + register)
Opcode::JUMP | Opcode::NOT | Opcode::LOG |
Opcode::MSIZE | Opcode::CALLER | Opcode::CALLVALUE |
Opcode::ADDRESS | Opcode::BLOCKNUMBER |
Opcode::TIMESTAMP | Opcode::GAS => 2,
// Two registers (2 bytes: opcode + packed registers)
Opcode::MOV | Opcode::LOAD8 | Opcode::STORE8 |
Opcode::LOAD64 | Opcode::STORE64 |
Opcode::SLOAD | Opcode::SSTORE | Opcode::ISZERO => 2,
// Three registers (3 bytes: opcode + 2 packed register bytes)
Opcode::ADD | Opcode::SUB | Opcode::MUL |
Opcode::DIV | Opcode::MOD | Opcode::AND |
Opcode::OR | Opcode::XOR | Opcode::SHL |
Opcode::SHR | Opcode::EQ | Opcode::NE |
Opcode::LT | Opcode::GT | Opcode::LE |
Opcode::GE | Opcode::MCOPY | Opcode::JUMPI => 3,
// Register + 64-bit immediate (10 bytes)
Opcode::LOADI => 10, // 1 (opcode) + 1 (reg) + 8 (immediate)
// Fallback
_ => 2,
}
}
}

Gas prevents infinite loops and makes computation economically accountable. Every instruction costs gas. If you run out, execution halts.

crates/vm/src/gas.rs
/// Gas costs for operations.
pub struct GasCosts;
impl GasCosts {
// Tier 1: Very cheap (simple register operations)
pub const ZERO: u64 = 0; // HALT, NOP
pub const BASE: u64 = 2; // ADD, SUB, AND, OR, MOV
// Tier 2: Cheap (more complex ALU operations)
pub const LOW: u64 = 3; // MUL, comparison ops
// Tier 3: Medium (division, shifts)
pub const MID: u64 = 5; // DIV, MOD, SHL, SHR
// Tier 4: Memory operations
pub const MEMORY_READ: u64 = 3;
pub const MEMORY_WRITE: u64 = 3;
pub const MEMORY_GROW_PER_BYTE: u64 = 1;
// Tier 5: Storage (expensive!)
pub const SLOAD: u64 = 100; // Read from storage
pub const SSTORE_SET: u64 = 20000; // Write to empty slot
pub const SSTORE_RESET: u64 = 5000; // Overwrite existing slot
// Control flow
pub const JUMP: u64 = 8;
pub const CALL: u64 = 700;
}
/// Gas meter tracks remaining gas.
pub struct GasMeter {
remaining: u64,
used: u64,
}
impl GasMeter {
pub fn new(limit: u64) -> Self {
Self { remaining: limit, used: 0 }
}
/// Consume gas, returning error if insufficient.
pub fn consume(&mut self, amount: u64) -> Result<(), VmError> {
if self.remaining < amount {
return Err(VmError::OutOfGas {
required: amount,
remaining: self.remaining,
});
}
self.remaining -= amount;
self.used += amount;
Ok(())
}
pub fn remaining(&self) -> u64 {
self.remaining
}
pub fn used(&self) -> u64 {
self.used
}
}
OperationCostRationale
ADD, SUB2Single CPU cycle equivalent
MUL3Slightly more complex
DIV5Division is slow on real CPUs
SLOAD100Disk read (SSD ~100μs)
SSTORE (new)20,000Permanent storage is expensive

Memory isn’t free. When you access a higher address than before, you pay for the expansion:

impl GasMeter {
/// Calculate gas for memory expansion.
pub fn memory_expansion_cost(current_size: usize, new_size: usize) -> u64 {
if new_size <= current_size {
return 0;
}
let expansion = (new_size - current_size) as u64;
expansion * GasCosts::MEMORY_GROW_PER_BYTE
}
}

The executor is the heart of the VM — the fetch-decode-execute loop:

crates/vm/src/executor.rs
use crate::{
gas::{GasCosts, GasMeter},
memory::Memory,
opcodes::Opcode,
};
use minichain_core::Address;
use thiserror::Error;
#[derive(Error, Debug, Clone, PartialEq, Eq)]
pub enum VmError {
#[error("Out of gas: required {required}, remaining {remaining}")]
OutOfGas { required: u64, remaining: u64 },
#[error("Invalid opcode: 0x{0:02X}")]
InvalidOpcode(u8),
#[error("Division by zero")]
DivisionByZero,
#[error("Memory overflow")]
MemoryOverflow,
#[error("Invalid jump destination: {0}")]
InvalidJump(usize),
#[error("Stack underflow")]
StackUnderflow,
#[error("Execution reverted")]
Reverted,
}
/// Execution result.
pub struct ExecutionResult {
pub success: bool,
pub gas_used: u64,
pub return_data: Vec<u8>,
pub logs: Vec<u64>, // LOG opcode outputs
}
/// The virtual machine state.
pub struct Vm {
registers: [u64; 16],
memory: Memory,
pc: usize,
gas: GasMeter,
bytecode: Vec<u8>,
halted: bool,
// Context
caller: Address,
address: Address,
call_value: u64,
block_number: u64,
timestamp: u64,
// Outputs
logs: Vec<u64>,
}
impl Vm {
/// Create a new VM with the given bytecode and gas limit.
pub fn new(
bytecode: Vec<u8>,
gas_limit: u64,
caller: Address,
address: Address,
call_value: u64,
) -> Self {
Self {
registers: [0; 16],
memory: Memory::new(1024 * 1024), // 1MB max
pc: 0,
gas: GasMeter::new(gas_limit),
bytecode,
halted: false,
caller,
address,
call_value,
block_number: 0,
timestamp: 0,
logs: Vec::new(),
}
}
/// Run the VM until it halts or runs out of gas.
pub fn run(&mut self) -> Result<ExecutionResult, VmError> {
while !self.halted && self.pc < self.bytecode.len() {
self.step()?;
}
Ok(ExecutionResult {
success: self.halted, // HALT = success, out of bounds = failure
gas_used: self.gas.used(),
return_data: Vec::new(), // TODO: implement return data
logs: std::mem::take(&mut self.logs),
})
}
/// Execute a single instruction.
fn step(&mut self) -> Result<(), VmError> {
// Fetch
let opcode_byte = self.bytecode[self.pc];
let opcode = Opcode::from_byte(opcode_byte)
.ok_or(VmError::InvalidOpcode(opcode_byte))?;
// Decode & Execute
match opcode {
Opcode::HALT => {
self.gas.consume(GasCosts::ZERO)?;
self.halted = true;
}
Opcode::NOP => {
self.gas.consume(GasCosts::ZERO)?;
self.pc += 1;
}
Opcode::ADD => {
self.gas.consume(GasCosts::BASE)?;
let (dst, s1, s2) = self.decode_rrr();
let result = self.registers[s1].wrapping_add(self.registers[s2]);
self.registers[dst] = result;
self.pc += 3;
}
Opcode::SUB => {
self.gas.consume(GasCosts::BASE)?;
let (dst, s1, s2) = self.decode_rrr();
let result = self.registers[s1].wrapping_sub(self.registers[s2]);
self.registers[dst] = result;
self.pc += 3;
}
Opcode::MUL => {
self.gas.consume(GasCosts::LOW)?;
let (dst, s1, s2) = self.decode_rrr();
let result = self.registers[s1].wrapping_mul(self.registers[s2]);
self.registers[dst] = result;
self.pc += 3;
}
Opcode::DIV => {
self.gas.consume(GasCosts::MID)?;
let (dst, s1, s2) = self.decode_rrr();
let divisor = self.registers[s2];
if divisor == 0 {
return Err(VmError::DivisionByZero);
}
self.registers[dst] = self.registers[s1] / divisor;
self.pc += 3;
}
Opcode::LOADI => {
self.gas.consume(GasCosts::BASE)?;
let dst = self.decode_r();
let immediate = self.decode_imm64();
self.registers[dst] = immediate;
self.pc += 10;
}
Opcode::MOV => {
self.gas.consume(GasCosts::BASE)?;
let (dst, src) = self.decode_rr();
self.registers[dst] = self.registers[src];
self.pc += 2;
}
Opcode::JUMP => {
self.gas.consume(GasCosts::JUMP)?;
let target = self.decode_r();
let addr = self.registers[target] as usize;
if addr >= self.bytecode.len() {
return Err(VmError::InvalidJump(addr));
}
self.pc = addr;
}
Opcode::JUMPI => {
self.gas.consume(GasCosts::JUMP)?;
let (cond, target) = self.decode_rr();
if self.registers[cond] != 0 {
let addr = self.registers[target] as usize;
if addr >= self.bytecode.len() {
return Err(VmError::InvalidJump(addr));
}
self.pc = addr;
} else {
self.pc += 2;
}
}
Opcode::LOG => {
self.gas.consume(GasCosts::BASE)?;
let src = self.decode_r();
self.logs.push(self.registers[src]);
self.pc += 2;
}
// ... implement remaining opcodes similarly
_ => {
return Err(VmError::InvalidOpcode(opcode_byte));
}
}
Ok(())
}
}

These helper functions extract register operands from the packed binary instruction format. Since we pack multiple 4-bit register numbers into single bytes to keep bytecode compact, we need bit-shifting and masking to extract individual register indices. Each helper corresponds to one of the instruction formats (R, RR, RRR) defined in section 3.2.

impl Vm {
/// Decode a single register operand: [opcode, RRRR____]
fn decode_r(&self) -> usize {
((self.bytecode[self.pc + 1] >> 4) & 0x0F) as usize
}
/// Decode two register operands: [opcode, RRRR_SSSS]
fn decode_rr(&self) -> (usize, usize) {
let byte = self.bytecode[self.pc + 1];
let r1 = ((byte >> 4) & 0x0F) as usize;
let r2 = (byte & 0x0F) as usize;
(r1, r2)
}
/// Decode three register operands: [opcode, DDDD_SSS1, SSS2____]
fn decode_rrr(&self) -> (usize, usize, usize) {
let b1 = self.bytecode[self.pc + 1];
let b2 = self.bytecode[self.pc + 2];
let dst = ((b1 >> 4) & 0x0F) as usize;
let s1 = (b1 & 0x0F) as usize;
let s2 = ((b2 >> 4) & 0x0F) as usize;
(dst, s1, s2)
}
/// Decode a 64-bit immediate value (little-endian).
fn decode_imm64(&self) -> u64 {
let start = self.pc + 2;
let bytes = &self.bytecode[start..start + 8];
u64::from_le_bytes(bytes.try_into().unwrap())
}
}

The VM connects to the storage layer from Chapter 2 via callbacks:

/// Storage interface for the VM.
pub trait StorageBackend {
/// Read 32 bytes from storage slot.
fn sload(&self, key: &[u8; 32]) -> [u8; 32];
/// Write 32 bytes to storage slot.
fn sstore(&mut self, key: &[u8; 32], value: &[u8; 32]);
}
impl Vm {
/// Execute SLOAD: read from persistent storage.
fn execute_sload<S: StorageBackend>(
&mut self,
storage: &S,
) -> Result<(), VmError> {
self.gas.consume(GasCosts::SLOAD)?;
let (dst, key_reg) = self.decode_rr();
// Convert register value to 32-byte key
let key_value = self.registers[key_reg];
let mut key = [0u8; 32];
key[24..32].copy_from_slice(&key_value.to_be_bytes());
// Load from storage
let value = storage.sload(&key);
// Store first 8 bytes in register (simplified)
self.registers[dst] = u64::from_be_bytes(value[24..32].try_into().unwrap());
self.pc += 2;
Ok(())
}
/// Execute SSTORE: write to persistent storage.
fn execute_sstore<S: StorageBackend>(
&mut self,
storage: &mut S,
) -> Result<(), VmError> {
// Check if slot was empty for gas calculation
let (key_reg, value_reg) = self.decode_rr();
let key_value = self.registers[key_reg];
let mut key = [0u8; 32];
key[24..32].copy_from_slice(&key_value.to_be_bytes());
let current = storage.sload(&key);
let is_empty = current == [0u8; 32];
// Different costs for new vs existing slot
let cost = if is_empty {
GasCosts::SSTORE_SET
} else {
GasCosts::SSTORE_RESET
};
self.gas.consume(cost)?;
// Write to storage
let mut value = [0u8; 32];
value[24..32].copy_from_slice(&self.registers[value_reg].to_be_bytes());
storage.sstore(&key, &value);
self.pc += 2;
Ok(())
}
}

For debugging, we can record every instruction executed:

crates/vm/src/tracer.rs
use crate::opcodes::Opcode;
/// A single trace entry.
#[derive(Debug, Clone)]
pub struct TraceStep {
pub pc: usize,
pub opcode: Opcode,
pub gas_before: u64,
pub gas_after: u64,
pub registers: [u64; 16],
}
/// Records execution for debugging.
pub struct Tracer {
steps: Vec<TraceStep>,
enabled: bool,
}
impl Tracer {
pub fn new(enabled: bool) -> Self {
Self {
steps: Vec::new(),
enabled,
}
}
pub fn record(&mut self, step: TraceStep) {
if self.enabled {
self.steps.push(step);
}
}
pub fn steps(&self) -> &[TraceStep] {
&self.steps
}
/// Print a human-readable trace.
pub fn print_trace(&self) {
for (i, step) in self.steps.iter().enumerate() {
println!(
"{:4}: PC={:04X} {:?} gas={} R0={} R1={} R2={}",
i,
step.pc,
step.opcode,
step.gas_after,
step.registers[0],
step.registers[1],
step.registers[2],
);
}
}
}
0: PC=0000 LOADI gas=999998 R0=10 R1=0 R2=0
1: PC=000A LOADI gas=999996 R0=10 R1=20 R2=0
2: PC=0014 ADD gas=999994 R0=10 R1=20 R2=30
3: PC=0017 LOG gas=999992 R0=10 R1=20 R2=30
4: PC=0019 HALT gas=999992 R0=10 R1=20 R2=30

Let’s trace through a simple program:

; Calculate 10 + 20 and log the result
LOADI R0, 10 ; R0 = 10
LOADI R1, 20 ; R1 = 20
ADD R2, R0, R1 ; R2 = R0 + R1 = 30
LOG R2 ; output 30
HALT ; stop

Bytecode encoding:

70 00 0A 00 00 00 00 00 00 00 ; LOADI R0, 10
70 10 14 00 00 00 00 00 00 00 ; LOADI R1, 20
10 20 10 ; ADD R2, R0, R1
F0 20 ; LOG R2
00 ; HALT

Execution:

StepPCInstructionR0R1R2Gas Used
00x00LOADI R0, 1010002
10x0ALOADI R1, 20102004
20x14ADD R2, R0, R11020306
30x17LOG R21020308
40x19HALT1020308

Result: Success, gas used = 8, logs = [30]


We’ve built a register-based virtual machine:

ComponentWhat It Does
Registers16 × 64-bit general-purpose registers
MemoryGrowable linear memory buffer
Opcode43 instructions (arithmetic, logic, memory, storage)
GasMeterTracks and limits computation
VmThe fetch-decode-execute loop
TracerRecords execution for debugging
DecisionRationale
Register-basedMore intuitive than stack-based
16 registersAddressable with 4 bits, enough for most code
64-bit valuesMatches modern CPUs, simpler than 256-bit
Gas meteringPrevents infinite loops, enables economic accountability
32-byte storage slotsCompatible with Ethereum, fits hashes

We have a VM, but writing bytecode by hand is tedious. In Chapter 4, we’ll build an assembler that compiles human-readable assembly into bytecode:

; This is much nicer than raw bytes!
.entry main
main:
LOADI R0, 0 ; storage slot 0
SLOAD R1, R0 ; load current counter
ADDI R1, R1, 1 ; increment
SSTORE R0, R1 ; store back
HALT

The assembler will handle labels, .entry directives, and emit the compact bytecode our VM expects.