Chapter 4: Assembly Language & Assembler
In Chapter 3, we built a register-based virtual machine that executes bytecode. But writing raw bytecode is tedious and error-prone. Imagine debugging this:
```text
70 00 0A 00 00 00 00 00 00 00 70 10 14 00 00 00 00 00 00 00 10 20 10 F0 20 00
```
What does it do? Who knows! You’d need to manually decode each opcode, track register values, and trace through jumps. This is like writing machine code in the 1950s: technically possible, but painfully inefficient.
Assembly language is the solution. It provides human-readable mnemonics for opcodes, symbolic labels for jump targets, and comments for documentation:
```asm
; Calculate 10 + 20 and log the result.
.entry main

main:
    LOADI R0, 10      ; R0 = 10
    LOADI R1, 20      ; R1 = 20
    ADD R2, R0, R1    ; R2 = R0 + R1 = 30
    LOG R2            ; output 30
    HALT              ; stop
```
Now it’s immediately clear what the code does. The assembler is the tool that translates this human-readable syntax into the compact bytecode our VM expects.
What We’re Building
By the end of this chapter, you’ll understand how to build an assembler with:
| Module | Purpose |
|---|---|
| lexer.rs | Tokenize assembly source into meaningful units (keywords, registers, numbers) |
| parser.rs | Build an Abstract Syntax Tree (AST) from tokens |
| compiler.rs | Resolve labels and emit bytecode with correct jump addresses |
| Error Handling | Provide helpful diagnostics with line numbers and context |
Why an Assembler?
The Problem: Manual Bytecode Encoding
Let’s say you want to write a contract that increments a storage counter. In raw bytecode, you’d need to:
- Encode opcodes: look up that LOADI is 0x70, SLOAD is 0x50, ADD is 0x10, etc.
- Pack registers: combine register numbers into nibbles (R0 and R1 → 0x01)
- Handle immediates: encode the 64-bit value 1 as little-endian bytes
- Calculate jump addresses: if you add a line of code, every later jump target must be recomputed by hand
- No comments: the bytecode has no space for documentation
Result: Unmaintainable, unreadable, and error-prone.
The Solution: Assembly Language
Assembly provides:
- Mnemonics: ADD R2, R0, R1 instead of 0x10 0x20 0x10
- Symbolic labels: LOADI R5, loop_start instead of LOADI R5, 0x0014
- Comments: ; for inline documentation
- Directives: .entry main to specify the entry point
- Error messages: “undefined label 'loop_end' at line 15” instead of silent corruption
Assembler as a Translator
Think of the assembler as a translator or document processor:
| Analogy | What it converts (like assembly → bytecode) |
|---|---|
| Translator | Converts English text to French |
| Compiler | Converts C to machine code |
| Document Processor | Converts Markdown to HTML |
The assembler doesn’t change what the program does — it just makes it easier for humans to write and understand.
4.1 Assembly Language Syntax
Before we implement the assembler, let’s define the language it compiles.
Instruction Format
Our assembly language uses five instruction formats, distinguished by their operands:
| Format | Example | Description |
|---|---|---|
| RRR (3 registers) | ADD R2, R0, R1 | Destination + two sources |
| RR (2 registers) | MOV R0, R1 | Destination + one source |
| RI (register + immediate) | LOADI R0, 42 | Load a constant value |
| R (1 register) | JUMP R0 | Single operand |
| No operands | HALT | Control flow |
All instructions are case-insensitive (ADD = add), but we conventionally use uppercase for clarity.
Registers
We have 16 64-bit registers, R0 through R15. R0-R11 are free for general use; by convention, R12-R15 have special roles:
| Registers | Common Usage |
|---|---|
| R0-R11 | General purpose — use freely for computation |
| R12 | Frame pointer (for function local variables) |
| R13 | Stack pointer (for call stack management) |
| R14 | Link register (stores return address for CALL/RET) |
| R15 | Program counter (read-only via special instructions) |
Labels and Jumps
Labels mark locations in code for jumps and calls:
```asm
loop_start:                 ; Define a label
    ADDI  R0, R0, 1         ; Increment R0
    LT    R1, R0, R2        ; R1 = (R0 < R2)
    LOADI R3, loop_start    ; R3 = address of loop_start
    JUMPI R1, R3            ; If R1 != 0, jump back
```
- Label definition: name: marks the current address
- Label reference: use the label name as an operand (for example, as the immediate of LOADI, which puts the label’s address into a register for JUMP, JUMPI, or CALL)
Labels can be referenced before they’re defined (forward references). The assembler handles this with a two-pass algorithm (explained in section 4.5).
Directives
Directives control the assembler itself (not the VM):
| Directive | Purpose | Example |
|---|---|---|
| .entry | Specify the entry point label | .entry main |
| .const | Define a named constant (future feature) | .const MAX_SUPPLY 1000000 |
Comments
Use semicolons (;) for comments. Everything after ; on a line is ignored:
```asm
LOADI R0, 10     ; This is a comment
; This entire line is a comment
ADD R1, R0, R0   ; R1 = R0 + R0 = 20
```
Complete Example
Here’s a simple counter contract with all syntax elements:
```asm
; Increment a storage counter
.entry main

main:
    LOADI  R0, 0        ; storage slot 0 = counter
    SLOAD  R1, R0       ; load current value (100 gas)
    LOADI  R2, 1        ; constant 1
    ADD    R1, R1, R2   ; increment (2 gas)
    SSTORE R0, R1       ; save back (5000-20000 gas)
    HALT                ; stop execution
```
What this does:
- Loads the value from storage slot 0
- Adds 1 to it
- Stores the result back
- Halts successfully
4.2 Instruction Reference
The assembler compiles assembly mnemonics into the VM’s 43-instruction set. This section provides a quick overview of the instructions available for assembly programming.
Instruction Categories
The VM supports 43 instructions across 9 categories:
| Category | Instructions | Purpose | Example |
|---|---|---|---|
| Control Flow | HALT, JUMP, JUMPI, CALL, RET, REVERT | Program control | JUMPI R5, R6 |
| Arithmetic | ADD, SUB, MUL, DIV, MOD, ADDI | Math operations | ADD R0, R1, R2 |
| Bitwise | AND, OR, XOR, NOT, SHL, SHR | Bit manipulation | AND R3, R4, R5 |
| Comparison | EQ, NE, LT, GT, LE, GE, ISZERO | Value comparison | LT R6, R0, R1 |
| Memory | LOAD8, LOAD64, STORE8, STORE64, MSIZE, MCOPY | RAM (temporary) | LOAD64 R0, R1 |
| Storage | SLOAD, SSTORE | Disk (persistent) | SSTORE R0, R1 |
| Immediate | LOADI, MOV | Constants & copy | LOADI R0, 12345 |
| Context | CALLER, ADDRESS, TIMESTAMP, etc. | Execution info | CALLER R0 |
| Debug | LOG | Debugging output | LOG R0 |
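Behind every mnemonic in this table sits an opcode byte. Only a few opcode values actually appear in this chapter: the bytecode listings show HALT, JUMP, ADD, ADDI, LOADI, and LOG, and the manual-encoding list gives SLOAD. The sketch below is a minimal case-insensitive mnemonic lookup restricted to those values. `opcode` and `KNOWN_OPCODES` are hypothetical names; the full 43-entry table would come from the VM definition in Chapter 3.

```rust
/// Opcode values that appear in this chapter's examples (a partial table).
const KNOWN_OPCODES: &[(&str, u8)] = &[
    ("HALT", 0x00),
    ("JUMP", 0x02),
    ("ADD", 0x10),
    ("ADDI", 0x15),
    ("SLOAD", 0x50),
    ("LOADI", 0x70),
    ("LOG", 0xF0),
];

/// Case-insensitive lookup: "ADD", "add", and "Add" all map to 0x10.
pub fn opcode(mnemonic: &str) -> Option<u8> {
    KNOWN_OPCODES
        .iter()
        .find(|(name, _)| name.eq_ignore_ascii_case(mnemonic))
        .map(|&(_, op)| op)
}
```

A linear scan is fine for a table this small. A real assembler would more likely generate the mapping from the same source as the VM's opcode definitions, so the two cannot drift apart.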
Common Assembly Patterns
Counter (most common smart contract):
```asm
.entry main

main:
    LOADI  R0, 0        ; Storage key 0
    SLOAD  R1, R0       ; Load counter
    LOADI  R2, 1        ; Constant 1
    ADD    R1, R1, R2   ; Increment
    SSTORE R0, R1       ; Save back
    HALT
```
Access control:
```asm
check_owner:
    CALLER R0               ; Get caller address
    LOADI  R1, 0xOWNER      ; Load expected owner (0xOWNER stands for a real address)
    EQ     R2, R0, R1       ; Compare
    LOADI  R3, authorized   ; R3 = address of 'authorized'
    JUMPI  R2, R3           ; If equal, continue
    REVERT                  ; Otherwise, abort
authorized:
    ; ... privileged code ...
```
If-then-else:
```asm
    LT    R5, R0, R1        ; R5 = (R0 < R1)
    LOADI R6, less_than     ; R6 = address of 'less_than'
    JUMPI R5, R6            ; if true, jump
    ; ... R0 >= R1 case ...
    LOADI R6, done
    JUMP  R6                ; skip the other branch
less_than:
    ; ... R0 < R1 case ...
done:
```
While loop:
```asm
loop:
    LT     R5, R0, R10      ; R5 = (R0 < R10)
    ISZERO R6, R5           ; R6 = !R5
    LOADI  R7, end          ; R7 = address of 'end'
    JUMPI  R6, R7           ; if !(R0 < R10), break
    ; ... loop body ...
    LOADI  R7, loop
    JUMP   R7               ; back to the condition
end:
```
Key Design Choices
Register-based vs Stack-based:
- Minichain uses registers (like x86, ARM)
- Ethereum uses a stack (like JVM, Forth)
Advantage: register code usually needs fewer instructions, because each instruction names its operands directly instead of pushing and popping them. Compare:
```asm
; Register-based (Minichain): 3 instructions
ADD R0, R1, R2   ; R0 = R1 + R2
SUB R3, R0, R4   ; R3 = R0 - R4
MUL R5, R3, R6   ; R5 = R3 * R6
```
```asm
; Stack-based (EVM-style pseudocode; the EVM itself has no registers): 8 instructions
PUSH R1   ; Stack: [R1]
PUSH R2   ; Stack: [R1, R2]
ADD       ; Stack: [R1+R2]
PUSH R4   ; Stack: [R1+R2, R4]
SUB       ; Stack: [(R1+R2)-R4]
PUSH R6   ; Stack: [result, R6]
MUL       ; Stack: [result*R6]
POP R5    ; Save to R5
```
Memory vs Storage:
- Memory (RAM): Temporary, cheap (~3 gas), cleared after execution
- Storage (Disk): Persistent, expensive (~100 gas to read, ~5,000-20,000 gas to write), survives across transactions
Use Memory for computation, Storage for state that must persist.
Assembly Syntax Basics
```asm
; Comments start with a semicolon
.entry main              ; Entry point directive

main:                    ; Label definition
    LOADI R0, 100        ; Load immediate: R0 = 100
    LOADI R1, 200        ; R1 = 200
    ADD R2, R0, R1       ; R2 = R0 + R1 = 300
    LOG R2               ; Output R2 for debugging
    HALT                 ; Stop execution

; Registers: R0-R15 (16 64-bit registers)
; Special: R14 is used by CALL/RET for the return address
```
4.3 Lexer: Breaking Text into Tokens
The first step in compilation is tokenization: breaking the source code into meaningful units called tokens.
What is Tokenization?
Think of tokenization like breaking a sentence into words. Consider this English sentence:
```text
The quick brown fox jumps.
```
A human naturally recognizes five words, punctuation, and spaces. A computer needs explicit rules to identify these boundaries.
Similarly, this assembly line:
```asm
ADD R2, R0, R1
```
must be broken into tokens:
- ADD: instruction keyword
- R2: register
- ,: comma separator
- R0: register
- ,: comma separator
- R1: register
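Before reaching for a lexer generator, the idea can be sketched by hand. The std-only tokenizer below handles mnemonics, registers, decimal numbers, commas, and `;` comments, and splits a line into the tokens listed above. The `Tok` type and the `tokenize`/`classify` functions are hypothetical and deliberately simpler than the logos-based `Token` enum that follows. Columns are counted in bytes, which equals characters for ASCII source.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Word(String), // mnemonic or label name
    Register(u8), // R0-R15
    Number(u64),  // decimal literal
    Comma,
}

/// Splits one line of assembly into tokens; unknown characters are an error.
pub fn tokenize(line: &str) -> Result<Vec<Tok>, String> {
    // Everything after ';' is a comment
    let code = line.split(';').next().unwrap_or("");
    let mut tokens = Vec::new();
    let mut chars = code.char_indices().peekable();

    while let Some(&(start, c)) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == ',' {
            tokens.push(Tok::Comma);
            chars.next();
        } else if c.is_ascii_alphanumeric() || c == '_' {
            // Consume a run of word characters, then decide what it is
            let mut end = start;
            while let Some(&(i, ch)) = chars.peek() {
                if ch.is_ascii_alphanumeric() || ch == '_' {
                    end = i + ch.len_utf8();
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(classify(&code[start..end]));
        } else {
            return Err(format!("unexpected character '{}' at column {}", c, start + 1));
        }
    }
    Ok(tokens)
}

/// A word is a number, a register (R0-R15, any case), or a plain word.
fn classify(word: &str) -> Tok {
    if let Ok(n) = word.parse::<u64>() {
        return Tok::Number(n);
    }
    let mut chars = word.chars();
    if matches!(chars.next(), Some('R' | 'r')) {
        if let Ok(r) = chars.as_str().parse::<u8>() {
            if r <= 15 {
                return Tok::Register(r);
            }
        }
    }
    Tok::Word(word.to_string())
}
```

Like the logos version, this treats an out-of-range name such as R99 as an ordinary word, leaving it to the parser to report "expected register".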
Token Types
Here are the key token types we need to recognize. The definitions use the logos derive macro in its 0.12-style API, where exactly one variant must be marked #[error]:
```rust
use logos::Logos;

#[derive(Logos, Debug, Clone, PartialEq)]
pub enum Token {
    // ========== Instructions (sample - not exhaustive) ==========
    // Control flow
    #[token("HALT", ignore(case))] Halt,
    #[token("JUMP", ignore(case))] Jump,
    #[token("JUMPI", ignore(case))] JumpI,
    #[token("CALL", ignore(case))] Call,
    #[token("RET", ignore(case))] Ret,
    #[token("REVERT", ignore(case))] Revert,

    // Arithmetic
    #[token("ADD", ignore(case))] Add,
    #[token("SUB", ignore(case))] Sub,
    #[token("MUL", ignore(case))] Mul,
    #[token("DIV", ignore(case))] Div,
    #[token("MOD", ignore(case))] Mod,
    #[token("ADDI", ignore(case))] AddI,

    // Bitwise
    #[token("AND", ignore(case))] And,
    #[token("OR", ignore(case))] Or,
    #[token("XOR", ignore(case))] Xor,
    #[token("NOT", ignore(case))] Not,
    #[token("SHL", ignore(case))] Shl,
    #[token("SHR", ignore(case))] Shr,

    // Comparison
    #[token("EQ", ignore(case))] Eq,
    #[token("NE", ignore(case))] Ne,
    #[token("LT", ignore(case))] Lt,
    #[token("GT", ignore(case))] Gt,
    #[token("LE", ignore(case))] Le,
    #[token("GE", ignore(case))] Ge,
    #[token("ISZERO", ignore(case))] IsZero,

    // Memory operations
    #[token("LOAD8", ignore(case))] Load8,
    #[token("LOAD64", ignore(case))] Load64,
    #[token("STORE8", ignore(case))] Store8,
    #[token("STORE64", ignore(case))] Store64,
    #[token("MSIZE", ignore(case))] MSize,
    #[token("MCOPY", ignore(case))] MCopy,

    // Storage operations
    #[token("SLOAD", ignore(case))] SLoad,
    #[token("SSTORE", ignore(case))] SStore,

    // Immediate/Move
    #[token("LOADI", ignore(case))] LoadI,
    #[token("MOV", ignore(case))] Mov,

    // Context
    #[token("CALLER", ignore(case))] Caller,
    #[token("CALLVALUE", ignore(case))] CallValue,
    #[token("ADDRESS", ignore(case))] Address,
    #[token("BLOCKNUMBER", ignore(case))] BlockNumber,
    #[token("TIMESTAMP", ignore(case))] Timestamp,
    #[token("GAS", ignore(case))] Gas,

    // Debug
    #[token("LOG", ignore(case))] Log,

    // ========== Registers ==========
    #[regex(r"[Rr]([0-9]|1[0-5])", parse_register)]
    Register(u8),

    // ========== Numbers ==========
    #[regex(r"[0-9]+", parse_number)]
    Number(u64),

    #[regex(r"0x[0-9a-fA-F]+", parse_hex_number)]
    HexNumber(u64),

    // ========== Identifiers (labels) ==========
    #[regex(r"[a-zA-Z_][a-zA-Z0-9_]*", |lex| lex.slice().to_string())]
    Identifier(String),

    // ========== Directives ==========
    #[regex(r"\.[a-z]+", |lex| lex.slice()[1..].to_string())]
    Directive(String), // .entry, .const, etc.

    // ========== Symbols ==========
    #[token(",")] Comma,
    #[token(":")] Colon,

    // ========== Errors (plus skipped comments and whitespace) ==========
    // logos 0.12 requires one #[error] variant; skip rules may be attached to it
    #[error]
    #[regex(r";[^\n]*", logos::skip)]    // Comments
    #[regex(r"[ \t\n\r]+", logos::skip)] // Whitespace
    Error,
}

// Callbacks: returning None turns the match into Token::Error
fn parse_register(lex: &mut logos::Lexer<Token>) -> Option<u8> {
    lex.slice()[1..].parse().ok() // Skip 'R' or 'r'
}

fn parse_number(lex: &mut logos::Lexer<Token>) -> Option<u64> {
    lex.slice().parse().ok()
}

fn parse_hex_number(lex: &mut logos::Lexer<Token>) -> Option<u64> {
    let digits = &lex.slice()[2..]; // Skip "0x"
    u64::from_str_radix(digits, 16).ok()
}
```
Using the Logos Crate
Why logos?
Logos is a lexer generator that compiles tokenization rules into a deterministic finite automaton (DFA) at compile time. This means:
- Performance: tokenization is as fast as hand-written code (sometimes faster)
- Simplicity: tokens are defined with attributes, not with calls into a separate regex library
- Type safety: tokens are Rust enums, not strings
- Error handling: invalid input produces an Error token
Logos uses derive macros, so adding a new token is trivial:
```rust
#[token("NOP", ignore(case))]
Nop,
```
Example Tokenization
Input:
```asm
LOADI R0, 10 ; load value
```
Tokens:
```rust
[
    Token::LoadI,
    Token::Register(0),
    Token::Comma,
    Token::Number(10),
    // The comment is skipped automatically
]
```
Error Handling
If the lexer encounters an invalid character or sequence, it produces an Error token:
Input:
```asm
ADD R0, @invalid, R1
```
Tokens:
```rust
[
    Token::Add,
    Token::Register(0),
    Token::Comma,
    Token::Error,                             // <-- invalid '@' symbol
    Token::Identifier("invalid".to_string()), // lexing resumes after the bad character
    Token::Comma,
    Token::Register(1),
]
```
The parser can then report: “Unexpected character '@' at line 1, column 9”.
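To report “line 1, column 9”, the assembler only needs the byte offset where the bad token starts. In logos, that is the start of the token's span. Here is a minimal sketch of the conversion. The `line_col` helper is hypothetical; it assumes 1-based lines and columns, with columns counted in characters.

```rust
/// Converts a byte offset into 1-based (line, column) coordinates.
/// `offset` must be a char boundary within `source` (or equal to its length).
pub fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let before = &source[..offset];
    let line = before.matches('\n').count() + 1;
    // The column counts characters since the last newline, starting at 1
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].chars().count() + 1;
    (line, column)
}
```

Rescanning from the start of the source on every call is fine for one-off error messages. The lexer wrapper below keeps a running line count instead, because it needs a line number for every token.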
Line Number Tracking
Section titled “Line Number Tracking”Logos provides Span information — the byte range of each token in the source. By counting newlines in skipped whitespace, we can maintain line numbers for error messages:
pub struct Lexer<'source> { inner: logos::Lexer<'source, Token>, line: usize,}
impl<'source> Lexer<'source> { pub fn new(source: &'source str) -> Self { Self { inner: Token::lexer(source), line: 1, } }
pub fn next_token(&mut self) -> Option<(Token, usize)> { self.inner.next().map(|result| { let token = result.unwrap_or(Token::Error); // Update line count if token contained newlines let line = self.line; (token, line) }) }}4.4 Parser: Building the Syntax Tree
Section titled “4.4 Parser: Building the Syntax Tree”The parser takes the flat stream of tokens and builds a hierarchical Abstract Syntax Tree (AST) that represents the program’s structure.
What is an AST?
An AST is a tree representation of the syntactic structure of source code. Think of it like:
- Document outline — Sections, subsections, paragraphs
- Recipe structure — Ingredients list, steps (some steps have sub-steps)
- Organization chart — CEO → VPs → Managers → Employees
For an assembler, the tree is simpler:
```text
Program
├── Directive(.entry main)
├── Label(main)
├── Instruction(LoadI { dst: R0, value: 10 })
├── Instruction(Add { dst: R2, s1: R0, s2: R1 })
└── Instruction(Halt)
```
AST Node Types
```rust
/// Top-level program structure
#[derive(Debug, Clone, PartialEq)]
pub struct Program {
    pub statements: Vec<Statement>,
    pub entry_point: Option<String>, // From .entry directive
}

/// Statement types
#[derive(Debug, Clone, PartialEq)]
pub enum Statement {
    Label(String),            // loop_start:
    Instruction(Instruction), // ADD R0, R1, R2
    Directive(Directive),     // .entry main
}

/// Directive types
#[derive(Debug, Clone, PartialEq)]
pub enum Directive {
    Entry(String),      // .entry main
    Const(String, u64), // .const MAX 1000 (future)
}

/// One variant per VM instruction (43 in total); u8 fields are register numbers 0-15
#[derive(Debug, Clone, PartialEq)]
pub enum Instruction {
    // ========== Control Flow ==========
    Halt,
    Nop,
    Jump { target: u8 }, // target: register holding the destination address
    JumpI { cond: u8, target: u8 },
    Call { target: u8 },
    Ret,
    Revert,

    // ========== Arithmetic ==========
    Add { dst: u8, s1: u8, s2: u8 },
    Sub { dst: u8, s1: u8, s2: u8 },
    Mul { dst: u8, s1: u8, s2: u8 },
    Div { dst: u8, s1: u8, s2: u8 },
    Mod { dst: u8, s1: u8, s2: u8 },
    AddI { dst: u8, src: u8, imm: u64 },

    // ========== Bitwise ==========
    And { dst: u8, s1: u8, s2: u8 },
    Or { dst: u8, s1: u8, s2: u8 },
    Xor { dst: u8, s1: u8, s2: u8 },
    Not { dst: u8, src: u8 },
    Shl { dst: u8, s1: u8, s2: u8 },
    Shr { dst: u8, s1: u8, s2: u8 },

    // ========== Comparison ==========
    Eq { dst: u8, s1: u8, s2: u8 },
    Ne { dst: u8, s1: u8, s2: u8 },
    Lt { dst: u8, s1: u8, s2: u8 },
    Gt { dst: u8, s1: u8, s2: u8 },
    Le { dst: u8, s1: u8, s2: u8 },
    Ge { dst: u8, s1: u8, s2: u8 },
    IsZero { dst: u8, src: u8 },

    // ========== Memory ==========
    Load8 { dst: u8, addr: u8 },
    Load64 { dst: u8, addr: u8 },
    Store8 { addr: u8, src: u8 },
    Store64 { addr: u8, src: u8 },
    MSize { dst: u8 },
    MCopy { dst: u8, src: u8, len: u8 },

    // ========== Storage ==========
    SLoad { dst: u8, key: u8 },
    SStore { key: u8, value: u8 },

    // ========== Immediate ==========
    LoadI { dst: u8, value: u64 },
    Mov { dst: u8, src: u8 },

    // ========== Context ==========
    Caller { dst: u8 },
    CallValue { dst: u8 },
    Address { dst: u8 },
    BlockNumber { dst: u8 },
    Timestamp { dst: u8 },
    Gas { dst: u8 },

    // ========== Debug ==========
    Log { src: u8 },
}
```
Parsing Strategy
We use a hand-written recursive descent parser. This approach:
- Reads tokens left-to-right
- Calls functions recursively based on grammar rules
- Provides excellent error messages
- Easy to extend with new syntax
High-level parsing flow:
```rust
pub struct Parser {
    tokens: Vec<(Token, usize)>, // (token, line_number)
    position: usize,
}

impl Parser {
    pub fn parse(source: &str) -> Result<Program, ParseError> {
        // The Lexer wrapper is not an Iterator, so drain it with next_token()
        let mut lexer = Lexer::new(source);
        let mut tokens = Vec::new();
        while let Some(token) = lexer.next_token() {
            tokens.push(token);
        }
        let mut parser = Parser { tokens, position: 0 };
        parser.parse_program()
    }

    fn parse_program(&mut self) -> Result<Program, ParseError> {
        let mut statements = Vec::new();
        let mut entry_point = None;

        while !self.is_at_end() {
            let stmt = self.parse_statement()?;

            // Track .entry directive
            if let Statement::Directive(Directive::Entry(name)) = &stmt {
                entry_point = Some(name.clone());
            }

            statements.push(stmt);
        }

        Ok(Program { statements, entry_point })
    }

    fn parse_statement(&mut self) -> Result<Statement, ParseError> {
        match self.peek() {
            Token::Directive(_) => self.parse_directive(),
            // An identifier followed by ':' defines a label
            Token::Identifier(_) if self.peek_next() == Some(&Token::Colon) => self.parse_label(),
            _ => self.parse_instruction(),
        }
    }

    fn parse_instruction(&mut self) -> Result<Statement, ParseError> {
        let (token, line) = self.advance(); // consumes the next (Token, line) pair

        match token {
            Token::Add => {
                // Expect: ADD Rdst, Rs1, Rs2
                let dst = self.expect_register()?;
                self.expect_comma()?;
                let s1 = self.expect_register()?;
                self.expect_comma()?;
                let s2 = self.expect_register()?;
                Ok(Statement::Instruction(Instruction::Add { dst, s1, s2 }))
            }

            Token::LoadI => {
                // Expect: LOADI Rdst, immediate (Number or HexNumber)
                let dst = self.expect_register()?;
                self.expect_comma()?;
                let value = self.expect_number()?;
                Ok(Statement::Instruction(Instruction::LoadI { dst, value }))
            }

            Token::Halt => Ok(Statement::Instruction(Instruction::Halt)),

            // ... handle all other instruction types

            _ => Err(ParseError::UnexpectedToken {
                expected: "instruction",
                found: token,
                line,
            }),
        }
    }

    // Helpers (peek, peek_next, advance, is_at_end, expect_register,
    // expect_comma, expect_number, parse_directive, parse_label) are omitted.
}
```
Error Recovery
When the parser encounters an error, it should:
- Report the error with line number
- Optionally try to recover and continue parsing
- Collect multiple errors in one pass (don’t stop at the first error)
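Collecting several errors in one pass needs a recovery rule. Each statement occupies one line, so a simple choice is panic-mode recovery: on an error, skip the rest of the offending line and resume on the next one. The sketch below shows that rule under stated assumptions. It uses a simplified, hypothetical token type (`Tok`, paired with line numbers like the parser's `(Token, usize)` tuples). It also only understands three-register instructions, so it is not the chapter's `Parser`.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Ident(String),
    Reg(u8),
    Num(u64),
    Comma,
}

#[derive(Debug, PartialEq)]
pub struct LineError {
    pub line: usize,
    pub message: String,
}

/// Parses `OP Rd, Rs1, Rs2` at `*pos`, advancing `*pos` past what it consumes.
fn parse_rrr(tokens: &[(Tok, usize)], pos: &mut usize) -> Result<String, LineError> {
    let line = tokens[*pos].1;
    let err = |message: &str| LineError { line, message: message.to_string() };

    let op = match &tokens[*pos].0 {
        Tok::Ident(name) => name.clone(),
        _ => return Err(err("expected instruction")),
    };
    *pos += 1;

    let mut regs = Vec::new();
    for i in 0..3 {
        if i > 0 {
            match tokens.get(*pos) {
                Some((Tok::Comma, l)) if *l == line => *pos += 1,
                _ => return Err(err("expected ','")),
            }
        }
        match tokens.get(*pos) {
            Some((Tok::Reg(r), l)) if *l == line => {
                regs.push(*r);
                *pos += 1;
            }
            _ => return Err(err("expected register")),
        }
    }
    Ok(format!("{} R{}, R{}, R{}", op, regs[0], regs[1], regs[2]))
}

/// Parses every statement, recording each error and resuming on the next line.
pub fn parse_all(tokens: &[(Tok, usize)]) -> (Vec<String>, Vec<LineError>) {
    let mut statements = Vec::new();
    let mut errors = Vec::new();
    let mut pos = 0;
    while pos < tokens.len() {
        match parse_rrr(tokens, &mut pos) {
            Ok(stmt) => statements.push(stmt),
            Err(e) => {
                // Panic-mode recovery: discard the rest of the offending line
                while pos < tokens.len() && tokens[pos].1 == e.line {
                    pos += 1;
                }
                errors.push(e);
            }
        }
    }
    (statements, errors)
}
```

Each loop iteration either consumes at least one token or skips the rest of the current line, so the loop always terminates. Line-based recovery is crude but fits a line-oriented language; parsers for nested grammars usually synchronize on statement terminators instead.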
Example error message:
```text
error: expected register, found '42'
  --> contract.asm:15:10
   |
15 |     ADD 42, R0, R1
   |         ^^ expected register (R0-R15), found number
```
Example Parse Tree
Assembly:
```asm
.entry main

main:
    LOADI R0, 10
    ADD R2, R0, R1
```
AST:
```rust
Program {
    entry_point: Some("main".to_string()),
    statements: vec![
        Statement::Directive(Directive::Entry("main".to_string())),
        Statement::Label("main".to_string()),
        Statement::Instruction(Instruction::LoadI { dst: 0, value: 10 }),
        Statement::Instruction(Instruction::Add { dst: 2, s1: 0, s2: 1 }),
    ],
}
```
4.5 Label Resolution: Two-Pass Compilation
Labels are symbolic names for addresses. The challenge: labels can be forward referenced, that is, used before they’re defined.
The Forward Reference Problem
Consider this code:
```asm
    LOADI R5, loop_start   ; Line 1: reference loop_start (what address?)
    JUMP R5                ; Line 2: jump to it
    ; ... more code ...
loop_start:                ; Line 10: define loop_start
    ADD R0, R0, R1
```
At line 1, we don’t know loop_start’s address yet; it’s defined later, at line 10. We can’t emit the bytecode for line 1 without knowing the value to load into R5.
Solution: Two-pass compilation
Pass 1: Build Symbol Table
In the first pass, we scan the entire program and record the address of every label:
```rust
use std::collections::HashMap;

pub struct Compiler {
    symbol_table: HashMap<String, u64>, // label → address
    current_address: u64,
}

impl Compiler {
    fn first_pass(&mut self, program: &Program) -> Result<(), CompileError> {
        self.current_address = 0;

        for statement in &program.statements {
            match statement {
                Statement::Label(name) => {
                    // Record this label's address; a repeated name is an error
                    if self.symbol_table.contains_key(name) {
                        return Err(CompileError::DuplicateLabel(name.clone()));
                    }
                    self.symbol_table.insert(name.clone(), self.current_address);
                }

                Statement::Instruction(inst) => {
                    // Advance by the instruction's encoded size (1, 2, 3, or 10 bytes)
                    self.current_address += inst.byte_size();
                }

                Statement::Directive(_) => {
                    // Directives don't emit bytecode
                }
            }
        }

        Ok(())
    }
}
```
After Pass 1, the symbol table might look like:
| Label | Address |
|---|---|
| main | 0x0000 |
| loop_start | 0x0014 |
| loop_end | 0x002A |
| error_handler | 0x0040 |
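Pass 1 is pure bookkeeping: labels take no space, and each instruction advances the address by its encoded size. The sketch below is a minimal, self-contained version under stated assumptions. It uses a simplified statement type (`Stmt` and `collect_labels` are hypothetical, not the chapter's `Statement`/`Compiler`) and the sizes from the Complete Example Trace later in this section: LOADI and ADDI are 10 bytes, JUMP is 2, and HALT is 1.

```rust
use std::collections::HashMap;

/// Simplified statement: a label, or an instruction whose encoded size is known.
pub enum Stmt<'a> {
    Label(&'a str),
    Instr { mnemonic: &'a str, size: u64 }, // mnemonic kept only for readability
}

/// Pass 1: assign an address to every label; a repeated label is an error.
pub fn collect_labels(program: &[Stmt]) -> Result<HashMap<String, u64>, String> {
    let mut table = HashMap::new();
    let mut address = 0u64;
    for stmt in program {
        match stmt {
            Stmt::Label(name) => {
                if table.insert(name.to_string(), address).is_some() {
                    return Err(format!("label '{}' defined multiple times", name));
                }
            }
            // Labels take no space; instructions advance the address
            Stmt::Instr { size, .. } => address += *size,
        }
    }
    Ok(table)
}
```

Because no bytes are emitted yet, pass 1 can run before any label value is known. That is exactly what makes forward references safe.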
Pass 2: Emit Bytecode
In the second pass, we emit actual bytecode, using the symbol table to resolve label references:
```rust
impl Compiler {
    fn second_pass(&self, program: &Program) -> Result<Vec<u8>, CompileError> {
        let mut bytecode = Vec::new();

        for statement in &program.statements {
            match statement {
                Statement::Label(_) => {
                    // Labels don't emit bytecode; pass 1 already recorded their positions
                }

                Statement::Instruction(inst) => {
                    self.emit_instruction(inst, &mut bytecode)?;
                }

                Statement::Directive(_) => {
                    // Already processed
                }
            }
        }

        Ok(bytecode)
    }

    fn emit_instruction(&self, inst: &Instruction, bytecode: &mut Vec<u8>) -> Result<(), CompileError> {
        match inst {
            Instruction::LoadI { dst, value } => {
                // With `value: u64` the AST cannot name a label, so nothing is looked up here.
                // To accept `LOADI R5, loop_start`, the operand must keep the label name;
                // this arm would then read its address from `self.symbol_table` and
                // report an undefined-label error if it is missing.
                bytecode.push(0x70);                              // LOADI opcode
                bytecode.push(*dst << 4);                         // dst in the high nibble
                bytecode.extend_from_slice(&value.to_le_bytes()); // 64-bit immediate (little-endian)
            }

            Instruction::Add { dst, s1, s2 } => {
                bytecode.push(0x10);              // ADD opcode
                bytecode.push((*dst << 4) | *s1); // Pack dst and s1
                bytecode.push(*s2 << 4);          // Pack s2 (low nibble is padding)
            }

            // ... handle all other instructions
            _ => todo!(),
        }

        Ok(())
    }
}
```
Complete Example Trace
Assembly:
```asm
.entry main

main:
    LOADI R5, loop_start
    JUMP R5

loop_start:
    ADDI R0, R0, 1
    HALT
```
Pass 1 (Symbol Collection):
```text
Address 0x0000: Label "main"
Address 0x0000: LOADI (10 bytes)
Address 0x000A: JUMP (2 bytes)
Address 0x000C: Label "loop_start"
Address 0x000C: ADDI (10 bytes)
Address 0x0016: HALT (1 byte)

Symbol Table:
  main       → 0x0000
  loop_start → 0x000C
```
Pass 2 (Bytecode Emission):
```text
0x0000: 70 50 0C 00 00 00 00 00 00 00  ; LOADI R5, 0x000C (loop_start)
0x000A: 02 50                          ; JUMP R5
0x000C: 15 00 01 00 00 00 00 00 00 00  ; ADDI R0, R0, 1
0x0016: 00                             ; HALT
```
4.6 Code Generation: Emitting Bytecode
The final step is encoding AST instructions into the compact binary format our VM expects.
Instruction Encoding Formats
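Recall from Chapter 3 that each instruction uses one of a few encoding formats. Registers occupy 4-bit nibbles, and the byte lengths below match the addresses in this chapter’s traces:
| Format | Bytes | Example | Encoding |
|---|---|---|---|
| N (no operands) | 1 | HALT | [opcode] |
| R (1 register) | 2 | LOG R2, JUMP R5 | [opcode] [reg\|pad] |
| RR (2 registers) | 2 | SLOAD R1, R0 | [opcode] [r1\|r2] |
| RRR | 3 | ADD R2, R0, R1 | [opcode] [dst\|s1] [s2\|pad] |
| RI | 10 | LOADI R0, 42 | [opcode] [reg] [imm64 le] |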
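Each format in the table maps directly onto a few lines of code. The sketch below is a minimal encoder covering only the instructions in this chapter's complete encoding example. Its opcodes 0x70, 0x10, 0xF0, and 0x00 are taken from the bytecode listings. The `Instr` enum is a hypothetical stand-in for the AST's `Instruction`. Encoding the example program should reproduce the 26 bytes shown at the start of the chapter.

```rust
/// Hypothetical subset of the AST: one variant per encoding format.
pub enum Instr {
    Halt,                            // N:   [opcode]
    Log { src: u8 },                 // R:   [opcode] [src<<4]
    Add { dst: u8, s1: u8, s2: u8 }, // RRR: [opcode] [dst<<4 | s1] [s2<<4]
    LoadI { dst: u8, value: u64 },   // RI:  [opcode] [dst<<4] [imm64 little-endian]
}

/// Encodes a program into VM bytecode.
pub fn encode(program: &[Instr]) -> Vec<u8> {
    let mut out = Vec::new();
    for inst in program {
        match *inst {
            Instr::Halt => out.push(0x00),
            Instr::Log { src } => out.extend_from_slice(&[0xF0, src << 4]),
            Instr::Add { dst, s1, s2 } => {
                out.extend_from_slice(&[0x10, (dst << 4) | (s1 & 0x0F), s2 << 4])
            }
            Instr::LoadI { dst, value } => {
                out.extend_from_slice(&[0x70, dst << 4]);
                out.extend_from_slice(&value.to_le_bytes());
            }
        }
    }
    out
}
```

Encoding is a pure function of each instruction, so the encoder never needs to look at neighbouring instructions. The only cross-instruction information, label addresses, comes from pass 1.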
Bit-Packing Registers
Registers are 4-bit values (0-15). We pack multiple registers into single bytes:
```rust
fn pack_rrr(dst: u8, s1: u8, s2: u8) -> [u8; 2] {
    [
        (dst << 4) | (s1 & 0x0F), // First byte: dst (high 4 bits) | s1 (low 4 bits)
        s2 << 4,                  // Second byte: s2 (high 4 bits) | padding
    ]
}
```
Example: ADD R2, R0, R1
```text
dst=2 (0010), s1=0 (0000), s2=1 (0001)

Byte 1: (2 << 4) | 0 = 0x20 = 0b0010_0000
Byte 2: (1 << 4)     = 0x10 = 0b0001_0000

Bytecode: [0x10, 0x20, 0x10]
           ^^^^  ^^^^  ^^^^
           opcode byte1 byte2
```
Immediate Value Encoding
64-bit immediate values are encoded in little-endian format:
```rust
fn encode_immediate(value: u64) -> [u8; 8] {
    value.to_le_bytes()
}
```
Example: LOADI R0, 12345
```text
12345 = 0x0000000000003039 (hex)

Little-endian bytes: [0x39, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]

Full instruction:
[0x70, 0x00, 0x39, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]
 ^^^^  ^^^^  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 opcode reg  64-bit immediate (little-endian)
```
Complete Encoding Example
Assembly:
```asm
main:
    LOADI R0, 10      ; Load 10 into R0
    LOADI R1, 20      ; Load 20 into R1
    ADD R2, R0, R1    ; R2 = R0 + R1
    LOG R2            ; Log R2
    HALT              ; Stop
```
Bytecode (with annotations):
```text
Address | Bytes                          | Instruction
--------|--------------------------------|----------------
0x0000  | 70 00 0A 00 00 00 00 00 00 00  | LOADI R0, 10
0x000A  | 70 10 14 00 00 00 00 00 00 00  | LOADI R1, 20
0x0014  | 10 20 10                       | ADD R2, R0, R1
0x0017  | F0 20                          | LOG R2
0x0019  | 00                             | HALT
```
Detailed breakdown:
```text
LOADI R0, 10:
  0x70 = LOADI opcode
  0x00 = R0 in the high nibble (low nibble unused)
  0x0A 0x00 0x00 0x00 0x00 0x00 0x00 0x00 = 10 as a little-endian u64

ADD R2, R0, R1:
  0x10 = ADD opcode
  0x20 = dst R2 in the high nibble (0x2_), s1 R0 in the low nibble (0x_0)
  0x10 = s2 R1 in the high nibble (0x1_), padding in the low nibble (0x_0)

LOG R2:
  0xF0 = LOG opcode
  0x20 = R2 in the high nibble

HALT:
  0x00 = HALT opcode
```
Memory Layout
The bytecode is loaded into VM memory starting at address 0:
```text
VM Memory:
┌──────────────────────────────────────┐
│ Address 0x0000: 0x70 (LOADI opcode)  │
│ Address 0x0001: 0x00 (R0)            │
│ Address 0x0002-0x0009: 10 (le u64)   │
│ Address 0x000A: 0x70 (LOADI opcode)  │
│ Address 0x000B: 0x10 (R1)            │
│ ...                                  │
│ Address 0x0019: 0x00 (HALT)          │
└──────────────────────────────────────┘

Program Counter (PC) starts at 0x0000
```
The VM’s fetch-decode-execute loop reads from this bytecode, advancing the PC after each instruction.
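Each opcode determines its instruction's length (1, 2, 3, or 10 bytes in the format table above). That means the VM, or a disassembler, can find every instruction boundary by walking the bytes and advancing the PC. The sketch below assumes a hypothetical size table that only knows the opcodes from the encoding example; `instruction_len` and `instruction_addresses` are illustrative names.

```rust
/// Encoded length of each opcode used in the example (N: 1, R: 2, RRR: 3, RI: 10).
fn instruction_len(opcode: u8) -> Option<usize> {
    match opcode {
        0x00 => Some(1),  // HALT
        0xF0 => Some(2),  // LOG
        0x10 => Some(3),  // ADD
        0x70 => Some(10), // LOADI
        _ => None,
    }
}

/// Returns the address of every instruction, advancing the PC by each length.
pub fn instruction_addresses(bytecode: &[u8]) -> Result<Vec<usize>, String> {
    let mut pc = 0;
    let mut addresses = Vec::new();
    while pc < bytecode.len() {
        let len = instruction_len(bytecode[pc])
            .ok_or_else(|| format!("unknown opcode 0x{:02X} at 0x{:04X}", bytecode[pc], pc))?;
        if pc + len > bytecode.len() {
            return Err(format!("truncated instruction at 0x{:04X}", pc));
        }
        addresses.push(pc);
        pc += len;
    }
    Ok(addresses)
}
```

Rejecting a truncated final instruction matters for a VM: reading an immediate past the end of the code would otherwise pull in garbage bytes.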
4.7 Error Messages and Debugging
Good error messages are crucial for developer experience. Let’s look at common errors and how to report them.
Error Categories
| Category | Example | Fix |
|---|---|---|
| Lexical | ADD R0, @invalid, R1 | Remove invalid character '@' |
| Syntax | ADD R0 R1 R2 | Missing commas between operands |
| Semantic | LOADI R5, loop_end (undefined label) | Define the label or fix the typo |
| Validation | ADD R99, R0, R1 | Register R99 is out of range (max R15) |
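Semantic errors such as an undefined label are easier to fix with a suggestion, like the “did you mean” hint in the undefined-label example below. One common approach is to suggest the known label with the smallest Levenshtein edit distance, within a threshold. It is sketched here as an assumption, not as the chapter's implementation. The threshold used (half the longer name's length) is a tunable choice; it is loose enough to pair `loop_end` with `loop_start`, which differ by 5 edits.

```rust
/// Levenshtein edit distance (insertions, deletions, substitutions), by characters.
fn edit_distance(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and b[..j]
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut cur = vec![i + 1; b.len() + 1];
        for (j, &cb) in b.iter().enumerate() {
            let substitution = prev[j] + if ca == cb { 0 } else { 1 };
            cur[j + 1] = substitution.min(prev[j + 1] + 1).min(cur[j] + 1);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Suggests the closest known label within half the longer name's length.
pub fn suggest_label<'a>(unknown: &str, known: &[&'a str]) -> Option<&'a str> {
    known
        .iter()
        .map(|&label| (edit_distance(unknown, label), label))
        .filter(|&(dist, label)| dist <= unknown.len().max(label.len()) / 2)
        .min_by_key(|&(dist, _)| dist)
        .map(|(_, label)| label)
}
```

Running this only when an error is reported keeps its cost (proportional to the product of name lengths, per known label) off the normal compilation path.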
Example Error Messages
Undefined Label:
```text
error: undefined label 'loop_end'
  --> contract.asm:15:15
   |
15 |     LOADI R5, loop_end
   |               ^^^^^^^^ label not found
   |
help: did you mean 'loop_start'?
```
Invalid Register:
```text
error: invalid register 'R99'
  --> contract.asm:8:10
   |
 8 |     ADD R99, R0, R1
   |         ^^^ register must be R0-R15
```
Missing Operand:
```text
error: expected register, found end of line
  --> contract.asm:12:8
   |
12 |     ADD R0,
   |            ^ expected register after comma
```
Duplicate Label:
```text
error: label 'main' defined multiple times
  --> contract.asm:20:1
   |
 5 | main:
   | ---- first defined here
   |
20 | main:
   | ^^^^ redefined here
```
Error Reporting Structure
```rust
#[derive(Debug)]
pub enum AssemblerError {
    LexError { line: usize, column: usize, message: String },
    ParseError { line: usize, expected: String, found: String },
    UndefinedLabel { label: String, line: usize },
    DuplicateLabel { label: String, first_line: usize, second_line: usize },
    InvalidRegister { register: String, line: usize },
}

impl AssemblerError {
    pub fn format(&self, source: &str) -> String {
        match self {
            AssemblerError::UndefinedLabel { label, line } => {
                // Quote the offending source line (lines are 1-based)
                let text = source.lines().nth(line.saturating_sub(1)).unwrap_or("");
                format!(
                    "error: undefined label '{}'\n  --> contract.asm:{}\n   |\n{:>2} | {}",
                    label, line, line, text
                )
            }
            // ... format other errors; this fallback keeps the match exhaustive
            other => format!("error: {:?}", other),
        }
    }
}
```
4.8 Writing Assembly Programs
Now let’s see realistic assembly patterns you’ll actually use.
Example 1: Simple Counter (Expanded)
This is the counter from section 4.1, with detailed annotations:
```asm
; Increment a storage counter
; Storage slot 0 holds the counter value
.entry main

main:
    LOADI  R0, 0        ; R0 = storage slot 0 (counter location) - 2 gas
    SLOAD  R1, R0       ; R1 = Storage[0] (load current value) - 100 gas
    LOADI  R2, 1        ; R2 = 1 (increment amount) - 2 gas
    ADD    R1, R1, R2   ; R1 = R1 + 1 (increment) - 2 gas
    SSTORE R0, R1       ; Storage[0] = R1 (save back) - 5000 gas
    HALT                ; Stop execution successfully - 0 gas

; Total gas: ~5106
```
Gas breakdown:
- LOADI: 2 gas each (loading a constant is cheap)
- SLOAD: 100 gas (reading from disk is expensive)
- ADD: 2 gas (arithmetic is cheap)
- SSTORE: 5000 gas (writing to disk is very expensive)
Example 2: Conditional Logic (If-Then-Else)
Using JUMPI for branching:
```asm
; Check if balance >= amount, then transfer or revert
.entry main

main:
    ; Load values
    LOADI R10, 0          ; Storage slot 0 = balance
    SLOAD R0, R10         ; R0 = current balance
    LOADI R1, 100         ; R1 = amount to check

    ; Compare: balance >= amount
    GE    R2, R0, R1      ; R2 = (R0 >= R1) ? 1 : 0
    LOADI R3, transfer    ; R3 = address of 'transfer' label
    JUMPI R2, R3          ; If R2 != 0, jump to transfer

    ; Else branch: insufficient funds
    REVERT                ; Abort execution

transfer:
    ; Transfer logic would go here
    ; (subtract from balance, add to recipient, etc.)
    HALT                  ; Success
```
Example 3: Counting Loop
Using labels for loop control:
```asm
; Sum numbers 1 to 10
.entry main

main:
    LOADI R0, 0           ; R0 = counter = 0
    LOADI R1, 0           ; R1 = sum = 0
    LOADI R2, 10          ; R2 = limit = 10

loop_start:
    ADDI  R0, R0, 1       ; counter++
    ADD   R1, R1, R0      ; sum += counter
    LT    R3, R0, R2      ; R3 = (counter < limit)
    LOADI R4, loop_start  ; R4 = address of loop_start
    JUMPI R3, R4          ; If counter < limit, continue loop

    ; Loop finished, R1 contains sum (55)
    LOG   R1              ; Output result
    HALT
```
Execution trace (first 3 iterations):
| Iteration | R0 (counter) | R1 (sum) | R3 (counter < 10) |
|---|---|---|---|
| 0 | 0 | 0 | — |
| 1 | 1 | 1 | 1 (continue) |
| 2 | 2 | 3 | 1 (continue) |
| 3 | 3 | 6 | 1 (continue) |
| … | … | … | … |
| 10 | 10 | 55 | 0 (exit loop) |
Example 4: Function Calls with Stack
Using CALL and RET with stack-based parameter passing:
```asm
; Call a function: result = add_ten(5)
.entry main

main:
    ; Set up call
    LOADI   R0, 5         ; R0 = argument (5)
    LOADI   R1, 1000      ; R1 = stack address
    STORE64 R1, R0        ; Push argument to stack
    ADDI    R1, R1, 8     ; Stack pointer += 8 (advance stack)

    ; Call function
    LOADI   R2, add_ten   ; R2 = function address
    CALL    R2            ; Call function (stores return address in R14)

    ; Result is in R0 after return
    LOG     R0            ; Log result (should be 15)
    HALT

add_ten:
    ; Function prologue: load argument from stack
    LOADI   R1, 1000      ; R1 = stack address
    LOAD64  R0, R1        ; R0 = argument from stack

    ; Function body: add 10
    LOADI   R2, 10        ; R2 = 10
    ADD     R0, R0, R2    ; R0 = argument + 10 (result)

    ; Function epilogue: return
    RET                   ; Return to caller (jumps to address in R14)
```
Example 5: Array-Like Storage Access
Computing storage keys for array elements:
```asm
; Write value to storage[base + index]
.entry set_array_value

set_array_value:
    ; Array storage layout: base address + index = key
    LOADI  R0, 1000       ; R0 = array base address in storage
    LOADI  R1, 5          ; R1 = index (element 5)
    LOADI  R2, 42         ; R2 = value to store

    ; Compute storage key
    ADD    R3, R0, R1     ; R3 = key = base + index = 1005

    ; Store value
    SSTORE R3, R2         ; Storage[1005] = 42
    HALT

; To read later:
; LOADI R0, 1000
; LOADI R1, 5
; ADD R3, R0, R1
; SLOAD R4, R3          ; R4 = Storage[1005] = 42
```
Storage layout:
| Storage Key | Value | Meaning |
|---|---|---|
| 1000 | ? | array[0] |
| 1001 | ? | array[1] |
| 1002 | ? | array[2] |
| … | … | … |
| 1005 | 42 | array[5] (our write) |
Execution Trace Example
Let’s trace Example 1 (counter) through the VM:
Assembly:
```
main:
    LOADI R0, 0
    SLOAD R1, R0
    LOADI R2, 1
    ADD R1, R1, R2
    SSTORE R0, R1
    HALT
```

Execution trace:
| Step | PC | Instruction | R0 | R1 | R2 | Gas Left | Storage[0] |
|---|---|---|---|---|---|---|---|
| Init | 0x00 | — | 0 | 0 | 0 | 10000 | 5 |
| 1 | 0x00 | LOADI R0, 0 | 0 | 0 | 0 | 9998 | 5 |
| 2 | 0x0A | SLOAD R1, R0 | 0 | 5 | 0 | 9898 | 5 |
| 3 | 0x0C | LOADI R2, 1 | 0 | 5 | 1 | 9896 | 5 |
| 4 | 0x16 | ADD R1, R1, R2 | 0 | 6 | 1 | 9894 | 5 |
| 5 | 0x19 | SSTORE R0, R1 | 0 | 6 | 1 | 4894 | 6 |
| 6 | 0x1B | HALT | 0 | 6 | 1 | 4894 | 6 |
Result: Success, gas used = 5106, storage[0] incremented from 5 to 6
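The PC and Gas Left columns follow mechanically from each instruction's encoded size and gas cost. The sketch below recomputes them for the counter program. The sizes and costs are read off the trace itself (LOADI is 10 bytes and 2 gas, SLOAD is 2 bytes and 100 gas, and so on), so treat them as assumptions if your Chapter 3 VM differs. `Op`, `size`, `gas`, and `trace` are illustrative names, and the sketch has no out-of-gas check.

```rust
/// The instructions used by the counter example.
#[derive(Clone, Copy)]
enum Op {
    LoadI,
    SLoad,
    Add,
    SStore,
    Halt,
}

/// Encoded size in bytes, as implied by the PC column of the trace.
fn size(op: Op) -> u16 {
    match op {
        Op::LoadI => 10, // opcode + register byte + 8-byte immediate
        Op::Add => 3,
        Op::SLoad | Op::SStore => 2,
        Op::Halt => 1,
    }
}

/// Gas charged per instruction, as implied by the Gas Left column of the trace.
fn gas(op: Op) -> u64 {
    match op {
        Op::LoadI | Op::Add => 2,
        Op::SLoad => 100,
        Op::SStore => 5000, // the cost charged for the 5 -> 6 write in the trace
        Op::Halt => 0,
    }
}

/// Returns (pc, gas_left_after) for each instruction of a straight-line program.
/// No out-of-gas handling: a too-small limit simply panics on underflow.
fn trace(program: &[Op], gas_limit: u64) -> Vec<(u16, u64)> {
    let mut pc = 0u16;
    let mut gas_left = gas_limit;
    let mut rows = Vec::new();
    for &op in program {
        gas_left -= gas(op);
        rows.push((pc, gas_left));
        pc += size(op);
    }
    rows
}

fn main() {
    let counter = [Op::LoadI, Op::SLoad, Op::LoadI, Op::Add, Op::SStore, Op::Halt];
    let rows = trace(&counter, 10_000);
    for (pc, gas_left) in &rows {
        println!("PC=0x{:02X} gas_left={}", pc, gas_left);
    }
    println!("gas used = {}", 10_000 - rows[rows.len() - 1].1);
}
```

It prints the same PCs and gas values as the table, ending with `gas used = 5106`.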
4.9 Macros and Constants
To make assembly more maintainable, we support constants and (in the future) macros.
Constants (.const)
Constants give meaningful names to magic numbers:
```
; Define constants
.const MAX_SUPPLY 1000000
.const OWNER_SLOT 0
.const BALANCE_BASE 1000

.entry main

main:
    ; Load owner address from storage
    LOADI R0, OWNER_SLOT  ; R0 = 0 (expands to storage slot 0)
    SLOAD R1, R0          ; R1 = Storage[0] = owner address

    ; Check if supply exceeds max
    LOADI R2, 100         ; current supply storage slot
    SLOAD R3, R2          ; R3 = current supply
    LOADI R4, MAX_SUPPLY  ; R4 = 1000000 (constant)
    GT R5, R3, R4         ; R5 = (supply > max)
    LOADI R6, error       ; R6 = error handler address
    JUMPI R5, R6          ; if exceeded, jump to error

    ; Success path...
    HALT

error:
    REVERT
```

Why constants matter:
| Without Constants | With Constants |
|---|---|
| `LOADI R0, 0` | `LOADI R0, OWNER_SLOT` |
| What does `0` mean? | Clearly the owner's storage slot |
| Change requires find-replace | Change in one place |
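One way an assembler could support `.const` is a pre-pass that collects constants into a table, then consults that table whenever an immediate operand is not a number. This is a minimal sketch under that assumption: `collect_consts` and `resolve_immediate` are hypothetical helpers, not the chapter's parser API, and labels such as `error` would still go through the compiler's symbol table.

```rust
use std::collections::HashMap;

/// Pre-pass: collects `.const NAME VALUE` directives into a lookup table.
fn collect_consts(source: &str) -> Result<HashMap<String, u64>, String> {
    let mut consts = HashMap::new();
    for (i, line) in source.lines().enumerate() {
        // Drop any `;` comment before looking at the directive.
        let code = line.split(';').next().unwrap_or("").trim();
        let mut parts = code.split_whitespace();
        if parts.next() != Some(".const") {
            continue;
        }
        let (name, raw) = match (parts.next(), parts.next()) {
            (Some(name), Some(raw)) => (name, raw),
            _ => return Err(format!("line {}: expected `.const NAME VALUE`", i + 1)),
        };
        let value: u64 = raw
            .parse()
            .map_err(|_| format!("line {}: `{}` is not a number", i + 1, raw))?;
        if consts.insert(name.to_string(), value).is_some() {
            return Err(format!("line {}: constant `{}` is defined twice", i + 1, name));
        }
    }
    Ok(consts)
}

/// Resolves an immediate operand: a numeric literal or a previously defined constant.
fn resolve_immediate(operand: &str, consts: &HashMap<String, u64>) -> Option<u64> {
    operand
        .parse::<u64>()
        .ok()
        .or_else(|| consts.get(operand).copied())
}

fn main() {
    let src = ".const MAX_SUPPLY 1000000\n.const OWNER_SLOT 0 ; owner\n\nmain:\n    LOADI R0, OWNER_SLOT\n";
    let consts = collect_consts(src).expect("valid constants");
    for operand in ["MAX_SUPPLY", "OWNER_SLOT", "42", "error"] {
        println!("{} -> {:?}", operand, resolve_immediate(operand, &consts));
    }
}
```

Rejecting duplicate definitions up front keeps a copy-paste slip from silently changing which slot a name refers to.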
Future: Macro Support
Planned `.macro` directive for code reuse:
```
; Define a macro for safe division
.macro safe_div(result, dividend, divisor, error_label)
    ISZERO R15, \divisor              ; check divisor == 0
    LOADI R14, \error_label           ; load error address
    JUMPI R15, R14                    ; if zero, jump to error
    DIV \result, \dividend, \divisor  ; safe division
.endmacro

; Usage:
main:
    LOADI R0, 10
    LOADI R1, 2
    safe_div(R2, R0, R1, error)       ; expands to 4 instructions
    HALT

error:
    REVERT
```

Macros are textual substitution: they expand before parsing, much like C preprocessor macros. Note that `safe_div` uses R14 and R15 as scratch registers; since `CALL` keeps the return address in R14 (see Example 4), expanding it inside a function would overwrite that address unless R14 is saved first.
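Because expansion is purely textual, it can be sketched as string substitution over the macro body. The `Macro` type and the `expand` and `strings` functions below are hypothetical (the directive is only planned), and the sketch ignores hygiene problems such as a macro's scratch registers clashing with the caller's.

```rust
/// A parsed `.macro` definition (hypothetical: the directive is only planned).
struct Macro {
    params: Vec<String>,
    body: Vec<String>,
}

fn strings(xs: &[&str]) -> Vec<String> {
    xs.iter().map(|s| s.to_string()).collect()
}

/// Expands one invocation by replacing each `\param` in the body with its argument.
fn expand(m: &Macro, args: &[&str]) -> Result<Vec<String>, String> {
    if args.len() != m.params.len() {
        return Err(format!("expected {} arguments, got {}", m.params.len(), args.len()));
    }
    // Substitute longer names first, so a parameter such as `\div`
    // could never eat the prefix of `\divisor`.
    let mut order: Vec<usize> = (0..m.params.len()).collect();
    order.sort_by_key(|&i| std::cmp::Reverse(m.params[i].len()));

    Ok(m
        .body
        .iter()
        .map(|line| {
            let mut out = line.clone();
            for &i in &order {
                out = out.replace(&format!("\\{}", m.params[i]), args[i]);
            }
            out
        })
        .collect())
}

fn main() {
    let safe_div = Macro {
        params: strings(&["result", "dividend", "divisor", "error_label"]),
        body: strings(&[
            r"ISZERO R15, \divisor",
            r"LOADI R14, \error_label",
            "JUMPI R15, R14",
            r"DIV \result, \dividend, \divisor",
        ]),
    };
    // safe_div(R2, R0, R1, error)
    for line in expand(&safe_div, &["R2", "R0", "R1", "error"]).unwrap() {
        println!("{}", line);
    }
}
```

For `safe_div(R2, R0, R1, error)` this produces `ISZERO R15, R1`, `LOADI R14, error`, `JUMPI R15, R14`, and `DIV R2, R0, R1`. Longest-name-first substitution is a simplification; a sturdier design would tokenize the body and substitute whole parameter tokens.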
4.10 Writing Efficient Assembly
Gas costs add up quickly. Here are optimization techniques to write gas-efficient code.
Register Allocation
Principle: register operations are the cheapest instructions available; every round trip through memory adds gas.
Bad (uses memory unnecessarily):
```
LOADI R0, 100
LOADI R1, 0           ; memory address (2 gas)
STORE64 R1, R0        ; 3 gas
LOAD64 R2, R1         ; 3 gas
ADD R2, R2, R0        ; use the value
; Total: 8 gas of overhead (LOADI + STORE64 + LOAD64) just to park a value
```

Good (reuses register):
```
LOADI R0, 100
MOV R2, R0            ; 2 gas (or just use R0 directly)
ADD R2, R2, R0
; Total: 2 gas
```

Gas Cost Awareness
Remember the cost hierarchy:
| Operation | Cost | When to Use |
|---|---|---|
| Registers (ADD, MOV) | 2-3 gas | Always prefer |
| Memory (LOAD64, STORE64) | 3 gas | Temporary data |
| Storage read (SLOAD) | 100 gas | Read once, cache in register |
| Storage write (SSTORE) | 5,000-20,000 gas | Only when necessary |
Bad (repeated SLOAD):
```
LOADI R0, 0
SLOAD R1, R0          ; 100 gas
ADD R4, R1, R2        ; R4 = value + R2
SLOAD R1, R0          ; 100 gas again for the same slot!
ADD R5, R1, R3        ; R5 = value + R3
; Total: 200 gas for storage reads
```

Good (cache in register):
```
LOADI R0, 0
SLOAD R1, R0          ; 100 gas (read once)
ADD R4, R1, R2        ; use cached value
ADD R5, R1, R3        ; use cached value again (R1 is unchanged)
; Total: 100 gas for storage reads (saved 100 gas!)
```

Common Optimizations
1. Loop Unrolling
If you know the iteration count is small, unroll the loop:
Before:
```
    LOADI R0, 0           ; counter
    LOADI R5, 3           ; limit (LT compares two registers)
loop:
    ADD R1, R1, R2        ; accumulate
    ADDI R0, R0, 1        ; increment counter
    LT R3, R0, R5         ; counter < 3?
    LOADI R4, loop
    JUMPI R3, R4          ; loop
; Loop cost: 3 iterations × (2 + 2 + 3 + 2 + 8) = 51 gas
```

After (unrolled):
```
ADD R1, R1, R2        ; 2 gas
ADD R1, R1, R2        ; 2 gas
ADD R1, R1, R2        ; 2 gas
; Cost: 6 gas (saved 45 gas!)
```

2. Strength Reduction
Replace expensive operations with cheaper ones:
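Multiply by 2 with MUL (the constant has to be loaded into a register first):

```
LOADI R1, 2           ; 2 gas
MUL R0, R0, R1        ; 3 gas
```

Shift left by 1 with SHL:

```
LOADI R1, 1           ; 2 gas
SHL R0, R0, R1        ; 5 gas
```

The textbook trick backfires here: in this VM's gas schedule SHL costs more than MUL. For multiplying by 2 specifically, `ADD R0, R0, R0` (2 gas, no constant needed) is cheapest. It does not extend to every power of 2, though: ×4 takes two ADDs (4 gas), which already costs more than a MUL whose constant is in a register (3 gas).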
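An assembler could apply the ×2 rewrite automatically as a peephole pass. The sketch below tracks which registers hold known constants within one straight-line block and turns `MUL rd, rx, rk` into `ADD rd, rx, rx` when `rk` is known to be 2. The three-variant `Instr` enum is a hypothetical stand-in for the real AST, and labels or jumps would have to reset the constant table.

```rust
use std::collections::HashMap;

/// Just enough instructions to show the rewrite (a hypothetical stand-in for the real AST).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Instr {
    LoadI { dst: u8, imm: u64 },
    Add { dst: u8, a: u8, b: u8 },
    Mul { dst: u8, a: u8, b: u8 },
}

/// Peephole pass over one straight-line block (no labels or jumps inside):
/// `MUL rd, rx, rk` becomes `ADD rd, rx, rx` whenever rk is known to hold 2.
fn strength_reduce(block: &[Instr]) -> Vec<Instr> {
    let mut known: HashMap<u8, u64> = HashMap::new(); // register -> known constant
    let mut out = Vec::with_capacity(block.len());
    for &ins in block {
        let rewritten = match ins {
            Instr::Mul { dst, a, b } if known.get(&b) == Some(&2) => Instr::Add { dst, a, b: a },
            Instr::Mul { dst, a, b } if known.get(&a) == Some(&2) => Instr::Add { dst, a: b, b },
            other => other,
        };
        // Whatever this instruction writes is now either a known constant or unknown.
        match rewritten {
            Instr::LoadI { dst, imm } => {
                known.insert(dst, imm);
            }
            Instr::Add { dst, .. } | Instr::Mul { dst, .. } => {
                known.remove(&dst);
            }
        }
        out.push(rewritten);
    }
    out
}

fn main() {
    // LOADI R1, 2 ; MUL R0, R0, R1
    let block = [Instr::LoadI { dst: 1, imm: 2 }, Instr::Mul { dst: 0, a: 0, b: 1 }];
    for ins in strength_reduce(&block) {
        println!("{:?}", ins);
    }
}
```

The `LOADI` that set up the constant stays in place; if nothing else reads that register, a dead code elimination pass can remove it afterwards.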
3. Dead Code Elimination
Remove code that doesn’t affect the result:
Before:
```
LOADI R0, 100
LOADI R1, 200         ; R1 is set but never used
ADD R2, R0, R0
```

After:
```
LOADI R0, 100
ADD R2, R0, R0
; Saved 2 gas by removing the unused LOADI
```

4.11 For Ethereum Developers
If you’re coming from Ethereum/Solidity, here’s how our assembly compares to EVM assembly.
Stack-Based vs Register-Based
EVM (Stack-Based):
```
PUSH1 0x03
PUSH1 0x04
ADD
; Stack: [7]
```

Minichain (Register-Based):
```
LOADI R0, 3
LOADI R1, 4
ADD R2, R0, R1
; R2 = 7
```

| Aspect | EVM | Minichain |
|---|---|---|
| Data Location | Stack (implicit) | Registers (explicit) |
| Instruction Count | More (push/pop overhead) | Fewer |
| Readability | Harder (mental stack tracking) | Easier (named registers) |
| Optimization | Limited (stack constraints) | Better (registers map to CPU) |
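The difference in where operands live is easiest to see with two toy interpreters side by side. These are illustrative models, not the EVM or the Chapter 3 VM: only `PUSH`/`ADD` and `LOADI`/`ADD` are modeled, and wrapping addition is an assumption about overflow behaviour.

```rust
/// Toy stack machine: operands are implicit (whatever is on top of the stack).
enum StackOp {
    Push(u64),
    Add,
}

fn run_stack(program: &[StackOp]) -> Vec<u64> {
    let mut stack = Vec::new();
    for op in program {
        match op {
            StackOp::Push(v) => stack.push(*v),
            StackOp::Add => {
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                stack.push(a.wrapping_add(b)); // overflow behaviour is an assumption
            }
        }
    }
    stack
}

/// Toy register machine: every source and destination is named explicitly.
enum RegOp {
    LoadI(usize, u64),        // LOADI Rd, imm
    Add(usize, usize, usize), // ADD Rd, Ra, Rb
}

fn run_regs(program: &[RegOp]) -> [u64; 16] {
    let mut regs = [0u64; 16];
    for op in program {
        match *op {
            RegOp::LoadI(dst, v) => regs[dst] = v,
            RegOp::Add(dst, a, b) => regs[dst] = regs[a].wrapping_add(regs[b]),
        }
    }
    regs
}

fn main() {
    // PUSH1 0x03; PUSH1 0x04; ADD
    let stack = run_stack(&[StackOp::Push(3), StackOp::Push(4), StackOp::Add]);
    // LOADI R0, 3; LOADI R1, 4; ADD R2, R0, R1
    let regs = run_regs(&[RegOp::LoadI(0, 3), RegOp::LoadI(1, 4), RegOp::Add(2, 0, 1)]);
    println!("stack = {:?}, R2 = {}", stack, regs[2]);
}
```

The stack version consumes its inputs and never names a location, which is why real EVM code leans on `DUP` and `SWAP` to reach values deeper in the stack; the register version can read R0 and R1 again at any point.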
Why Register-Based?
- Clarity: `ADD R2, R0, R1` is self-documenting. You see exactly where data comes from and where it goes.
- Efficiency: fewer instructions means fewer dispatch steps in the VM's execution loop. Bytecode is not automatically smaller, though: each register instruction is wider (a LOADI alone takes 10 bytes, where EVM's `PUSH1` takes 2), which would matter in a system that charged for bytecode size.
- Familiarity: most physical CPUs (x86, ARM, RISC-V) are register-based.
Opcode Mapping
| Operation | EVM | Minichain | Notes |
|---|---|---|---|
| Push constant | PUSH1 0x03 | LOADI R0, 3 | EVM has PUSH1-PUSH32 for different sizes |
| Add | ADD | ADD R2, R0, R1 | EVM pops 2, pushes 1. We specify operands. |
| Storage read | SLOAD | SLOAD Rdst, Rkey | Similar cost (~100 gas) |
| Storage write | SSTORE | SSTORE Rkey, Rval | Similar cost (~5000-20000 gas) |
| Memory read | MLOAD | LOAD64 Rdst, Raddr | Both access temporary memory |
| Conditional jump | JUMPI | JUMPI Rcond, Rtarget | Both jump if condition is true |
| Call function | CALL | CALL Rtarget | EVM’s CALL is more complex (external calls) |
Gas Cost Philosophy
Both VMs follow the same principle: Storage >> Memory > Computation
| Cost Tier | EVM | Minichain | Reasoning |
|---|---|---|---|
| Tier 1: Cheap | ADD (3 gas) | ADD (2 gas) | CPU-bound |
| Tier 2: Medium | MLOAD (3 gas) | LOAD64 (3 gas) | RAM access |
| Tier 3: Expensive | SLOAD (100-2100 gas) | SLOAD (100 gas) | Disk I/O |
| Tier 4: Very Expensive | SSTORE (5000-20000 gas) | SSTORE (5000-20000 gas) | Persistent write |
4.12 Development Tools
Practical advice for writing and debugging assembly.
Syntax Highlighting
Assembly is more readable with syntax highlighting. Here’s a basic TextMate grammar, the format VS Code uses for highlighting:
```json
{
  "fileTypes": ["asm"],
  "name": "Minichain Assembly",
  "patterns": [
    {
      "match": "\\b(ADD|ADDI|SUB|MUL|DIV|MOD|LT|GT|MOV|LOADI|LOAD64|STORE64|SLOAD|SSTORE|LOG|CALL|RET|HALT|REVERT|JUMP|JUMPI)\\b",
      "name": "keyword.control.asm"
    },
    { "match": "\\bR([0-9]|1[0-5])\\b", "name": "variable.parameter.register.asm" },
    { "match": ";.*$", "name": "comment.line.semicolon.asm" },
    { "match": "^[a-zA-Z_][a-zA-Z0-9_]*:", "name": "entity.name.function.asm" },
    { "match": "\\.[a-z]+", "name": "keyword.directive.asm" }
  ],
  "scopeName": "source.asm"
}
```

Vim users: create `~/.vim/syntax/minichain.vim` with similar rules.
Editor Integration
VSCode settings:
```json
{
  "files.associations": {
    "*.asm": "minichain-assembly"
  },
  "editor.tabSize": 4,
  "editor.insertSpaces": true
}
```

Assembler CLI
A command-line interface for the assembler:
```sh
# Assemble a contract
minichain-asm assemble counter.asm -o counter.bin

# Show bytecode in hex
minichain-asm assemble counter.asm --hex

# Disassemble bytecode back to assembly
minichain-asm disassemble counter.bin

# Assemble and run in VM
minichain-asm run counter.asm --gas-limit 10000 --trace
```

Debugging Workflow
- Write assembly in your editor with syntax highlighting.
- Assemble and check for errors:
  ```sh
  minichain-asm assemble contract.asm
  ```
- Run in the VM with tracing enabled:
  ```sh
  minichain-asm run contract.asm --trace
  ```
- Review the execution trace to find issues:
  ```
  0: PC=0000 LOADI gas=9998 R0=10 R1=0  R2=0
  1: PC=000A LOADI gas=9996 R0=10 R1=20 R2=0
  2: PC=0014 ADD   gas=9994 R0=10 R1=20 R2=30
  ```
- Optimize hot paths (see section 4.10).
4.13 From Assembly to Execution
Let’s see the complete pipeline from assembly source to VM execution.
The Complete Pipeline
```
┌─────────────────┐
│  Assembly Code  │  counter.asm
│ (human-written) │
└────────┬────────┘
         │ Lexer
         ▼
┌─────────────────┐
│  Token Stream   │  [LoadI, Register(0), Comma, Number(10), ...]
└────────┬────────┘
         │ Parser
         ▼
┌─────────────────┐
│   AST (Tree)    │  Program { statements: [...], entry: Some("main") }
└────────┬────────┘
         │ Compiler (Pass 1: Collect Labels)
         ▼
┌─────────────────┐
│  Symbol Table   │  { "main": 0x0000, "loop": 0x0014, ... }
└────────┬────────┘
         │ Compiler (Pass 2: Emit Bytecode)
         ▼
┌─────────────────┐
│    Bytecode     │  [0x70, 0x00, 0x0A, 0x00, ..., 0x00]
│  (binary file)  │
└────────┬────────┘
         │ VM Loader
         ▼
┌─────────────────┐
│    VM Memory    │  Bytecode loaded at address 0
│                 │  PC = 0, Gas = limit, Registers = [0; 16]
└────────┬────────┘
         │ VM Executor (Fetch-Decode-Execute Loop)
         ▼
┌─────────────────┐
│    Execution    │  Success/Revert, Gas Used, Logs, Storage Changes
│     Result      │
└─────────────────┘
```

Example: End-to-End
1. Assembly Source (`counter.asm`):
```
.entry main

main:
    LOADI R0, 0
    SLOAD R1, R0
    LOADI R2, 1
    ADD R1, R1, R2
    SSTORE R0, R1
    HALT
```

2. Compiled Bytecode (hex):
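```
70 00 00 00 00 00 00 00 00 00   ; LOADI R0, 0    (0x00)
50 10                           ; SLOAD R1, R0   (0x0A)
70 20 01 00 00 00 00 00 00 00   ; LOADI R2, 1    (0x0C)
10 11 20                        ; ADD R1, R1, R2 (0x16)
51 01                           ; SSTORE R0, R1  (0x19)
00                              ; HALT           (0x1B)
```

3. VM Execution: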
```sh
$ minichain-vm run counter.bin --gas-limit 10000 --trace
```

4. Execution Trace:
```
0: PC=0x0000 LOADI R0, 0     | gas=9998 | R0=0
1: PC=0x000A SLOAD R1, R0    | gas=9898 | R0=0 R1=5
2: PC=0x000C LOADI R2, 1     | gas=9896 | R0=0 R1=5 R2=1
3: PC=0x0016 ADD R1, R1, R2  | gas=9894 | R0=0 R1=6 R2=1
4: PC=0x0019 SSTORE R0, R1   | gas=4894 | R0=0 R1=6 R2=1
5: PC=0x001B HALT            | gas=4894 |

Execution Result:
  Status: Success
  Gas Used: 5106 / 10000
  Storage Changes:
    Slot 0x0000: 5 → 6
```

5. Verification:
- Counter incremented from 5 to 6 ✓
- Gas consumption: 5106 (SLOAD 100 + SSTORE 5000 + others 6) ✓
- No errors ✓
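To see where the step 2 hex comes from, here is a minimal encoder for just the five instructions in `counter.asm`. The opcode values (LOADI `0x70`, SLOAD `0x50`, SSTORE `0x51`, ADD `0x10`, HALT `0x00`) and the nibble packing (first register in the high nibble) are inferred from that dump, so check them against your Chapter 3 opcode table before reusing this. `Instr`, `nibbles`, and `encode` are illustrative names.

```rust
/// The five instructions used by counter.asm (a hypothetical, cut-down instruction type).
enum Instr {
    LoadI { dst: u8, imm: u64 },
    SLoad { dst: u8, key: u8 },
    SStore { key: u8, val: u8 },
    Add { dst: u8, a: u8, b: u8 },
    Halt,
}

/// Packs two 4-bit register numbers into one byte, first register in the high nibble.
fn nibbles(hi: u8, lo: u8) -> u8 {
    assert!(hi < 16 && lo < 16, "registers are R0-R15");
    (hi << 4) | lo
}

fn encode(program: &[Instr]) -> Vec<u8> {
    let mut out = Vec::new();
    for ins in program {
        match *ins {
            Instr::LoadI { dst, imm } => {
                out.push(0x70);
                out.push(nibbles(dst, 0));
                out.extend_from_slice(&imm.to_le_bytes()); // 8-byte little-endian immediate
            }
            Instr::SLoad { dst, key } => out.extend_from_slice(&[0x50, nibbles(dst, key)]),
            Instr::SStore { key, val } => out.extend_from_slice(&[0x51, nibbles(key, val)]),
            Instr::Add { dst, a, b } => {
                out.extend_from_slice(&[0x10, nibbles(dst, a), nibbles(b, 0)])
            }
            Instr::Halt => out.push(0x00),
        }
    }
    out
}

fn main() {
    let counter = [
        Instr::LoadI { dst: 0, imm: 0 },
        Instr::SLoad { dst: 1, key: 0 },
        Instr::LoadI { dst: 2, imm: 1 },
        Instr::Add { dst: 1, a: 1, b: 2 },
        Instr::SStore { key: 0, val: 1 },
        Instr::Halt,
    ];
    let bytes = encode(&counter);
    let hex: Vec<String> = bytes.iter().map(|b| format!("{:02X}", b)).collect();
    println!("{} bytes: {}", bytes.len(), hex.join(" "));
}
```

It reports 28 bytes: the HALT at PC 0x1B (27) plus its own single byte.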
Debugging Assembly with VM Tracer
The VM tracer (from Chapter 3) is invaluable for debugging:
```sh
minichain-vm run contract.asm --trace --trace-storage
```

Trace options:
- `--trace`: show every instruction executed
- `--trace-storage`: show storage reads/writes
- `--trace-memory`: show memory accesses
- `--gas-report`: show gas consumption breakdown
Example trace output:
```
Step 15: PC=0x0042 JUMPI R5, R3
  Condition: R5 = 1 (true)
  Target:    R3 = 0x0014
  → Taking jump to 0x0014

Step 16: PC=0x0014 SLOAD R7, R6
  Key:   R6 = 0x0000
  Value: Storage[0x0000] = 42
  Cost:  100 gas
  → R7 = 42
```

This level of detail makes it easy to spot:
- Incorrect jump targets
- Wrong register operands
- Unexpected storage values
- Gas cost surprises
Summary
We’ve built a complete assembler for our blockchain VM:
| Component | What It Does | Input | Output |
|---|---|---|---|
| Lexer | Tokenization | Assembly text | Token stream |
| Parser | Syntax analysis | Tokens | AST (Abstract Syntax Tree) |
| Compiler Pass 1 | Symbol collection | AST | Symbol table (labels → addresses) |
| Compiler Pass 2 | Code generation | AST + Symbols | Bytecode (binary) |
| Error Handler | Diagnostics | Parse/compile errors | Human-readable messages |
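The two compiler passes can be sketched in a few dozen lines. Pass 1 only needs each statement's size to assign label addresses; pass 2 can then resolve any label, including one defined later in the file. The `Stmt` and `Imm` types are simplified, hypothetical stand-ins for the chapter's AST (only `LOADI` and `HALT` are modeled), with sizes and the LOADI encoding taken from the bytecode shown in this chapter.

```rust
use std::collections::HashMap;

/// Immediate operand: a literal, or a label resolved in pass 2.
enum Imm<'a> {
    Num(u64),
    Label(&'a str),
}

/// A cut-down statement type (hypothetical; the real AST has many more variants).
enum Stmt<'a> {
    Label(&'a str),
    LoadI { dst: u8, imm: Imm<'a> },
    Halt,
}

/// Encoded size of each statement; labels take no space.
fn size(stmt: &Stmt<'_>) -> u64 {
    match stmt {
        Stmt::Label(_) => 0,
        Stmt::LoadI { .. } => 10, // opcode + register byte + 8-byte immediate
        Stmt::Halt => 1,
    }
}

/// Pass 1: walk the program once, recording the byte address of every label.
fn collect_labels<'a>(program: &[Stmt<'a>]) -> Result<HashMap<&'a str, u64>, String> {
    let mut symbols = HashMap::new();
    let mut addr: u64 = 0;
    for stmt in program {
        if let Stmt::Label(name) = stmt {
            if symbols.insert(*name, addr).is_some() {
                return Err(format!("duplicate label `{}`", name));
            }
        }
        addr += size(stmt);
    }
    Ok(symbols)
}

/// Pass 2: emit bytecode; every label, even one defined later, is already known.
fn emit<'a>(program: &[Stmt<'a>], symbols: &HashMap<&'a str, u64>) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    for stmt in program {
        match stmt {
            Stmt::Label(_) => {}
            Stmt::LoadI { dst, imm } => {
                let value = match imm {
                    Imm::Num(n) => *n,
                    Imm::Label(name) => *symbols
                        .get(name)
                        .ok_or_else(|| format!("undefined label `{}`", name))?,
                };
                out.push(0x70); // LOADI
                out.push(*dst << 4); // destination register in the high nibble
                out.extend_from_slice(&value.to_le_bytes());
            }
            Stmt::Halt => out.push(0x00),
        }
    }
    Ok(out)
}

fn main() {
    // main:
    //     LOADI R4, end   ; forward reference: `end` is defined below
    //     HALT
    // end:
    //     HALT
    let program = [
        Stmt::Label("main"),
        Stmt::LoadI { dst: 4, imm: Imm::Label("end") },
        Stmt::Halt,
        Stmt::Label("end"),
        Stmt::Halt,
    ];
    let symbols = collect_labels(&program).expect("pass 1");
    let bytes = emit(&program, &symbols).expect("pass 2");
    println!("end = {:#06X}, {} bytes", symbols["end"], bytes.len());
}
```

Running it reports `end` at 0x000B: the LOADI that references `end` is emitted correctly even though the label appears later, which is exactly the forward-reference case a single pass cannot handle without backpatching.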
Design Decisions
| Decision | Rationale |
|---|---|
| Register-based syntax | Matches VM architecture, more readable than stack-based |
| Two-pass compilation | Handles forward label references elegantly |
| Logos for lexing | Compile-time DFA generation → fast, type-safe tokenization |
| Hand-written parser | Better error messages than parser generators |
| Line number tracking | Essential for debugging — tell users exactly where errors are |
| Symbolic labels | Humans think in names (“loop_start”), not addresses (0x0042) |
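Line-number tracking usually means keeping byte offsets on tokens and converting them to a line and column only when an error is reported. The sketch below shows that conversion plus a caret-style message. `AsmError`, `line_col`, and `error_at` are hypothetical names, the offset is assumed to fall on a character boundary, and the column count assumes the line contains no tab characters.

```rust
use std::fmt;

/// A diagnostic that remembers where it came from (hypothetical type).
#[derive(Debug)]
struct AsmError {
    line: usize,   // 1-based
    column: usize, // 1-based, counted in characters
    len: usize,    // width of the offending token, for the caret underline
    message: String,
    source_line: String,
}

impl fmt::Display for AsmError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "error: {}", self.message)?;
        writeln!(f, "  --> line {}, column {}", self.line, self.column)?;
        writeln!(f, "   | {}", self.source_line)?;
        write!(
            f,
            "   | {}{}",
            " ".repeat(self.column - 1),
            "^".repeat(self.len.max(1))
        )
    }
}

/// Converts a byte offset (e.g. where a token starts) into a 1-based (line, column).
fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let before = &source[..offset];
    let line = before.matches('\n').count() + 1;
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    (line, before[line_start..].chars().count() + 1)
}

/// Builds an error for the token starting at `offset` and spanning `len` characters.
fn error_at(source: &str, offset: usize, len: usize, message: &str) -> AsmError {
    let (line, column) = line_col(source, offset);
    let source_line = source.lines().nth(line - 1).unwrap_or("").to_string();
    AsmError { line, column, len, message: message.to_string(), source_line }
}

fn main() {
    let src = ".entry main\nmain:\n    LOADX R0, 0\n";
    let offset = src.find("LOADX").expect("token present");
    println!("{}", error_at(src, offset, 5, "unknown instruction `LOADX`"));
}
```

Carrying only offsets through the lexer and parser is cheap; the line and column are computed once, at the moment an error is actually reported.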
Key Takeaways
- Assembly bridges the gap between human-readable code and machine-executable bytecode
- Lexer → Parser → Compiler is the standard compilation pipeline
- Two-pass compilation solves forward references without complex backpatching
- Good error messages are critical for developer experience
- Gas costs matter — write assembly with optimization in mind
- Debugging tools (tracers, disassemblers) are essential for development
Comparison: Manual vs Assembly
| Aspect | Manual Bytecode | Assembly Language |
|---|---|---|
| Readability | Hex dump (incomprehensible) | Mnemonics + labels (clear) |
| Maintainability | One change breaks everything | Add lines without renumbering |
| Error Detection | Silent corruption | Compile-time checks |
| Documentation | Impossible to comment | Inline comments with ; |
| Development Speed | Hours per 10 lines | Minutes per 100 lines |
What’s Next?
With a working assembler, we can now write smart contracts in readable assembly. In Chapter 5: Blockchain Layer, we’ll build the chain logic that:
- Validates and executes transactions
- Runs contract bytecode in the VM
- Updates state via the storage layer
- Builds blocks with transaction batches
- Maintains consensus with validators
The assembler provides the tooling to write contracts. The blockchain layer will provide the environment to run them in a decentralized system.