Architecture

This document explains the internal architecture of the RuneScript Compiler, detailing the compilation pipeline from source code to bytecode execution.

Compiler Pipeline

The RuneScript Compiler follows a traditional multi-stage compilation process:

Source Code (.rn) → Lexer → Parser → Type Checker → Bytecode Emitter → Virtual Machine

1. Lexer (Tokenization)

The lexer, implemented in Lexer.java, converts the source code character stream into a sequence of tokens. Each token represents a meaningful unit of the language such as keywords, identifiers, literals, operators, and punctuation.

Input: Source code as a string
Output: Stream of Token objects

Key responsibilities:
- Identifying and categorizing tokens
- Handling string literals and escape sequences
- Tracking line and column positions for error reporting
- Skipping whitespace and comments
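To make these responsibilities concrete, here is a minimal lexer sketch in Java. It is not the contents of Lexer.java: the `Kind` enum, the `Tok` record, the `SketchLexer` class, and the language details it assumes (`//` line comments, integer literals, double-quoted strings with backslash escapes, single-character operators) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token kinds for this sketch; the real Token.java has its own set.
enum Kind { NUMBER, STRING, IDENT, OP, EOF }

// Hypothetical token shape: kind, lexeme, literal value, 1-based line and column.
record Tok(Kind kind, String lexeme, Object literal, int line, int col) {}

// Minimal lexer sketch. Assumes "//" line comments, integer literals,
// double-quoted strings with backslash escapes, and single-character operators.
class SketchLexer {
    private final String src;
    private int pos = 0, line = 1, col = 1;

    SketchLexer(String src) { this.src = src; }

    List<Tok> scan() {
        List<Tok> out = new ArrayList<>();
        while (true) {
            skipWhitespaceAndComments();
            if (pos >= src.length()) {
                out.add(new Tok(Kind.EOF, "", null, line, col));
                return out;
            }
            // Record the start position before consuming, so errors point at the first character.
            int start = pos, startLine = line, startCol = col;
            char c = advance();
            if (Character.isDigit(c)) {
                while (pos < src.length() && Character.isDigit(peek())) advance();
                String text = src.substring(start, pos);
                out.add(new Tok(Kind.NUMBER, text, Long.parseLong(text), startLine, startCol));
            } else if (Character.isLetter(c) || c == '_') {
                while (pos < src.length() && (Character.isLetterOrDigit(peek()) || peek() == '_')) advance();
                out.add(new Tok(Kind.IDENT, src.substring(start, pos), null, startLine, startCol));
            } else if (c == '"') {
                out.add(string(start, startLine, startCol));
            } else if ("+-*/=<>(){};".indexOf(c) >= 0) {
                out.add(new Tok(Kind.OP, String.valueOf(c), null, startLine, startCol));
            } else {
                throw new IllegalStateException("[" + startLine + ":" + startCol + "] Unexpected character '" + c + "'");
            }
        }
    }

    // \n and \t are translated; any other escaped character (e.g. \" or \\) is kept as-is.
    private Tok string(int start, int startLine, int startCol) {
        StringBuilder value = new StringBuilder();
        while (pos < src.length() && peek() != '"') {
            char ch = advance();
            if (ch == '\\' && pos < src.length()) {
                char esc = advance();
                value.append(switch (esc) { case 'n' -> '\n'; case 't' -> '\t'; default -> esc; });
            } else {
                value.append(ch);
            }
        }
        if (pos >= src.length()) {
            throw new IllegalStateException("[" + startLine + ":" + startCol + "] Unterminated string");
        }
        advance(); // consume the closing quote
        return new Tok(Kind.STRING, src.substring(start, pos), value.toString(), startLine, startCol);
    }

    private void skipWhitespaceAndComments() {
        while (pos < src.length()) {
            char c = peek();
            if (Character.isWhitespace(c)) {
                advance();
            } else if (c == '/' && pos + 1 < src.length() && src.charAt(pos + 1) == '/') {
                while (pos < src.length() && peek() != '\n') advance(); // comment runs to end of line
            } else {
                return;
            }
        }
    }

    private char peek() { return src.charAt(pos); }

    // Consumes one character and keeps line/column in sync.
    private char advance() {
        char c = src.charAt(pos++);
        if (c == '\n') { line++; col = 1; } else { col++; }
        return c;
    }
}
```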

2. Parser (Syntax Analysis)

The parser, implemented in Parser.java, takes the token stream and builds an Abstract Syntax Tree (AST) representing the syntactic structure of the program. It uses recursive descent parsing, with Pratt parsing for expressions.

Input: Stream of Token objects
Output: Abstract Syntax Tree (AST) as Expr and Stmt objects

Key responsibilities:
- Building AST nodes for expressions and statements
- Enforcing grammatical rules
- Error recovery and reporting
- Handling operator precedence and associativity
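The following sketch shows how Pratt parsing handles precedence and associativity with a table of binding powers. The operator set, the binding-power numbers, and the `PrattSketch` class are assumptions for illustration; to stay short it parses a pre-split token list and returns an S-expression string rather than the project's Expr objects.

```java
import java.util.List;
import java.util.Map;

// Pratt-parsing sketch over pre-split tokens. It builds an S-expression string
// instead of Expr objects so precedence and associativity are easy to see.
// The operator table and binding powers are assumptions for illustration.
class PrattSketch {
    // {left, right} binding powers. left < right => left-associative;
    // left > right => right-associative (used here for assignment).
    private static final Map<String, int[]> INFIX = Map.of(
            "=", new int[] {2, 1},
            "+", new int[] {10, 11},
            "-", new int[] {10, 11},
            "*", new int[] {20, 21},
            "/", new int[] {20, 21});
    private static final int PREFIX_MINUS_BP = 30; // unary minus binds tighter than any infix operator

    private final List<String> toks;
    private int pos = 0;

    PrattSketch(List<String> toks) { this.toks = toks; }

    String parse() {
        String result = expression(0);
        if (pos < toks.size()) throw new IllegalStateException("Unexpected token '" + toks.get(pos) + "'");
        return result;
    }

    // Parses an expression, absorbing infix operators whose left binding power is >= minBp.
    private String expression(int minBp) {
        String tok = next();
        String lhs;
        if (tok.equals("(")) {
            lhs = expression(0);
            expect(")");
        } else if (tok.equals("-")) {
            lhs = "(neg " + expression(PREFIX_MINUS_BP) + ")";
        } else if (INFIX.containsKey(tok) || tok.equals(")")) {
            throw new IllegalStateException("Expected expression but found '" + tok + "'");
        } else {
            lhs = tok; // number or identifier
        }
        while (pos < toks.size()) {
            int[] bp = INFIX.get(toks.get(pos));
            if (bp == null || bp[0] < minBp) break; // not an operator, or binds too weakly here
            String op = next();
            lhs = "(" + op + " " + lhs + " " + expression(bp[1]) + ")";
        }
        return lhs;
    }

    private String next() {
        if (pos >= toks.size()) throw new IllegalStateException("Unexpected end of input");
        return toks.get(pos++);
    }

    private void expect(String expected) {
        String tok = next();
        if (!tok.equals(expected)) {
            throw new IllegalStateException("Expected '" + expected + "' but found '" + tok + "'");
        }
    }
}
```

For example, `1 - 2 - 3` parses as `(- (- 1 2) 3)` and `a = b = c` as `(= a (= b c))`: the asymmetric binding-power pairs alone decide associativity, without separate grammar rules per precedence level.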

3. Type Checker (Semantic Analysis)

The type checker, implemented in Resolver.java, performs semantic analysis on the AST to ensure type safety and correctness. It resolves variable scopes, validates type compatibility, and identifies semantic errors.

Input: Abstract Syntax Tree
Output: Type-checked AST with symbol table information

Key responsibilities:
- Variable and function declaration resolution
- Scope management
- Type inference and checking
- Detection of semantic errors (e.g., undefined variables, type mismatches)
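Scope management is commonly implemented as a stack of symbol tables. The sketch below illustrates that approach under stated assumptions: the `ValueType` enum, the `ScopeChecker` class, its error messages, and the rule that inner scopes may shadow outer names are hypothetical and may not match Resolver.java.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical type set; the document does not list RuneScript's types.
enum ValueType { INT, STRING, BOOL }

// Sketch of scope management and simple type checking. Scopes form a stack of
// maps from name to declared type; lookups search from the innermost scope out.
class ScopeChecker {
    private final Deque<Map<String, ValueType>> scopes = new ArrayDeque<>();
    final List<String> errors = new ArrayList<>();

    ScopeChecker() { beginScope(); } // the global scope

    void beginScope() { scopes.push(new HashMap<>()); }

    void endScope() { scopes.pop(); }

    // Redeclaring in the same scope is an error; shadowing an outer scope is allowed (an assumption).
    void declare(String name, ValueType type, int line) {
        Map<String, ValueType> current = scopes.peek();
        if (current.containsKey(name)) {
            errors.add("[line " + line + "] '" + name + "' is already declared in this scope");
        } else {
            current.put(name, type);
        }
    }

    // ArrayDeque iterates from the most recently pushed scope, i.e. innermost first.
    ValueType lookup(String name, int line) {
        for (Map<String, ValueType> scope : scopes) {
            ValueType type = scope.get(name);
            if (type != null) return type;
        }
        errors.add("[line " + line + "] Undefined variable '" + name + "'");
        return null;
    }

    // Reports a mismatch when the assigned value's type differs from the declared type.
    void checkAssign(String name, ValueType valueType, int line) {
        ValueType declared = lookup(name, line);
        if (declared != null && declared != valueType) {
            errors.add("[line " + line + "] Cannot assign " + valueType + " to '" + name + "' of type " + declared);
        }
    }
}
```

Collecting errors in a list instead of throwing on the first one lets a single run report every undefined variable and mismatch in the file.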

4. Bytecode Emitter

The bytecode emitter, implemented in BytecodeEmitter.java, translates the type-checked AST into RuneScript bytecode. This intermediate representation is designed for efficient execution by the virtual machine.

Input: Type-checked AST
Output: Bytecode instructions stored in Chunk objects (Chunk.java)

Key responsibilities:
- Converting AST nodes to bytecode instructions
- Managing constant pools
- Optimizing instruction sequences
- Generating debug information
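An emitter targeting a stack machine visits operands before their operator, so the operator's instruction runs once both values are on the stack. The sketch below shows that post-order walk together with a deduplicating constant pool; the `Node`, `Num`, `Bin`, and `Op` types and the `EmitterSketch` class are hypothetical stand-ins, not BytecodeEmitter.java's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical AST nodes and opcodes for this sketch only.
interface Node {}
record Num(double value) implements Node {}
record Bin(char op, Node left, Node right) implements Node {}

enum Op { CONST, ADD, SUB, MUL, DIV }

// Emits stack-machine code by walking the tree in post-order: operands first,
// then the operator that consumes them. CONST is followed by a pool index.
class EmitterSketch {
    final List<Integer> code = new ArrayList<>();
    final List<Double> constants = new ArrayList<>();

    void emit(Node node) {
        if (node instanceof Num num) {
            code.add(Op.CONST.ordinal());
            code.add(addConstant(num.value()));
        } else if (node instanceof Bin bin) {
            emit(bin.left());
            emit(bin.right());
            Op op = switch (bin.op()) {
                case '+' -> Op.ADD;
                case '-' -> Op.SUB;
                case '*' -> Op.MUL;
                case '/' -> Op.DIV;
                default -> throw new IllegalArgumentException("Unknown operator '" + bin.op() + "'");
            };
            code.add(op.ordinal());
        } else {
            throw new IllegalArgumentException("Unsupported node " + node);
        }
    }

    // Deduplicates constants so repeated literals share one pool slot.
    int addConstant(double value) {
        int existing = constants.indexOf(value);
        if (existing >= 0) return existing;
        constants.add(value);
        return constants.size() - 1;
    }
}
```

For `(2 + 3) * 2` this emits `CONST 0, CONST 1, ADD, CONST 0, MUL` with the pool `[2.0, 3.0]`; the literal `2` appears twice but occupies one pool slot.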

5. Virtual Machine

The virtual machine, implemented in Interpreter.java and Chunk.java, executes the generated bytecode. It manages the runtime stack, heap allocation, and program execution flow.

Input: Bytecode chunks
Output: Program execution results

Key responsibilities:
- Instruction execution
- Memory management
- Runtime error handling
- Native function integration
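At its core, a stack-based bytecode VM is a fetch-decode-execute loop over an operand stack. The minimal sketch below assumes a numeric-only value model, integer opcodes, a per-slot line table for runtime error messages, and the hypothetical `Ops` and `VmSketch` names; Interpreter.java may organize this differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical opcodes; the real instruction set is defined in Instruction.java.
final class Ops {
    static final int CONST = 0, ADD = 1, SUB = 2, MUL = 3, DIV = 4, NEGATE = 5, RETURN = 6;
    private Ops() {}
}

// Minimal stack-VM sketch: fetch an opcode, decode it, execute it, repeat.
class VmSketch {
    // lines[i] holds the source line of code[i], used in runtime error messages.
    static double run(int[] code, double[] constants, int[] lines) {
        Deque<Double> stack = new ArrayDeque<>();
        int ip = 0; // instruction pointer
        while (true) {
            int at = ip;
            int op = code[ip++];
            switch (op) {
                case Ops.CONST -> stack.push(constants[code[ip++]]); // operand is a pool index
                case Ops.NEGATE -> stack.push(-stack.pop());
                case Ops.ADD, Ops.SUB, Ops.MUL, Ops.DIV -> {
                    double b = stack.pop(), a = stack.pop(); // right operand is on top
                    if (op == Ops.DIV && b == 0) {
                        throw new ArithmeticException("[line " + lines[at] + "] Division by zero");
                    }
                    stack.push(switch (op) {
                        case Ops.ADD -> a + b;
                        case Ops.SUB -> a - b;
                        case Ops.MUL -> a * b;
                        default -> a / b;
                    });
                }
                case Ops.RETURN -> { return stack.pop(); }
                default -> throw new IllegalStateException("Unknown opcode " + op + " at offset " + at);
            }
        }
    }
}
```

The explicit divisor check matters because Java floating-point division by zero yields `Infinity` rather than throwing; the check turns it into a runtime error that carries the source line.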

Core Components

Token.java

Represents individual lexical units with:
- Token type (keyword, identifier, number, etc.)
- Lexeme (actual text)
- Line and column position
- Literal value (for numbers, strings)

AST Classes

Abstract syntax tree nodes represent the program structure:
- Expr classes for expressions (binary, unary, literals, variables)
- Stmt classes for statements (expressions, prints, conditionals, loops)
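One common way to structure Expr and Stmt classes is the visitor pattern, which lets each pass (resolution, emission, printing) live in its own class without modifying the nodes. The document does not say whether RuneScript's AST uses visitors, so treat this as a hypothetical sketch; the node records (`Literal`, `Variable`, `Binary`, `Print`, `If`) and `AstPrinter` are illustrative names.

```java
// Hypothetical node shapes: the document names Expr and Stmt classes but not their fields.
interface Expr {
    interface Visitor<R> {
        R visitLiteral(Literal e);
        R visitVariable(Variable e);
        R visitBinary(Binary e);
    }
    <R> R accept(Visitor<R> v);
}

record Literal(Object value) implements Expr {
    public <R> R accept(Expr.Visitor<R> v) { return v.visitLiteral(this); }
}
record Variable(String name) implements Expr {
    public <R> R accept(Expr.Visitor<R> v) { return v.visitVariable(this); }
}
record Binary(Expr left, String op, Expr right) implements Expr {
    public <R> R accept(Expr.Visitor<R> v) { return v.visitBinary(this); }
}

interface Stmt {
    interface Visitor<R> {
        R visitPrint(Print s);
        R visitIf(If s);
    }
    <R> R accept(Visitor<R> v);
}

record Print(Expr value) implements Stmt {
    public <R> R accept(Stmt.Visitor<R> v) { return v.visitPrint(this); }
}
// elseBranch may be null when there is no else clause.
record If(Expr condition, Stmt thenBranch, Stmt elseBranch) implements Stmt {
    public <R> R accept(Stmt.Visitor<R> v) { return v.visitIf(this); }
}

// One pass = one visitor: this one renders the tree as an S-expression.
class AstPrinter implements Expr.Visitor<String>, Stmt.Visitor<String> {
    public String visitLiteral(Literal e) { return String.valueOf(e.value()); }
    public String visitVariable(Variable e) { return e.name(); }
    public String visitBinary(Binary e) {
        return "(" + e.op() + " " + e.left().accept(this) + " " + e.right().accept(this) + ")";
    }
    public String visitPrint(Print s) { return "(print " + s.value().accept(this) + ")"; }
    public String visitIf(If s) {
        String out = "(if " + s.condition().accept(this) + " " + s.thenBranch().accept(this);
        return (s.elseBranch() == null ? out : out + " " + s.elseBranch().accept(this)) + ")";
    }
}
```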

Chunk.java

Manages bytecode storage with:
- Instruction array
- Constant pool
- Line number mapping for debugging
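The following sketch shows one plausible layout for such a chunk: a growable byte array, a parallel array holding the source line of each byte, and a list-backed constant pool. `ChunkSketch` and its methods are hypothetical and only approximate what Chunk.java might provide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical chunk layout: bytecode bytes, a parallel line table, and a constant pool.
class ChunkSketch {
    private byte[] code = new byte[8];
    private int[] lines = new int[8];
    private int count = 0;
    private final List<Object> constants = new ArrayList<>();

    // Appends one byte and remembers which source line produced it.
    void write(int b, int line) {
        if (count == code.length) { // double capacity when full
            code = Arrays.copyOf(code, count * 2);
            lines = Arrays.copyOf(lines, count * 2);
        }
        code[count] = (byte) b;
        lines[count] = line;
        count++;
    }

    // Returns the pool index, which the emitter writes as an instruction operand.
    int addConstant(Object value) {
        constants.add(value);
        return constants.size() - 1;
    }

    int count() { return count; }

    int byteAt(int offset) { return code[offset] & 0xFF; } // read back as unsigned 0..255

    int lineAt(int offset) { return lines[offset]; } // used for runtime error messages

    Object constant(int index) { return constants.get(index); }
}
```

A parallel line array is simple but spends one `int` per bytecode byte; run-length encoding the line numbers is a common, more compact alternative.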

Instruction.java

Defines the bytecode instruction set with opcodes and operands.
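A typical way to define an instruction set is an enum in which each opcode records the width of its operands, which is enough to decode a flat byte stream. The opcode list, operand widths, two-byte big-endian jump encoding, and the `OpCode` and `Disassembler` names below are assumptions, not the actual contents of Instruction.java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical opcode set: each opcode records how many operand bytes follow it.
enum OpCode {
    CONSTANT(1), ADD(0), SUBTRACT(0), MULTIPLY(0), DIVIDE(0), NEGATE(0),
    JUMP_IF_FALSE(2), PRINT(0), RETURN(0);

    final int operandBytes;

    OpCode(int operandBytes) { this.operandBytes = operandBytes; }
}

class Disassembler {
    // Walks the byte stream, using each opcode's operand width to find the next instruction.
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int offset = 0;
        while (offset < code.length) {
            OpCode op = OpCode.values()[code[offset] & 0xFF];
            StringBuilder text = new StringBuilder(String.format(Locale.ROOT, "%04d %s", offset, op));
            if (op.operandBytes == 1) {
                text.append(' ').append(code[offset + 1] & 0xFF);
            } else if (op.operandBytes == 2) {
                // Two-byte operand, high byte first (an assumed encoding).
                text.append(' ').append(((code[offset + 1] & 0xFF) << 8) | (code[offset + 2] & 0xFF));
            }
            out.add(text.toString());
            offset += 1 + op.operandBytes;
        }
        return out;
    }
}
```

For the bytes `{0, 3, 5, 6, 0, 1, 7, 8}` this returns `0000 CONSTANT 3`, `0002 NEGATE`, `0003 JUMP_IF_FALSE 1`, `0006 PRINT`, and `0007 RETURN`.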

Compilation Process Flow

1. Initialization: The compiler initializes all components and reads the source file.
2. Lexical Analysis: The lexer tokenizes the source code.
3. Syntax Analysis: The parser builds the AST from tokens.
4. Semantic Analysis: The resolver validates types and scopes.
5. Code Generation: The emitter generates bytecode from the AST.
6. Execution: The interpreter executes the bytecode.
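The flow above can be modelled as a chain of stages. This sketch assumes, as many compilers do, that a stage reporting errors halts the pipeline before later stages run; the `Stage`, `Pipeline`, and `CompileException` types are hypothetical and are not taken from the RuneScript sources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// A stage turns its input into output and may record error messages.
interface Stage<I, O> {
    O run(I input, List<String> errors);
}

class CompileException extends RuntimeException {
    CompileException(String stage, List<String> errors) {
        super(stage + ": " + String.join("; ", errors));
    }
}

// Chains stages in order. If a stage reports any errors, the chain stops
// with a CompileException and later stages never see its output.
final class Pipeline<I, O> {
    private final Function<I, O> body;

    private Pipeline(Function<I, O> body) { this.body = body; }

    static <T> Pipeline<T, T> start() { return new Pipeline<T, T>(Function.identity()); }

    <R> Pipeline<I, R> then(String name, Stage<O, R> stage) {
        return new Pipeline<I, R>(input -> {
            O value = body.apply(input);
            List<String> errors = new ArrayList<>();
            R result = stage.run(value, errors);
            if (!errors.isEmpty()) throw new CompileException(name, errors);
            return result;
        });
    }

    O run(I input) { return body.apply(input); }
}
```

For instance, chaining a toy "lexer" that splits on whitespace, a toy "resolver" that rejects the token `??`, and a toy "emitter" that counts tokens yields `2` for `print 42` and fails with the message `resolver: unknown symbol '??'` for `print ??`. The real stages would pass tokens, AST nodes, and chunks instead of strings.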

Error Handling

The compiler handles errors at every stage of the pipeline:
- Lexical errors (invalid characters, unclosed strings)
- Syntax errors (unexpected tokens, malformed expressions)
- Semantic errors (undefined variables, type mismatches)
- Runtime errors (division by zero, null pointer access)

Errors are reported with line and column numbers to help developers locate issues quickly.
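Line and column numbers are most useful when the message also shows the offending source line. The sketch below formats a diagnostic with a caret under the reported column; the `Diagnostics.format` helper and its output layout are hypothetical, not RuneScript's actual error format.

```java
// Formats a diagnostic: a header with line/column, the source line, and a caret under the column.
class Diagnostics {
    static String format(String source, int line, int column, String message) {
        String[] lines = source.split("\n", -1); // -1 keeps trailing empty lines
        String text = (line >= 1 && line <= lines.length) ? lines[line - 1] : "";
        // Assumes 1-based columns and one display column per character, so tabs would misalign the caret.
        String caret = " ".repeat(Math.max(0, column - 1)) + "^";
        return "[line " + line + ", column " + column + "] Error: " + message + "\n"
                + "    " + text + "\n"
                + "    " + caret;
    }
}
```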

Performance Considerations

The compiler is designed with performance in mind:
- Efficient tokenization algorithms
- Optimized AST representation
- Bytecode optimization passes
- Fast virtual machine execution
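As an illustration of what a bytecode optimization pass can look like, here is a hypothetical peephole constant folder. It assumes a simplified instruction form (`Ins`) in which `CONST` carries its value directly instead of a pool index, and it is not taken from the RuneScript sources.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical instruction form: CONST carries its value directly instead of a pool index.
record Ins(String op, double value) {
    static Ins constant(double v) { return new Ins("CONST", v); }
    static Ins of(String name) { return new Ins(name, 0); }
}

class ConstantFolder {
    // Single forward pass: whenever the last two emitted instructions are constants
    // and the next one is ADD, SUB, or MUL, replace all three with one constant.
    // Folds cascade, so CONST 1, CONST 2, ADD, CONST 3, MUL becomes CONST 9.
    static List<Ins> fold(List<Ins> code) {
        List<Ins> out = new ArrayList<>();
        for (Ins ins : code) {
            int n = out.size();
            boolean foldable = n >= 2
                    && out.get(n - 2).op().equals("CONST") && out.get(n - 1).op().equals("CONST")
                    && (ins.op().equals("ADD") || ins.op().equals("SUB") || ins.op().equals("MUL"));
            if (!foldable) {
                out.add(ins);
                continue;
            }
            double b = out.remove(n - 1).value(); // top of stack: right operand
            double a = out.remove(n - 2).value();
            double r = switch (ins.op()) {
                case "ADD" -> a + b;
                case "SUB" -> a - b;
                default -> a * b;
            };
            out.add(Ins.constant(r));
        }
        return out;
    }
}
```

`DIV` is deliberately left unfolded so that a literal division by zero still raises its runtime error, and a real pass would also have to avoid folding across jump targets.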