Architecture¶
This document explains the internal architecture of the RuneScript Compiler, detailing the compilation pipeline from source code to bytecode execution.
Compiler Pipeline¶
The RuneScript Compiler follows a traditional multi-stage design with five stages, where the output of each stage is the input of the next:
1. Lexer (Tokenization)¶
The lexer, implemented in Lexer.java, converts the source code character stream into a sequence of tokens. Each token represents one meaningful unit of the language: a keyword, identifier, literal, operator, or punctuation mark.
Input: Source code as a string
Output: Stream of Token objects
Key responsibilities:
- Identifying and categorizing tokens
- Handling string literals and escape sequences
- Managing line and column tracking for error reporting
- Skipping whitespace and comments
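The sketch below shows one way a lexer of this kind could scan a string while tracking line and column positions. It is not the actual Lexer.java: the `LexerSketch`, `Tok`, and `Kind` names are hypothetical, the `//` line-comment syntax is an assumption, and string literals and escape sequences are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; the real Lexer.java and Token.java may differ.
final class LexerSketch {
    enum Kind { NUMBER, IDENTIFIER, SYMBOL, EOF }

    static final class Tok {
        final Kind kind;
        final String lexeme;
        final int line;
        final int column;

        Tok(Kind kind, String lexeme, int line, int column) {
            this.kind = kind;
            this.lexeme = lexeme;
            this.line = line;
            this.column = column;
        }
    }

    static List<Tok> tokenize(String src) {
        List<Tok> out = new ArrayList<>();
        int i = 0, line = 1, col = 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\n') { i++; line++; col = 1; continue; }
            if (c == ' ' || c == '\t' || c == '\r') { i++; col++; continue; }
            // Assumed comment syntax: "//" runs to the end of the line.
            if (c == '/' && i + 1 < src.length() && src.charAt(i + 1) == '/') {
                while (i < src.length() && src.charAt(i) != '\n') { i++; col++; }
                continue;
            }
            int start = i;
            Kind kind;
            if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                kind = Kind.NUMBER;
            } else if (Character.isLetter(c) || c == '_') {
                while (i < src.length()
                        && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) i++;
                kind = Kind.IDENTIFIER;
            } else {
                i++; // any other character becomes a one-character symbol token
                kind = Kind.SYMBOL;
            }
            // The token's column is where it started; then advance past its lexeme.
            out.add(new Tok(kind, src.substring(start, i), line, col));
            col += i - start;
        }
        out.add(new Tok(Kind.EOF, "", line, col));
        return out;
    }
}
```

Recording the position at the start of each token, rather than at the end, lets later stages point error messages at the first character of the offending lexeme.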
2. Parser (Syntax Analysis)¶
The parser, implemented in Parser.java, takes the token stream and builds an Abstract Syntax Tree (AST) representing the syntactic structure of the program. It is a recursive descent parser that uses Pratt parsing to handle expressions.
Input: Stream of Token objects
Output: Abstract Syntax Tree (AST) as Expr and Stmt objects
Key responsibilities:
- Building AST nodes for expressions and statements
- Enforcing grammatical rules
- Error recovery and reporting
- Handling operator precedence and associativity
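To illustrate how Pratt parsing resolves precedence and associativity, here is a minimal sketch. It is not taken from Parser.java: it works on plain string tokens, supports only `+`, `-`, `*`, `/`, unary minus, and parentheses, and returns a fully parenthesized string instead of Expr nodes. The `PrattSketch` class and its precedence table are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; the real Parser.java builds Expr nodes instead of strings.
final class PrattSketch {
    private static final int UNARY = 3; // binds tighter than any infix operator below

    private final List<String> tokens;
    private int pos = 0;

    private PrattSketch(List<String> tokens) { this.tokens = tokens; }

    static String parse(String... tokens) {
        PrattSketch p = new PrattSketch(Arrays.asList(tokens));
        String tree = p.expression(0);
        if (p.pos != p.tokens.size()) {
            throw new IllegalStateException("Unexpected token: " + p.peek());
        }
        return tree;
    }

    // Infix binding power; 0 means "not an infix operator".
    private static int precedence(String op) {
        switch (op) {
            case "+": case "-": return 1;
            case "*": case "/": return 2;
            default: return 0;
        }
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

    // Parses an expression whose infix operators all bind tighter than minPrec.
    private String expression(int minPrec) {
        String left = prefix();
        while (precedence(peek()) > minPrec) {
            String op = tokens.get(pos++);
            // Passing the operator's own precedence makes it left-associative.
            String right = expression(precedence(op));
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    private String prefix() {
        if (pos >= tokens.size()) throw new IllegalStateException("Unexpected end of input");
        String tok = tokens.get(pos++);
        if (tok.equals("-")) return "(-" + expression(UNARY) + ")";
        if (tok.equals("(")) {
            String inner = expression(0);
            if (!peek().equals(")")) throw new IllegalStateException("Expected ')'");
            pos++;
            return inner;
        }
        return tok; // a number or identifier
    }
}
```

For example, `PrattSketch.parse("1", "+", "2", "*", "3")` groups the multiplication first, and `PrattSketch.parse("1", "-", "2", "-", "3")` groups from the left.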
3. Type Checker (Semantic Analysis)¶
The type checker, implemented in Resolver.java, performs semantic analysis on the AST to ensure type safety and correctness. It resolves variable scopes, validates type compatibility, and identifies semantic errors.
Input: Abstract Syntax Tree
Output: Type-checked AST with symbol table information
Key responsibilities:
- Variable and function declaration resolution
- Scope management
- Type inference and checking
- Detection of semantic errors (e.g., undefined variables, type mismatches)
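Scope management and the two example errors above can be sketched with a stack of symbol tables. This is a simplified illustration, not the logic of Resolver.java: the `ScopeSketch` class, its method names, and the use of strings such as "int" for types are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; the real Resolver.java walks Expr and Stmt nodes.
final class ScopeSketch {
    // Head of the deque is the innermost scope; each map is name -> declared type.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();
    final List<String> errors = new ArrayList<>();

    ScopeSketch() { beginScope(); } // start with a global scope

    void beginScope() { scopes.push(new HashMap<String, String>()); }

    void endScope() { scopes.pop(); }

    void declare(String name, String type) {
        Map<String, String> current = scopes.peek();
        if (current.containsKey(name)) {
            errors.add("Variable '" + name + "' is already declared in this scope");
        } else {
            current.put(name, type);
        }
    }

    // Returns the declared type, or null (and records an error) if the name is unknown.
    String lookup(String name) {
        for (Map<String, String> scope : scopes) { // iterates innermost scope first
            String type = scope.get(name);
            if (type != null) return type;
        }
        errors.add("Undefined variable '" + name + "'");
        return null;
    }

    void checkAssign(String name, String valueType) {
        String declared = lookup(name);
        if (declared != null && !declared.equals(valueType)) {
            errors.add("Type mismatch: cannot assign " + valueType
                    + " to '" + name + "' of type " + declared);
        }
    }
}
```

Collecting errors in a list instead of throwing on the first one lets a single pass report several semantic problems at once.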
4. Bytecode Emitter¶
The bytecode emitter, implemented in BytecodeEmitter.java, translates the type-checked AST into RuneScript bytecode. This intermediate representation is designed for efficient execution by the virtual machine.
Input: Type-checked AST
Output: Bytecode instructions stored in Chunk.java
Key responsibilities:
- Converting AST nodes to bytecode instructions
- Managing constant pools
- Optimizing instruction sequences
- Generating debug information
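As a rough illustration of the first two responsibilities, the sketch below walks a tiny expression tree in post-order and emits stack-machine instructions, reusing constant pool entries where possible. The opcode numbers, the `EmitterSketch` class, and its nested `Literal` and `Binary` node types are hypothetical stand-ins, not the actual BytecodeEmitter.java or AST classes. Optimization and debug information are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; opcodes and node types are assumptions.
final class EmitterSketch {
    static final int OP_CONSTANT = 0; // followed by one operand: a constant pool index
    static final int OP_ADD = 1;
    static final int OP_MULTIPLY = 2;

    interface Expr {}

    static final class Literal implements Expr {
        final double value;
        Literal(double value) { this.value = value; }
    }

    static final class Binary implements Expr {
        final Expr left;
        final char op;
        final Expr right;
        Binary(Expr left, char op, Expr right) {
            this.left = left;
            this.op = op;
            this.right = right;
        }
    }

    final List<Integer> code = new ArrayList<>();
    final List<Double> constants = new ArrayList<>();

    // Post-order walk: operands are emitted before the operator that consumes them.
    void emit(Expr expr) {
        if (expr instanceof Literal) {
            code.add(OP_CONSTANT);
            code.add(addConstant(((Literal) expr).value));
        } else if (expr instanceof Binary) {
            Binary b = (Binary) expr;
            emit(b.left);
            emit(b.right);
            code.add(opcodeFor(b.op));
        } else {
            throw new IllegalArgumentException("Unsupported node: " + expr);
        }
    }

    private static int opcodeFor(char op) {
        switch (op) {
            case '+': return OP_ADD;
            case '*': return OP_MULTIPLY;
            default: throw new IllegalArgumentException("Unsupported operator: " + op);
        }
    }

    // Reuses an existing pool entry when the same value was already added.
    private int addConstant(double value) {
        int index = constants.indexOf(value);
        if (index >= 0) return index;
        constants.add(value);
        return constants.size() - 1;
    }
}
```

Emitting `1 + (2 * 1)` with this sketch produces two pool entries, because the second `1` reuses index 0.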
5. Virtual Machine¶
The virtual machine, implemented in Interpreter.java and Chunk.java, executes the generated bytecode. It manages the runtime stack, heap allocation, and program execution flow.
Input: Bytecode chunks
Output: Program execution results
Key responsibilities:
- Instruction execution
- Memory management
- Runtime error handling
- Native function integration
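The core of a stack-based virtual machine is a fetch-decode-execute loop. The sketch below assumes a stack machine with a handful of made-up opcodes and shows instruction execution plus one runtime error (division by zero). It is not the actual Interpreter.java, and it does not cover memory management or native functions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only; the opcode set is an assumption.
final class VmSketch {
    static final int OP_CONSTANT = 0; // followed by one operand: a constant pool index
    static final int OP_ADD = 1;
    static final int OP_MULTIPLY = 2;
    static final int OP_DIVIDE = 3;
    static final int OP_RETURN = 4;

    static double run(int[] code, double[] constants) {
        Deque<Double> stack = new ArrayDeque<>();
        int ip = 0; // instruction pointer: index of the next word to read
        while (true) {
            int op = code[ip++];
            switch (op) {
                case OP_CONSTANT:
                    stack.push(constants[code[ip++]]);
                    break;
                case OP_ADD: {
                    double b = stack.pop();
                    double a = stack.pop();
                    stack.push(a + b);
                    break;
                }
                case OP_MULTIPLY: {
                    double b = stack.pop();
                    double a = stack.pop();
                    stack.push(a * b);
                    break;
                }
                case OP_DIVIDE: {
                    double b = stack.pop();
                    double a = stack.pop();
                    if (b == 0) throw new ArithmeticException("Division by zero");
                    stack.push(a / b);
                    break;
                }
                case OP_RETURN:
                    return stack.pop();
                default:
                    throw new IllegalStateException("Unknown opcode " + op);
            }
        }
    }
}
```

The right operand is popped first because it was pushed last, which matters for non-commutative operations such as division.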
Core Components¶
Token.java¶
Represents individual lexical units with:
- Token type (keyword, identifier, number, etc.)
- Lexeme (actual text)
- Line and column position
- Literal value (for numbers, strings)
AST Classes¶
Abstract syntax tree nodes represent the program structure:
- Expr classes for expressions (binary, unary, literals, variables)
- Stmt classes for statements (expression statements, print statements, conditionals, loops)
Chunk.java¶
Manages bytecode storage with:
- Instruction array
- Constant pool
- Line number mapping for debugging
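One simple way to keep these three pieces together is a parallel list that stores the source line of every written word. The sketch below is an assumption about how such a container could look, not the actual Chunk.java; the `ChunkSketch` class and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; the real Chunk.java may store bytecode differently.
final class ChunkSketch {
    private final List<Integer> code = new ArrayList<>();
    private final List<Integer> lines = new ArrayList<>(); // lines.get(i) is the source line of code.get(i)
    private final List<Object> constants = new ArrayList<>();

    // Appends one opcode or operand and returns its offset in the instruction array.
    int write(int word, int line) {
        code.add(word);
        lines.add(line);
        return code.size() - 1;
    }

    // Adds a value to the constant pool and returns its index.
    int addConstant(Object value) {
        constants.add(value);
        return constants.size() - 1;
    }

    int size() { return code.size(); }

    int wordAt(int offset) { return code.get(offset); }

    // Maps a bytecode offset back to a source line, e.g. for runtime error messages.
    int lineAt(int offset) { return lines.get(offset); }

    Object constantAt(int index) { return constants.get(index); }
}
```

Storing one line number per word is wasteful but easy to follow; a run-length encoded table would be a more compact alternative.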
Instruction.java¶
Defines the bytecode instruction set with opcodes and operands.
Compilation Process Flow¶
- Initialization: The compiler initializes all components and reads the source file
- Lexical Analysis: The lexer tokenizes the source code
- Syntax Analysis: The parser builds the AST from tokens
- Semantic Analysis: The resolver validates types and scopes
- Code Generation: The emitter generates bytecode from the AST
- Execution: The interpreter executes the bytecode
Error Handling¶
The compiler reports errors at every stage of the pipeline:
- Lexical errors (invalid characters, unclosed strings)
- Syntax errors (unexpected tokens, malformed expressions)
- Semantic errors (undefined variables, type mismatches)
- Runtime errors (division by zero, null pointer access)
Errors are reported with line and column numbers to help developers locate issues quickly.
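As an illustration of position-based reporting, the sketch below formats a message from a 1-based line and column and adds a caret under the offending character. The message layout, the `ErrorReportSketch` class, and the sample source text in the usage note are assumptions, not the compiler's actual output format.

```java
// Illustrative sketch only; the real compiler's message format may differ.
final class ErrorReportSketch {
    // line and column are 1-based, matching the positions recorded on tokens.
    static String format(String source, int line, int column, String kind, String message) {
        String[] lines = source.split("\n", -1);
        String text = lines[line - 1];
        StringBuilder caret = new StringBuilder();
        for (int i = 1; i < column; i++) caret.append(' ');
        caret.append('^');
        return "[line " + line + ", col " + column + "] " + kind + " error: " + message
                + "\n" + text + "\n" + caret;
    }
}
```

Calling `ErrorReportSketch.format("x = ;\ny = 2;", 1, 5, "Syntax", "expected expression")` yields a header line, the source line `x = ;`, and a caret under the `;`.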
Performance Considerations¶
The compiler is designed with performance in mind:
- Efficient tokenization algorithms
- Optimized AST representation
- Bytecode optimization passes
- Fast virtual machine execution