At the very top level of abstraction I will need to implement three modules.

Module Takes Returns
Lexer Text (Code) Tokens
Parser Tokens AST (Abstract Syntax Tree)
Compiler AST JVM Bytecode

###Lexer Lexer takes simple text input and tokenizes it. The code is no longer a meaningless stream of bytes, but a list of tokens. Tokens are also associated with type useful for further analysis. ###Parser The tokens are passed to parser which is responsible for organizaing tokens into hierarchical structure called Abstract Syntax Tree. The tree determines the order in which code should be executed. ###Compiler Compiler traverses the tree and maps it into valid bytecode instructions.

##Example

Let’s assume I’d like to execute int x=a*5+2; expression. The following steps need to be taken:

graph LR A[" Input Code

int x=a*5+2; "] B[" Tokens

{type,int},
{identifier,x}
{operator,=}
{identifier,a}
{operator,#42;}
{number,5}
{operator,+}
{number,2}
{keyword,;} "] A-->|LEXER|B B-->|PARSER|EQUALS subgraph Abstract Syntax Tree EQUALS["="] VARX["x"] VARA["a"] MULTIPLY["#42;"] PLUS["+"] FIVE[5] TWO[2] EQUALS---PLUS EQUALS---VARX PLUS---TWO PLUS---MULTIPLY MULTIPLY---FIVE MULTIPLY---VARA end

Once the abstract tree is created it needs to be mapped to bytecode by compiler.


Jakub Dziworski

Dev Blog