Creating JVM language [PART 13] - For Loops

Sources

The project can be cloned from github repository.
The revision described in this post is ebd36ca9f8af03ce4b9c144efab0ad11cc99f749.

Ranged loops

In this post I am going to describe ranged for loops. A ranged for loop iterates a variable over a specified range of values. In Java, such a loop can look like this:

for (int i=0;i<=5;i++)

Enkel’s equivalent would be:

for i from 0 to 5

I also implemented an additional feature. The loops know whether they should increment or decrement:

for i from 0 to 5 //increment i from 0 to 5  - for (int i=0;i<=5;i++)

for i from 5 to 0 //decrement i from 5 to 0 - for (int i=5;i>=0;i--)

The loop type (incrementing or decrementing) must be inferred at runtime, because the range bounds can be results of method calls.

The concept for while loops and collection loops ( for (item : collection) ) is very similar. They are not described here, to keep this post as short as possible.

Grammar changes

statement : block
           //other statement alternatives
           | forStatement ;

forStatement : 'for' ('(')? forConditions (')')? statement ;
forConditions : iterator=varReference  'from' startExpr=expression range='to' endExpr=expression ;
  • forConditions are the conditions (bounds) for the iterator (i from 0 to 10).
  • Labeling rule elements with = (iterator=, startExpr=, endExpr=) makes the code that reads the parse tree more readable.
  • The iterator must be the name of a variable. The variable may not exist in the scope yet; in that case it is declared behind the scenes.
  • The value of startExpr is used to initialize the iterator.
  • The value of endExpr is the stop value for the iterator.

The resulting parse tree for the statement:

 for (i from 0 to 5) print i

is:

[Figure: parse tree of the for statement]

Mapping antlr context objects

ANTLR generates the ForStatementContext class from the grammar specification. It is a good idea to map it into a more compiler-friendly class. While mapping, why not solve the problem described in the previous section (an undeclared iterator variable)?

public class ForStatementVisitor extends EnkelBaseVisitor<RangedForStatement> {

    //other stuff
    
    @Override
    public RangedForStatement visitForStatement(@NotNull ForStatementContext ctx) {
        EnkelParser.ForConditionsContext forExpressionContext = ctx.forConditions();
        Expression startExpression = forExpressionContext.startExpr.accept(expressionVisitor);
        Expression endExpression = forExpressionContext.endExpr.accept(expressionVisitor);
        VarReferenceContext iterator = forExpressionContext.iterator;
        String varName = iterator.getText();
        //If variable referenced by iterator already exists in the scope
        if(scope.localVariableExists(varName)) { 
            //register new variable value
            Statement iteratorVariable = new AssignmentStatement(varName, startExpression); 
            //get the statement (usually a block)
            Statement statement = ctx.statement().accept(statementVisitor); 
            return new RangedForStatement(iteratorVariable, startExpression, endExpression,statement, varName, scope); 
        //Variable has not been declared in the scope
        } else { 
            //create new local variable and add to the scope
            scope.addLocalVariable(new LocalVariable(varName,startExpression.getType())); 
            //register variable declaration statement
            Statement iteratorVariable = new VariableDeclarationStatement(varName,startExpression); 
            Statement statement = ctx.statement().accept(statementVisitor);
            return new RangedForStatement(iteratorVariable, startExpression, endExpression,statement, varName,scope);
        }
    }
}

The iterator variable may or may not exist in the scope. Both statements below should be handled:

    var iterator = 0
    for (iterator from 0 to 5) print iterator

The iterator was already declared. Assign the value of startExpression (0) to it: new AssignmentStatement(varName, startExpression);.

    for (iterator from 0 to 5) print iterator

The iterator is not yet declared. Declare it and assign the value of startExpression (0) to it: new VariableDeclarationStatement(varName,startExpression);.
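The choice between the two statement kinds can be reduced to a small, self-contained sketch. The SimpleScope class and the returned strings are hypothetical stand-ins for Enkel's Scope, AssignmentStatement and VariableDeclarationStatement; only the decision logic is meant to mirror the visitor above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for Enkel's Scope: it only tracks variable names.
class SimpleScope {
    private final Set<String> localVariables = new HashSet<>();

    boolean localVariableExists(String name) {
        return localVariables.contains(name);
    }

    void addLocalVariable(String name) {
        localVariables.add(name);
    }

    // Describes the statement that would initialize the loop iterator.
    String initializeIterator(String name, int startValue) {
        if (localVariableExists(name)) {
            // Already declared: only a new value is assigned.
            return "assign " + name + " = " + startValue;
        }
        // Not declared yet: register it in the scope, then declare it.
        addLocalVariable(name);
        return "declare " + name + " = " + startValue;
    }

    public static void main(String[] args) {
        SimpleScope scope = new SimpleScope();
        System.out.println(scope.initializeIterator("i", 0)); // declare i = 0
        System.out.println(scope.initializeIterator("i", 5)); // assign i = 5
    }
}
```

The second call finds i in the scope, so it yields an assignment instead of a second declaration.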

Generating bytecode

Once the RangedForStatement has been created, it is time to pull some information from it and generate bytecode.

There are no special JVM instructions for loops. One way to implement them is to use control flow (conditional and unconditional) instructions, described in Creating JVM language [PART 10] - Conditional statements.

public void generate(RangedForStatement rangedForStatement) {
    Scope newScope = rangedForStatement.getScope();
    StatementGenerator scopeGeneratorWithNewScope = new StatementGenerator(methodVisitor, newScope);
    ExpressionGenrator exprGeneratorWithNewScope = new ExpressionGenrator(methodVisitor, newScope);
    Statement iterator = rangedForStatement.getIteratorVariableStatement();
    Label incrementationSection = new Label();
    Label decrementationSection = new Label();
    Label endLoopSection = new Label();
    String iteratorVarName = rangedForStatement.getIteratorVarName();
    Expression endExpression = rangedForStatement.getEndExpression();
    Expression iteratorVariable = new VarReference(iteratorVarName, rangedForStatement.getType());
    ConditionalExpression iteratorGreaterThanEndConditional = new ConditionalExpression(iteratorVariable, endExpression, CompareSign.GREATER);
    ConditionalExpression iteratorLessThanEndConditional = new ConditionalExpression(iteratorVariable, endExpression, CompareSign.LESS);

    //generates variable declaration or variable assignment (istore)
    iterator.accept(scopeGeneratorWithNewScope);

    //Section below checks whether the loop should be incrementing or decrementing
    //If the range start is smaller than range end (i from 0 to 5) then increment (++)
    //If the range start is greater than range end (i from 5 to 0) then decrement (--)

    //Pushes 0 or 1 onto the stack 
    iteratorLessThanEndConditional.accept(exprGeneratorWithNewScope);
    //IFNE - is value on the stack (result of conditional) different than 0 (success)?
    methodVisitor.visitJumpInsn(Opcodes.IFNE,incrementationSection);

    iteratorGreaterThanEndConditional.accept(exprGeneratorWithNewScope);
    methodVisitor.visitJumpInsn(Opcodes.IFNE,decrementationSection);

    //Incrementation section
    methodVisitor.visitLabel(incrementationSection);
    rangedForStatement.getStatement().accept(scopeGeneratorWithNewScope); //execute the body
    methodVisitor.visitIincInsn(newScope.getLocalVariableIndex(iteratorVarName),1); //increment iterator
    iteratorGreaterThanEndConditional.accept(exprGeneratorWithNewScope); //is iterator greater than range end?
    methodVisitor.visitJumpInsn(Opcodes.IFEQ,incrementationSection); //if it is not go back loop again 
    //the iterator is greater than end range. Break out of the loop, skipping decrementation section
    methodVisitor.visitJumpInsn(Opcodes.GOTO,endLoopSection); 

    //Decrementation section
    methodVisitor.visitLabel(decrementationSection);
    rangedForStatement.getStatement().accept(scopeGeneratorWithNewScope);
    methodVisitor.visitIincInsn(newScope.getLocalVariableIndex(iteratorVarName),-1); //decrement iterator
    iteratorLessThanEndConditional.accept(exprGeneratorWithNewScope);
    methodVisitor.visitJumpInsn(Opcodes.IFEQ,decrementationSection);

    methodVisitor.visitLabel(endLoopSection);
}

This may seem a little complicated, because the decision whether the iterator should be incremented or decremented has to be made at runtime.

Let’s analyze how the method actually chooses the right iteration type for this example: for (i from 0 to 5).

  1. Declare the iterator variable i and assign the start value (0).
  2. Check if the iterator value (0) is less than the range end value (5).
  3. Because 0 (range start) is less than 5 (range end), the iterator should be incremented. Jump to the incrementation section.
  4. Execute the actual statements in the loop.
  5. Increment the iterator by 1.
  6. Check if the iterator is greater than the range end (5).
  7. If it is not, go back to point 4.
  8. Once the loop body has been executed 6 times (the iterator is 6), go to the end section (skipping the decrementation section).
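The same control flow can be written in plain Java to check the steps above. This is only an emulation of the generated branches, not the compiler's own code; RangedLoopSketch and iterate are names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Emulates the branches emitted for a ranged loop (illustrative names only).
class RangedLoopSketch {

    // Returns every value the iterator takes while looping from start to end.
    static List<Integer> iterate(int start, int end) {
        List<Integer> visited = new ArrayList<>();
        int i = start; // declare or assign the iterator
        if (i > end) {
            // Decrementation section: run the body, then decrement.
            do {
                visited.add(i); // loop body
                i--;
            } while (i >= end); // loop again unless i < end
        } else {
            // Incrementation section (also reached when start == end).
            do {
                visited.add(i); // loop body
                i++;
            } while (i <= end); // loop again unless i > end
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(iterate(0, 5)); // [0, 1, 2, 3, 4, 5]
        System.out.println(iterate(5, 0)); // [5, 4, 3, 2, 1, 0]
    }
}
```

Because the body runs before the first bound check inside each section, a range such as 3 to 3 still executes the body once.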

Example

Let’s compile the following Enkel class:

Loops {
    main(string[] args) {
        for i from 1 to 5 {
            print i
        }
    }
}

To better present how the iteration type is inferred, I decompiled the Loops.class file using IntelliJ IDEA’s decompiler:

//Loops.class file decompiled to Java using IntelliJ IDEA's decompiler

public class Loops {
    public static void main(String[] var0) {
        int var1 = 1;
        if(var1 >= 5 ) { //should it be decremented?
            do {
                System.out.println(var1);
                --var1;
            } while(var1 >= 5);
        } else { //should it be incremented?
            do {
                System.out.println(var1);
                ++var1;
            } while(var1 <= 5);
        }

    }
}

The result is, as expected:

kuba@kuba-laptop:~/repos/Enkel-JVM-language$ java Loops 
1
2
3
4
5

Creating JVM language [PART 12] - Named Function Arguments

Sources

The project can be cloned from github repository.
The revision described in this post is 62a99fe34f540f5cae7a48386b66d23e4b879046.

Why do I need named arguments?

In Java (as in most languages), method call arguments are identified by their position. This seems reasonable for methods with a small number of parameters, preferably of different types. Unfortunately, there are many methods that have neither a small number of parameters nor parameters of different types.

If you’ve ever done some game programming, you have probably come across functions like this:

Rect createRectangle(int x1,int y1,int x2, int y2) //createRectangle signature

I am more than sure you have called it with the wrong argument order at least once.

Do you see the problem? The function has plenty of parameters, all of the same type. It is very easy to forget their order, and the compiler doesn’t care as long as the types match.

Wouldn’t it be awesome if you could explicitly specify a parameter without relying on positions? That’s where named arguments come in:

createRectangle(25,50,-25,-50) //method invocation without named parameters :(
createRectangle(x1->25,x2->-25,y1->50,y2->-50) //method invocation with named parameters :)

The benefits from using named arguments are:

  • The order of arguments is unrestricted
  • The code is more readable
  • No need to jump between files to compare call with signature

Grammar changes

functionCall : functionName '('argument? (',' argument)* ')';
argument : expression              //unnamed argument
         | name '->' expression   ; //named argument

A function call can have zero, one, or more arguments (separated by the ‘,’ character). The argument rule comes in two flavours (unnamed and named). Mixing named and unnamed arguments is not allowed.

Reordering arguments

As described in Creating JVM language [PART 7] - Methods, the method parsing process is divided into two steps. First it finds all the signatures (declarations), and once that is done it starts parsing the bodies. It is therefore guaranteed that all the signatures are already available while the method bodies are being parsed.

Using that characteristic, the idea is to “transform” a named call into an unnamed call by getting the parameter indexes from the signature:

  • Look for a parameter name in the signature that matches the argument name.
  • Get the parameter index.
  • If the argument is at a different index than the parameter, reorder it.

[Figure: reordering arguments]

In the example above, x2 would be swapped with y1.

public class ExpressionVisitor extends EnkelBaseVisitor<Expression> {
    //other stuff
    @Override
    public Expression visitFunctionCall(@NotNull EnkelParser.FunctionCallContext ctx) {
        String funName = ctx.functionName().getText();
        FunctionSignature signature = scope.getSignature(funName); 
        List<EnkelParser.ArgumentContext> argumentsCtx = ctx.argument();
        //Create comparator that compares arguments based on their index in signature
        Comparator<EnkelParser.ArgumentContext> argumentComparator = (arg1, arg2) -> {
            if(arg1.name() == null) return 0; //If the argument is not named skip
            String arg1Name = arg1.name().getText();
            String arg2Name = arg2.name().getText();
            return signature.getIndexOfParameter(arg1Name) - signature.getIndexOfParameter(arg2Name);
        };
        List<Expression> arguments = argumentsCtx.stream() //parsed arguments (wrong order)
                .sorted(argumentComparator) //Order using created comparator
                .map(argument -> argument.expression().accept(this)) //Map parsed arguments into expressions
                .collect(toList());
        return new FunctionCall(signature, arguments);
    }
}

That way the component responsible for generating bytecode does not distinguish between named and unnamed arguments. It only sees a FunctionCall as a collection of (properly ordered) arguments and a signature. No modifications to bytecode generation are therefore needed.
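The reordering step can also be tried in isolation. The sketch below is a simplification under stated assumptions: arguments are plain integers instead of parsed expressions, the signature is just a list of parameter names, and NamedArgumentsSketch and reorder are hypothetical names. Unknown argument names are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Reorders named arguments to match the parameter order of a signature.
class NamedArgumentsSketch {

    static List<Integer> reorder(List<String> parameterNames, Map<String, Integer> namedArguments) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(namedArguments.entrySet());
        // Sort by the index of the matching parameter in the signature.
        entries.sort(Comparator.comparingInt(entry -> parameterNames.indexOf(entry.getKey())));
        List<Integer> ordered = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            ordered.add(entry.getValue());
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<String> signature = Arrays.asList("x1", "y1", "x2", "y2");
        // LinkedHashMap keeps the order in which the call lists its arguments.
        Map<String, Integer> call = new LinkedHashMap<>();
        call.put("x1", 25);
        call.put("x2", -25);
        call.put("y1", 50);
        call.put("y2", -50);
        System.out.println(reorder(signature, call)); // [25, 50, -25, -50]
    }
}
```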

Example

The following Enkel class:

NamedParamsTest {

    main(string[] args) {
        createRect(x1->25,x2->-25,y1->50,y2->-50)
    }

    createRect (int x1,int y1,int x2, int y2) {
        print "Created rect with x1=" + x1 + " y1=" + y1 + " x2=" + x2 + " y2=" + y2
    }
}

gets compiled into following bytecode:

kuba@kuba-laptop:~/repos/Enkel-JVM-language$ javap -c NamedParamsTest.class 
public class NamedParamsTest {
  public static void main(java.lang.String[]);
    Code:
       0: bipush        25          //x1 (1 index in call)
       2: bipush        50          //y1 (3 index in call)
       4: bipush        -25         //x2 (2 index in call)
       6: bipush        -50         //y2 (4 index in call)
       8: invokestatic  #10                 // Method createRect:(IIII)V
      11: return

  public static void createRect(int, int, int, int);
    Code:
      //normal printing code 
}

As you can see the y1 and x2 arguments were swapped as expected.

The output is:

Created rect with x1=25 y1=50 x2=-25 y2=-50

Creating JVM language [PART 11] - Default parameters

Sources

The project can be cloned from github repository.
The revision described in this post is 0d39a48855e15f5146bfa8ddee40effe84cf1093.

Java and default parameters

The absence of default parameters is one of the things I have always hated in Java. Some people suggest using the builder pattern, but this solution leads to lots of boilerplate code. I have no idea why the Java team has neglected this feature for so long. It actually turns out that it is not that hard to implement.

Argument vs Parameter

Those terms are often mixed up, but they actually have different meanings. The simplest way to put it is:

  • parameter - method signature
  • argument - method call

An argument is an expression passed when calling the method. A parameter is a variable in method signature.

Concept

The idea is to look up the method signature during a call and get the parameter’s default value from it. That way no special bytecode is required. The function call just “simulates” a default value as if it had been passed explicitly.

Grammar changes

There is only one minor change, to the functionParameter rule:

functionParameter : type ID functionParamdefaultValue? ;
functionParamdefaultValue : '=' expression ;

The function parameter consists of a type followed by a name. Optionally (‘?’), it is followed by an equals sign (‘=’) and a default expression.

Mapping antlr context objects

The changes in this section are minor. A new field (defaultValue) was introduced into the FunctionParameter class.
The field stores an Optional<Expression> object. If the parser finds a default value, the Optional contains it. Otherwise the Optional is empty.

public class FunctionSignatureVisitor extends EnkelBaseVisitor<FunctionSignature> {

    @Override
    public FunctionSignature visitFunctionDeclaration(@NotNull EnkelParser.FunctionDeclarationContext ctx) {
       //other stuff
        for(int i=0;i<argsCtx.size();i++) { //for each parsed argument
        //other stuff
            Optional<Expression> defaultValue = getParameterDefaultValue(argCtx);
            FunctionParameter functionParameters = new FunctionParameter(name, type, defaultValue);
            parameters.add(functionParameters);
        }
        //other stuff
    }

    private Optional<Expression> getParameterDefaultValue(FunctionParameterContext argCtx) {
        if(argCtx.functionParamdefaultValue() != null) {
            EnkelParser.ExpressionContext defaultValueCtx = argCtx.functionParamdefaultValue().expression();
            return Optional.of(defaultValueCtx.accept(expressionVisitor));
        }
        return Optional.empty();
    }
}

Generating bytecode

Generating bytecode for a function call requires two additional steps:

  • Check that there are no more arguments (method call) than parameters (method signature).
  • Get and evaluate the default expressions for the arguments that are missing.

The ‘missing’ arguments are defined as the arguments at indexes between the last index in the function call (exclusive) and the last index in the signature (inclusive).

Example:

signature: fun(int x,int x2=5,int x3=4)

call: fun(2)

The missing arguments are x2 (index 1) and x3 (index 2), because the last index in the call is 0 and the last index in the signature is 2.

public class ExpressionGenrator {
    public void generate(FunctionCall functionCall) {
        //other stuff
        if(arguments.size() > parameters.size()) {  
            throw new BadArgumentsToFunctionCallException(functionCall);
        }
        arguments.forEach(argument -> argument.accept(this));
        for(int i=arguments.size();i<parameters.size();i++) {
            Expression defaultParameter = parameters.get(i).getDefaultValue()
                    .orElseThrow(() -> new BadArgumentsToFunctionCallException(functionCall));
            defaultParameter.accept(this);
        }
        //other stuff   
    }
}
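To see the index arithmetic at work outside the compiler, here is a minimal sketch using the fun(int x,int x2=5,int x3=4) signature from the example. It rests on assumptions: values are plain integers instead of expressions, Parameter is a hypothetical simplification of FunctionParameter, and IllegalArgumentException stands in for BadArgumentsToFunctionCallException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class DefaultArgumentsSketch {

    // Hypothetical, simplified parameter: a name and an optional default value.
    static final class Parameter {
        final String name;
        final Optional<Integer> defaultValue;

        Parameter(String name, Optional<Integer> defaultValue) {
            this.name = name;
            this.defaultValue = defaultValue;
        }
    }

    // Appends default values for every parameter the call did not cover.
    static List<Integer> complete(List<Parameter> parameters, List<Integer> arguments) {
        if (arguments.size() > parameters.size()) {
            throw new IllegalArgumentException("too many arguments");
        }
        List<Integer> completed = new ArrayList<>(arguments);
        for (int i = arguments.size(); i < parameters.size(); i++) {
            Parameter parameter = parameters.get(i);
            completed.add(parameter.defaultValue.orElseThrow(
                    () -> new IllegalArgumentException("no default for " + parameter.name)));
        }
        return completed;
    }

    // Signature from the text: fun(int x, int x2=5, int x3=4)
    static List<Parameter> exampleSignature() {
        return Arrays.asList(
                new Parameter("x", Optional.empty()),
                new Parameter("x2", Optional.of(5)),
                new Parameter("x3", Optional.of(4)));
    }

    public static void main(String[] args) {
        System.out.println(complete(exampleSignature(), Arrays.asList(2))); // [2, 5, 4]
    }
}
```

A call with no arguments would fail here, because x has no default value to fall back on.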

Example

The following Enkel class:

DefaultParamTest {

    main(string[] args) {
         greet("andrew")
         print ""
         greet("kuba","enkel")
    }

    greet (string name,string favouriteLanguage="java") {
        print "Hello my name is "
        print name
        print "and my favourite langugage is "
        print favouriteLanguage
    }
}

gets compiled into following bytecode:

kuba@kuba-laptop:~/repos/Enkel-JVM-language$ javap -c DefaultParamTest
public class DefaultParamTest {
  public static void main(java.lang.String[]);
    Code:
       0: ldc           #8                  //push String "andrew" onto the stack
       2: ldc           #10   // push String "java" onto the stack  <-- implicit argument value
       4: invokestatic  #14                 // invoke static method greet:(Ljava/lang/String;Ljava/lang/String;)V
       7: getstatic     #20                 // get static field java/lang/System.out:Ljava/io/PrintStream;
      10: ldc           #22                 // push  empty String (empty line)
      12: invokevirtual #27                 // call Method "Ljava/io/PrintStream;".println:(Ljava/lang/String;)V to print empty line
      15: ldc           #29                 // push String "kuba"
      17: ldc           #31   // push String "enkel" <-- explicit argument value
      19: invokestatic  #14                 //invoke static method greet:(Ljava/lang/String;Ljava/lang/String;)V
      22: return

  public static void greet(java.lang.String, java.lang.String);
    Code:
       0: getstatic     #20                 // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc           #33                 // String Hello my name is
       5: invokevirtual #27                 // Method "Ljava/io/PrintStream;".println:(Ljava/lang/String;)V
       8: getstatic     #20                 // Field java/lang/System.out:Ljava/io/PrintStream;
      11: aload_0                           // load (push onto stack) variable at index 0 (first parameter of a method)
      12: invokevirtual #27                 // Method "Ljava/io/PrintStream;".println:(Ljava/lang/String;)V
      15: getstatic     #20                 // Field java/lang/System.out:Ljava/io/PrintStream;
      18: ldc           #35                 // String and my favourite langugage is
      20: invokevirtual #27                 // Method "Ljava/io/PrintStream;".println:(Ljava/lang/String;)V
      23: getstatic     #20                 // Field java/lang/System.out:Ljava/io/PrintStream;
      26: aload_1                           // load (push onto stack) variable at index 1 (second parameter of a method)
      27: invokevirtual #27                 // Method "Ljava/io/PrintStream;".println:(Ljava/lang/String;)V
      30: return
}

and the output is:

Hello my name is 
andrew
and my favourite langugage is 
java

Hello my name is 
kuba
and my favourite langugage is 
enkel

Creating JVM language [PART 10] - Conditional statements

Sources

The project can be cloned from github repository.
The revision described in this post is 7c8e6ea934b6d3172e3c142c658e9c4287287de3.

Grammar changes

Implementing conditional statements resulted in two grammar changes:

  • introducing a new rule, ifStatement.
  • adding conditionalExpression alternatives to the expression rule.

ifStatement :  'if'  '('? expression ')'? trueStatement=statement ('else' falseStatement=statement)?;
expression : varReference #VARREFERENCE
           | value        #VALUE
           //other expression alternatives
           | expression cmp='>' expression #conditionalExpression
           | expression cmp='<' expression #conditionalExpression
           | expression cmp='==' expression #conditionalExpression
           | expression cmp='!=' expression #conditionalExpression
           | expression cmp='>=' expression #conditionalExpression
           | expression cmp='<=' expression #conditionalExpression
           ;

The ifStatement rule definition basically means:

  • expression is the condition to be tested.
  • placing the condition in parentheses is not required: in '('? ')'? the question marks mean “optional”.
  • trueStatement is executed when the condition is true.
  • the if can be followed by an else.
  • falseStatement is executed when the condition is false.
  • ifStatement is a statement too, so it can be used as a trueStatement or falseStatement (if … else if … else).

The new expression alternatives are pretty much self-explanatory. Their purpose is to compare two expressions and return another expression (a boolean value).

To better understand how ‘if’ and ‘else’ can be used to express ‘else if’, take a look at the following snippet:

    if(0) {
        
    } else if(1) {
        
    }

The code is parsed into the following parse tree:

[Figure: parse tree of the if / else if snippet]

As you can see, the second if is actually a child of the else. They are on different levels of the hierarchy. There is no need to specify ‘else if’ in the rule explicitly. ifStatement is an alternative of the statement rule, so other ifStatements can be used inside an ifStatement. This provides a way to chain them easily.
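The nesting can be made visible with a tiny model of the tree. IfChainSketch and IfNode are hypothetical and keep only the condition text plus an optional nested if; they are not the classes used by the compiler.

```java
// Minimal, hypothetical model of an if statement: only the condition text
// and an optional nested if inside the else branch are kept.
class IfChainSketch {

    static final class IfNode {
        final String condition;
        final IfNode elseIf; // null when the else branch holds no further if

        IfNode(String condition, IfNode elseIf) {
            this.condition = condition;
            this.elseIf = elseIf;
        }
    }

    // Renders the nesting explicitly: each "else if" is an if inside an else block.
    static String describe(IfNode node) {
        String text = "if(" + node.condition + ")";
        if (node.elseIf != null) {
            text += " else { " + describe(node.elseIf) + " }";
        }
        return text;
    }

    public static void main(String[] args) {
        // Models: if(0) { } else if(1) { }
        IfNode chain = new IfNode("0", new IfNode("1", null));
        System.out.println(describe(chain)); // if(0) else { if(1) }
    }
}
```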

Mapping antlr context objects

ANTLR's autogenerated IfStatementContext objects are converted into POJO IfStatement objects:

public class StatementVisitor extends EnkelBaseVisitor<Statement> {
    //other stuff
    @Override
    public Statement visitIfStatement(@NotNull EnkelParser.IfStatementContext ctx) {
        ExpressionContext conditionalExpressionContext = ctx.expression();
        Expression condition = conditionalExpressionContext.accept(expressionVisitor); //Map conditional expression
        Statement trueStatement = ctx.trueStatement.accept(this); //Map trueStatement antlr object
        Statement falseStatement = ctx.falseStatement.accept(this); //Map falseStatement antlr object

        return new IfStatement(condition, trueStatement, falseStatement);
    } 
}

Conditional Expressions on the other hand are mapped like this:

public class ExpressionVisitor extends EnkelBaseVisitor<Expression> {
    @Override
    public ConditionalExpression visitConditionalExpression(@NotNull EnkelParser.ConditionalExpressionContext ctx) {
        EnkelParser.ExpressionContext leftExpressionCtx = ctx.expression(0); //get left side expression ( ex. 1 < 5  -> it would mean get "1")
        EnkelParser.ExpressionContext rightExpressionCtx = ctx.expression(1); //get right side expression
        Expression leftExpression = leftExpressionCtx.accept(this); //get mapped (to POJO) left expression using this visitor
        //rightExpression might be null! Example: 'if (x)' checks x for nullity. The solution for this case is to assign integer 0 to the rightExpr 
        Expression rightExpression = rightExpressionCtx != null ? rightExpressionCtx.accept(this) : new Value(BultInType.INT,"0"); 
        CompareSign cmpSign = ctx.cmp != null ? CompareSign.fromString(ctx.cmp.getText()) : CompareSign.NOT_EQUAL; //if there is no cmp sign use '!=0' by default
        return new ConditionalExpression(leftExpression, rightExpression, cmpSign);
    }
}

CompareSign is an object representing a comparison sign (‘==’, ‘<’ etc.). It also stores the appropriate bytecode instruction for the comparison (IF_ICMPEQ, IF_ICMPLE etc.).

Generating bytecode

The JVM has a few groups of instructions for conditional branching:

  • if<eq,ne,lt,le,gt,ge> - pops one int value from the stack and compares it to 0.
  • if_icmp<eq,ne,lt,le,gt,ge> - pops two int values from the stack and compares them to each other.
  • comparisons for other primitive types (lcmp - long, fcmpg - float etc.), which push an int result that an if<cond> instruction can then test.
  • if[non]null - checks a reference for null.

For now we’re just going to use the second group. These instructions take an operand which is a branch offset (the instruction to proceed to if the condition is met).

Generating ConditionalExpression

The first place where an if_icmp instruction (for example if_icmpne, which tests two values for ‘not equal’) is going to be used is the bytecode generation for ConditionalExpression:

public void generate(ConditionalExpression conditionalExpression) {
    Expression leftExpression = conditionalExpression.getLeftExpression();
    Expression rightExpression = conditionalExpression.getRightExpression();
    Type type = leftExpression.getType();
    if(type != rightExpression.getType()) {
        throw new ComparisonBetweenDiferentTypesException(leftExpression, rightExpression); //not yet supported
    }
    leftExpression.accept(this);
    rightExpression.accept(this);
    CompareSign compareSign = conditionalExpression.getCompareSign();
    Label trueLabel = new Label(); //represents an address in code (the jump target if the condition is met)
    Label endLabel = new Label();
    methodVisitor.visitJumpInsn(compareSign.getOpcode(),trueLabel);
    methodVisitor.visitInsn(Opcodes.ICONST_0);
    methodVisitor.visitJumpInsn(Opcodes.GOTO, endLabel);
    methodVisitor.visitLabel(trueLabel);
    methodVisitor.visitInsn(Opcodes.ICONST_1);
    methodVisitor.visitLabel(endLabel);
}

compareSign.getOpcode() returns the instruction for the condition:

public enum CompareSign {
    EQUAL("==", Opcodes.IF_ICMPEQ),
    NOT_EQUAL("!=", Opcodes.IF_ICMPNE),
    LESS("<",Opcodes.IF_ICMPLT),
    GREATER(">",Opcodes.IF_ICMPGT),
    LESS_OR_EQUAL("<=",Opcodes.IF_ICMPLE),
    GRATER_OR_EQAL(">=",Opcodes.IF_ICMPGE);
    //getters
}

Conditional instructions take an operand which is a branch offset (label). The two values currently sitting at the top of the stack are popped and compared using compareSign.getOpcode().

If the comparison is positive, a jump to trueLabel is performed. The trueLabel section consists of a single instruction, ICONST_1, which pushes int 1 onto the stack.

If the comparison is negative, no jump is performed.
Instead the next instruction is executed (ICONST_0 - push 0 onto the stack).
Afterwards GOTO (an unconditional branching instruction) jumps to endLabel.
That way the code responsible for the positive comparison is bypassed.

Performing the comparison in the manner described above guarantees that the result is
1 or 0 (an int value pushed onto the stack).

That way the ConditionalExpression can be used as an expression - it can be assigned to a variable,
passed as an argument to a function, printed or even returned.
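The effect of the emitted sequence can be emulated in plain Java. This is a sketch of the behaviour only: the Sign enum is a hypothetical counterpart of CompareSign that carries a Java comparison instead of an opcode, and evaluate is a made-up name.

```java
class ConditionalExpressionSketch {

    // Hypothetical counterpart of CompareSign, with a predicate instead of an opcode.
    enum Sign {
        EQUAL, NOT_EQUAL, LESS, GREATER, LESS_OR_EQUAL, GREATER_OR_EQUAL;

        boolean holds(int left, int right) {
            switch (this) {
                case EQUAL: return left == right;
                case NOT_EQUAL: return left != right;
                case LESS: return left < right;
                case GREATER: return left > right;
                case LESS_OR_EQUAL: return left <= right;
                default: return left >= right; // GREATER_OR_EQUAL
            }
        }
    }

    // Mirrors the branch structure: the result is always the int 1 or 0.
    static int evaluate(int left, int right, Sign sign) {
        int result;
        if (sign.holds(left, right)) { // IF_ICMPxx trueLabel
            result = 1;                // trueLabel: ICONST_1
        } else {
            result = 0;                // ICONST_0, then GOTO endLabel
        }
        return result;                 // endLabel: the int stays on the stack
    }

    public static void main(String[] args) {
        System.out.println(evaluate(8, 8, Sign.EQUAL)); // 1
        System.out.println(evaluate(1, 5, Sign.GREATER)); // 0
    }
}
```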

Generating IfStatement

public void generate(IfStatement ifStatement) {
    Expression condition = ifStatement.getCondition();
    condition.accept(expressionGenrator);
    Label trueLabel = new Label();
    Label endLabel = new Label();
    methodVisitor.visitJumpInsn(Opcodes.IFNE,trueLabel);
    ifStatement.getFalseStatement().accept(this);
    methodVisitor.visitJumpInsn(Opcodes.GOTO,endLabel);
    methodVisitor.visitLabel(trueLabel);
    ifStatement.getTrueStatement().accept(this);
    methodVisitor.visitLabel(endLabel);
}

The IfStatement relies on a guarantee provided by ConditionalExpression: generating it always pushes 0 or 1 onto the stack.

It simply evaluates the expression (condition.accept(expressionGenrator);) and checks if the value pushed onto the stack is != 0 (methodVisitor.visitJumpInsn(Opcodes.IFNE,trueLabel);). If it is != 0, it jumps to trueLabel, where the trueStatement is generated (ifStatement.getTrueStatement().accept(this);). Otherwise execution continues with the instructions generated for the falseStatement, followed by a jump (GOTO) to endLabel.

Example

The following Enkel class:

SumCalculator {

    main(string[] args) {
        var expected = 8
        var actual = sum(3,5)

        if( actual == expected ) {
            print "test passed"
        } else {
            print "test failed"
        }
    }

    int sum (int x ,int y) {
        x+y
    }
    
}

gets compiled into following bytecode:

kuba@kuba-laptop:~/repos/Enkel-JVM-language$ javap -c  SumCalculator
public class SumCalculator {
  public static void main(java.lang.String[]);
    Code:
       0: bipush        8
       2: istore_1          //store 8 in local variable 1 (expected)
       3: bipush        3   //push 3 
       5: bipush        5   //push 5
       7: invokestatic  #10 //Call method sum (3,5)
      10: istore_2          //store the result in variable 2 (actual)
      11: iload_2           //push the value from variable 2 (actual=8) onto the stack
      12: iload_1           //push the value from variable 1 (expected=8) onto the stack
      13: if_icmpeq     20  //compare two top values from stack (8 == 8) if true jump to label 20
      16: iconst_0          //push 0 onto the stack
      17: goto          21  //go to label 21 (skip true section)
      20: iconst_1          //label 20 (true section) -> push 1 onto the stack
      21: ifne          35  //if the value on the stack (result of comparison 8==8) != 0 jump to label 35
      24: getstatic     #16  // get static Field java/lang/System.out:Ljava/io/PrintStream;
      27: ldc           #18  // push String test failed
      29: invokevirtual #23  // call print Method "Ljava/io/PrintStream;".println:(Ljava/lang/String;)V
      32: goto          43   //jump to end (skip true section)
      35: getstatic     #16                 
      38: ldc           #25  // String test passed
      40: invokevirtual #23                 
      43: return

  public static int sum(int, int);
    Code:
       0: iload_0
       1: iload_1
       2: iadd
       3: ireturn
}

Creating JVM language [PART 9] - Returning values

Sources

The project can be cloned from github repository.
The revision described in this post is 83102b4c3f979c8d3e82abe35c91d3b14d37f1ab.

Grammar changes

I defined a new rule called “returnStatement”.

You may wonder why it is not “returnExpression”. After all, an expression is something that evaluates to a value (as described in the previous post). Doesn’t a return statement evaluate to a value?

This may seem confusing, but it turns out that return does not evaluate to a value. In Java the following code would not make sense: int x = return 5; , and the same is true for Enkel. In other words, an expression is essentially something that can be assigned to a variable. That is why return is a statement, not an expression:

statement : variableDeclaration
           //other statements rules
           | returnStatement ;

variableDeclaration : VARIABLE name EQUALS expression;
printStatement : PRINT expression ;
returnStatement : 'return' #RETURNVOID
                | ('return')? expression #RETURNWITHVALUE;

The return statement comes in two versions:

  • RETURNVOID - used in void methods. Does not take an expression. Requires the ‘return’ keyword.
  • RETURNWITHVALUE - used in non-void methods. Requires an expression. The ‘return’ keyword is optional.

It is possible to return a value explicitly or implicitly:

SomeClass {
    fun1 {
       return  //explicitly return from void method
    }
    
    fun2 {
        //implicitly return from void method
    }
    
    int fun3 {
        return 1  //explicitly return "1" from int method
    }
    
    int fun4 {
        1  //implicitly return "1" from int method
    }
}

The above code results in the following parse tree:

Parse Tree

You may notice that the parser did not resolve the implicit return statement in fun2. The block is empty, and matching “empty” as a return statement is not a good idea. The missing return statements are added in the bytecode generation phase.

Mapping antlr context objects

Parsed return statements are converted from antlr context classes into POJO ReturnStatement objects. The purpose of this step is to feed the compiler only the data required for bytecode generation. Reading data directly from antlr-generated objects during bytecode generation would result in ugly, unreadable code.

public class StatementVisitor extends EnkelBaseVisitor<Statement> {

    //other stuff
    
    @Override
    public Statement visitRETURNVOID(@NotNull EnkelParser.RETURNVOIDContext ctx) {
        return new ReturnStatement(new EmptyExpression(BultInType.VOID));
    }
    
    @Override
    public Statement visitRETURNWITHVALUE(@NotNull EnkelParser.RETURNWITHVALUEContext ctx) {
        Expression expression = ctx.expression().accept(expressionVisitor); 
        return new ReturnStatement(expression);
    }   
}

Detecting implicit void return

If there is an implicit return from a void method, no return statement is detected at parse time. That is why it is necessary to detect this scenario and append a return statement at generation time.

public class MethodGenerator {
    //other stuff
    private void appendReturnIfNotExists(Function function, Block block,StatementGenerator statementScopeGenrator) {
        boolean isLastStatementReturn = false;
        if(!block.getStatements().isEmpty()) { //an empty block (e.g. fun2 above) has no last statement
            Statement lastStatement = block.getStatements().get(block.getStatements().size() - 1);
            isLastStatementReturn = lastStatement instanceof ReturnStatement;
        }
        if(!isLastStatementReturn) {
            EmptyExpression emptyExpression = new EmptyExpression(function.getReturnType());
            ReturnStatement returnStatement = new ReturnStatement(emptyExpression);
            returnStatement.accept(statementScopeGenrator);
        }
    }
}

This method checks whether the last statement in the method is a ReturnStatement. If not, it generates a return instruction.
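The same check can be tried out without the compiler. Below is a minimal sketch that uses simplified stand-in classes (Stmt, PrintStmt and ReturnStmt are hypothetical names, not the Enkel classes) and returns the list of statements instead of emitting bytecode. It assumes that an empty body should also receive a trailing return:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the statement classes (hypothetical, for illustration only).
interface Stmt {}

final class PrintStmt implements Stmt {}

final class ReturnStmt implements Stmt {}

class ImplicitReturnDemo {

    // Returns a copy of the statements with a trailing return appended
    // when the body is empty or does not already end with a return.
    static List<Stmt> withTrailingReturn(List<Stmt> statements) {
        List<Stmt> result = new ArrayList<>(statements);
        boolean endsWithReturn = !result.isEmpty()
                && result.get(result.size() - 1) instanceof ReturnStmt;
        if (!endsWithReturn) {
            result.add(new ReturnStmt());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Stmt> emptyBody = new ArrayList<>();
        List<Stmt> printOnly = List.of(new PrintStmt());
        List<Stmt> explicitReturn = List.of(new PrintStmt(), new ReturnStmt());
        System.out.println(withTrailingReturn(emptyBody).size());      // 1
        System.out.println(withTrailingReturn(printOnly).size());      // 2
        System.out.println(withTrailingReturn(explicitReturn).size()); // 2
    }
}
```

Only the last statement is inspected, so a return nested inside another statement would not count. That matches the simple check described above.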

Generating bytecode

public class StatementGenerator {
    //other stuff
    public void generate(ReturnStatement returnStatement) {
        Expression expression = returnStatement.getExpression();
        Type type = expression.getType();
        expression.accept(expressionGenrator); //generate bytecode for expression itself (puts the value of expression onto the stack)
        if(type == BultInType.VOID) {
            methodVisitor.visitInsn(Opcodes.RETURN);
        } else if (type == BultInType.INT) {
            methodVisitor.visitInsn(Opcodes.IRETURN);
        }
    }
}

As an example, the statement return 5 results in the following steps:

  • Get the expression from returnStatement (“5” - which is of type “Value”, deduced during parsing).
  • Generate bytecode for the “5” expression (expression.accept(expressionGenrator) calls ExpressionGenrator.generate(Value value) - visitor pattern).
  • Bytecode generation results in a new value (5) being pushed onto the operand stack.
  • The IRETURN instruction takes the value from the operand stack and returns it.

This generates the following bytecode:

 bipush        5
 ireturn
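
The order of those steps (first the expression, then the matching return opcode) can be sketched without ASM. The snippet below is only an illustration: Expr and ValueType are hypothetical stand-ins, and the instructions are collected as text instead of being written through a MethodVisitor:

```java
import java.util.ArrayList;
import java.util.List;

class ReturnEmitterDemo {

    enum ValueType { VOID, INT }

    // Hypothetical stand-in for an expression: a type plus an int constant (ignored for VOID).
    static final class Expr {
        final ValueType type;
        final int value;

        Expr(ValueType type, int value) {
            this.type = type;
            this.value = value;
        }
    }

    // Emits the expression first (it leaves its value on the operand stack),
    // then the return instruction that matches the expression type.
    static List<String> emitReturn(Expr expr) {
        List<String> code = new ArrayList<>();
        if (expr.type == ValueType.INT) {
            code.add("bipush " + expr.value); // bipush only fits values from -128 to 127
            code.add("ireturn");
        } else {
            code.add("return"); // a void expression pushes nothing
        }
        return code;
    }

    public static void main(String[] args) {
        System.out.println(emitReturn(new Expr(ValueType.INT, 5)));  // [bipush 5, ireturn]
        System.out.println(emitReturn(new Expr(ValueType.VOID, 0))); // [return]
    }
}
```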

Example

The following Enkel class:

SumCalculator {

    main(string[] args) {
        print sum(5,2)
    }

    int sum (int x ,int y) {
        x+y
    }
}

gets compiled into the following bytecode:

kuba@kuba-laptop:~/repos/Enkel-JVM-language$ javap -c  SumCalculator
public class SumCalculator {
  public static void main(java.lang.String[]);
    Code:
       0: getstatic     #12                 //get static field java/lang/System.out:Ljava/io/PrintStream;
       3: bipush        5
       5: bipush        2
       7: invokestatic  #16                 // call method sum (with the values on operand stack 5,2)
      10: invokevirtual #21                 // call method println (with the value on stack - the result of method sum)
      13: return                           //return

  public static int sum(int, int);
    Code:
       0: iload_0
       1: iload_1
       2: iadd
       3: ireturn //return the value from operand stack (result of iadd)
}

Creating JVM language [PART 8] - Arithmetic operations

Sources

The project can be cloned from github repository.
The revision described in this post is 1fc8131b2752e73776e91084ffeabbfa45fc6307.

Grammar changes

The basic arithmetic operations are:

  • Addition
  • Subtraction
  • Multiplication
  • Division

The only affected grammar component is the “expression” rule.
An expression is “something” that evaluates to a value (function calls, values, variable references etc.).
A statement does “something” but does not necessarily evaluate to a value (if blocks etc.).
Since arithmetic operations return a value, they are expressions:

expression : varReference #VARREFERENCE
           | value        #VALUE
           | functionCall #FUNCALL
           | '(' expression '*' expression ')' #MULTIPLY
           | expression '*' expression  #MULTIPLY
           | '(' expression '/' expression ')' #DIVIDE
           | expression '/' expression #DIVIDE
           | '(' expression '+' expression ')' #ADD
           | expression '+' expression #ADD
           | '(' expression '-' expression ')' #SUBSTRACT
           | expression '-' expression #SUBSTRACT
           ;

There are a few things to clarify here.

The # notation means ‘create a callback for this rule alternative’. Antlr then generates methods like visitDIVIDE() and visitADD() in the EnkelVisitor interface. It is just a shortcut for creating new rules.

The order of the rule’s alternatives is crucial here! Take the expression 1+2*3. It is ambiguous, because it could be parsed as (1+2)*3 = 9 or as 1+(2*3) = 7. Antlr resolves the ambiguity by choosing the first alternative specified, so the order of alternatives determines operator precedence: multiplication and division are listed before addition and subtraction and therefore bind tighter.

Expressions in grouping parentheses are intentionally put above the regular versions in the rule. This gives them higher priority. Thanks to that, expressions like (1+2)*3 are parsed in the correct order.
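
To see the effect of precedence in isolation, here is a small hand-written recursive-descent evaluator. It is not what Antlr generates and it is not part of Enkel. It is only a sketch of the same idea, where each precedence level gets its own method and the tighter-binding level is called from the looser one. All names in it are made up for this example:

```java
class PrecedenceDemo {
    private final String src;
    private int pos;

    private PrecedenceDemo(String src) {
        this.src = src.replace(" ", "");
    }

    // Evaluates an integer expression built from + - * / and parentheses.
    static int eval(String src) {
        PrecedenceDemo parser = new PrecedenceDemo(src);
        int value = parser.additive();
        if (parser.pos != parser.src.length()) {
            throw new IllegalArgumentException("Unexpected input at " + parser.pos);
        }
        return value;
    }

    // Lowest precedence: + and -. Operands are parsed by the tighter-binding level.
    private int additive() {
        int left = multiplicative();
        while (pos < src.length() && (peek() == '+' || peek() == '-')) {
            char op = src.charAt(pos++);
            int right = multiplicative();
            left = (op == '+') ? left + right : left - right;
        }
        return left;
    }

    // Higher precedence: * and /.
    private int multiplicative() {
        int left = primary();
        while (pos < src.length() && (peek() == '*' || peek() == '/')) {
            char op = src.charAt(pos++);
            int right = primary();
            left = (op == '*') ? left * right : left / right;
        }
        return left;
    }

    // Highest precedence: numbers and parenthesised expressions.
    private int primary() {
        if (pos < src.length() && peek() == '(') {
            pos++; // consume '('
            int value = additive();
            if (pos >= src.length() || peek() != ')') {
                throw new IllegalArgumentException("Expected ')' at " + pos);
            }
            pos++; // consume ')'
            return value;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(peek())) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a number at " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }

    private char peek() {
        return src.charAt(pos);
    }

    public static void main(String[] args) {
        System.out.println(eval("1+2*3"));   // 7
        System.out.println(eval("(1+2)*3")); // 9
    }
}
```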

Mapping antlr context objects

Antlr generates new classes and callbacks for each rule alternative (arithmetic expression). It is however a good idea to create custom classes for each operation. This makes the bytecode generation code much cleaner:

public class ExpressionVisitor extends EnkelBaseVisitor<Expression> {

    //some other methods (visitFunctionCall, visitVariableReference etc)
    
    @Override
    public Expression visitADD(@NotNull EnkelParser.ADDContext ctx) {
        EnkelParser.ExpressionContext leftExpression = ctx.expression(0);
        EnkelParser.ExpressionContext rightExpression = ctx.expression(1);

        Expression leftExpress = leftExpression.accept(this);
        Expression rightExpress = rightExpression.accept(this);

        return new Addition(leftExpress, rightExpress);
    }

    @Override
    public Expression visitMULTIPLY(@NotNull EnkelParser.MULTIPLYContext ctx) {
        EnkelParser.ExpressionContext leftExpression = ctx.expression(0);
        EnkelParser.ExpressionContext rightExpression = ctx.expression(1);

        Expression leftExpress = leftExpression.accept(this);
        Expression rightExpress = rightExpression.accept(this);

        return new Multiplication(leftExpress, rightExpress);
    }
    
    //Division (visitDIVIDE)
    
    //Subtraction (visitSUBSTRACT)
}

Multiplication, Addition, Division and Substraction are just immutable POJO objects which store the left and right expressions of the operation (in 1+2, 1 is left and 2 is right).

Generating bytecode

Once the code is parsed and mapped into objects, we can transform them into bytecode. To do that I created another class (also following the visitor pattern) which takes an object of type Expression and generates bytecode.

public class ExpressionGenrator {

    //other methods (generateFunctionCall, generateVariableReference etc.)

    public void generate(Addition expression) {
        evaluateArthimeticComponents(expression);
        methodVisitor.visitInsn(Opcodes.IADD);
    }

    public void generate(Substraction expression) {
        evaluateArthimeticComponents(expression);
        methodVisitor.visitInsn(Opcodes.ISUB);
    }

    public void generate(Multiplication expression) {
        evaluateArthimeticComponents(expression);
        methodVisitor.visitInsn(Opcodes.IMUL);
    }

    public void generate(Division expression) {
        evaluateArthimeticComponents(expression);
        methodVisitor.visitInsn(Opcodes.IDIV);
    }
    
    private void evaluateArthimeticComponents(ArthimeticExpression expression) {
            Expression leftExpression = expression.getLeftExpression();
            Expression rightExpression = expression.getRightExpression();
            leftExpression.accept(this);
            rightExpression.accept(this);
    }
}

The arithmetic bytecode instructions are very straightforward. They take the top two values from the stack and put the result back onto it. No operands are required:

  • iadd - adds integers. Takes two values from the stack, adds them and pushes the result back onto the stack
  • isub - subtracts integers. Takes two values from the stack, subtracts the top one from the one below it and pushes the result back onto the stack
  • imul - multiplies integers. Takes two values from the stack, multiplies them and pushes the result back onto the stack
  • idiv - divides integers. Takes two values from the stack, divides the lower one by the top one and pushes the result back onto the stack

Other types have corresponding instructions (for example ladd, fadd and dadd for long, float and double).
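
The stack behaviour of these instructions can be simulated in a few lines. The following is a toy interpreter written only for this explanation (it is not how the JVM is implemented, and the textual instruction format is an assumption made for the example). It supports bipush and the four integer operations described above:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StackMachineDemo {

    // Executes a tiny subset of JVM-style instructions on an int operand stack
    // and returns the value left on top of the stack.
    static int run(String... instructions) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instruction : instructions) {
            String[] parts = instruction.trim().split("\\s+");
            if (parts[0].equals("bipush")) {
                stack.push(Integer.parseInt(parts[1]));
                continue;
            }
            // Binary operations: the right operand is on top, the left one below it.
            int right = stack.pop();
            int left = stack.pop();
            switch (parts[0]) {
                case "iadd": stack.push(left + right); break;
                case "isub": stack.push(left - right); break;
                case "imul": stack.push(left * right); break;
                case "idiv": stack.push(left / right); break;
                default: throw new IllegalArgumentException("Unknown instruction: " + instruction);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // The instruction sequence for 2+3*4: the multiplication runs first
        System.out.println(run("bipush 2", "bipush 3", "bipush 4", "imul", "iadd")); // 14
    }
}
```

Note that the operand order matters for isub and idiv: the value pushed first is the left operand.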

Result

The following Enkel code:

First {
        void main (string[] args) {
            var result = 2+3*4
        }
}

gets compiled into the following bytecode:

kuba@kuba-laptop:~/repos/Enkel-JVM-language$ javap -c First
public class First {
  public static void main(java.lang.String[]);
    Code:
       0: bipush        2 //push 2 onto the stack
       2: bipush        3 //push 3 onto the stack
       4: bipush        4 //push 4 onto the stack
       6: imul          //take two top values from the stack (3 and 4) and multiply them. Put result on stack
       7: iadd          //take two top values from stack (2 and 12 - the result of imul) and add them. Put result back on stack
       8: istore_1     //store top value from the stack into local variable at index 1 in local variable array of the current frame
       9: return
}

Creating JVM language [PART 7] - Methods

Sources

The project can be cloned from github repository.
The revision described in this post is 1fc8131b2752e73776e91084ffeabbfa45fc6307.

Methods

So far it was possible to declare a class and variables within one global scope. The next step is creating methods.

The goal is to compile the following Enkel class:

First {
    void main (string[] args) {
        var x = 25
        metoda(x)
    }

    void metoda (int param) {
        print param
    }
}

Scope

To access other functions and variables, they need to be in the scope:

public class Scope {
    private List<Identifier> identifiers; //think of them as variables for now
    private List<FunctionSignature> functionSignatures;
    private final MetaData metaData;  //currently stores only class name

    public Scope(MetaData metaData) {
        identifiers = new ArrayList<>();
        functionSignatures = new ArrayList<>();
        this.metaData = metaData;
    }

    public Scope(Scope scope) {
        metaData = scope.metaData;
        identifiers = Lists.newArrayList(scope.identifiers);
        functionSignatures = Lists.newArrayList(scope.functionSignatures);
    }
    
    //some other methods that expose data to the outside
}         

The scope object is created during class creation and passed to the children (functions). Children copy the Scope (using the copy constructor) and add their own items to the copy.
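
The reason for copying can be shown with a trimmed-down scope. In this sketch MiniScope is a hypothetical class that tracks only variable names. The point is that whatever a function adds to its own copy stays invisible to the class scope and to sibling functions:

```java
import java.util.ArrayList;
import java.util.List;

class ScopeCopyDemo {

    // Hypothetical, trimmed-down scope: only variable names, no signatures or metadata.
    static final class MiniScope {
        private final List<String> identifiers;

        MiniScope() {
            identifiers = new ArrayList<>();
        }

        // Copy constructor: the child starts with the parent's identifiers
        // but owns a separate list.
        MiniScope(MiniScope parent) {
            identifiers = new ArrayList<>(parent.identifiers);
        }

        void add(String name) {
            identifiers.add(name);
        }

        boolean contains(String name) {
            return identifiers.contains(name);
        }
    }

    public static void main(String[] args) {
        MiniScope classScope = new MiniScope();
        classScope.add("x");

        MiniScope methodScope = new MiniScope(classScope);
        methodScope.add("param");

        System.out.println(methodScope.contains("x"));    // true  (inherited)
        System.out.println(classScope.contains("param")); // false (did not leak upwards)
    }
}
```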

Signatures

When calling a method, some information about it must be available. Let’s say you have the following pseudocode:

f1() {
    f2()
}

f2(){
}

Which results in the following parse tree:

graph TD; Root-->Function:f1; Root-->Function:f2; Function:f1-->FunctionCall:f2;

Nodes are visited in the following order:

  • Root
  • Function:f1
  • FunctionCall:f2 //ERROR! f2??! What is that? It has not been declared yet!!
  • Function:f2

So the problem is that at the point of a method invocation the method might not have been visited yet. There is no information about f2 while parsing f1!

To fix that problem, all method declarations must be visited first and their signatures stored in the scope:

public class ClassVisitor extends EnkelBaseVisitor<ClassDeclaration> {

 private Scope scope;

 @Override
 public ClassDeclaration visitClassDeclaration(@NotNull EnkelParser.ClassDeclarationContext ctx) {
     String name = ctx.className().getText();
     FunctionSignatureVisitor functionSignatureVisitor = new FunctionSignatureVisitor();
     List<EnkelParser.FunctionContext> methodsCtx = ctx.classBody().function();
     MetaData metaData = new MetaData(ctx.className().getText());
     scope = new Scope(metaData);
     //First find all signatures
     List<FunctionSignature> signatures = methodsCtx.stream()
             .map(method -> method.functionDeclaration().accept(functionSignatureVisitor))
             .peek(scope::addSignature)
             .collect(Collectors.toList());
     //Once the signatures are found start parsing methods
     List<Function> methods = methodsCtx.stream()
             .map(method -> method.accept(new FunctionVisitor(scope)))
             .collect(Collectors.toList());
     return new ClassDeclaration(name, methods);
 }
}
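
The difference between one pass and two passes can be reproduced on a toy model. In the sketch below (hypothetical, not taken from the Enkel sources) a "class" is just a map from a function name to the names of the functions it calls, and the method reports every call whose target is still unknown when the call is visited:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TwoPassDemo {

    // functions: function name -> names of the functions it calls, in declaration order.
    // Returns the calls that could not be resolved at the moment they were visited.
    static List<String> unresolvedCalls(Map<String, List<String>> functions,
                                        boolean collectSignaturesFirst) {
        Set<String> known = new HashSet<>();
        if (collectSignaturesFirst) {
            known.addAll(functions.keySet()); // pass 1: register every signature up front
        }
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, List<String>> function : functions.entrySet()) {
            known.add(function.getKey()); // without pass 1 a function is known only once visited
            for (String callee : function.getValue()) {
                if (!known.contains(callee)) {
                    errors.add(function.getKey() + " -> " + callee);
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, List<String>> functions = new LinkedHashMap<>();
        functions.put("f1", List.of("f2")); // f1 calls f2, which is declared later
        functions.put("f2", List.of());

        System.out.println(unresolvedCalls(functions, false)); // [f1 -> f2]
        System.out.println(unresolvedCalls(functions, true));  // []
    }
}
```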

Invokestatic

Once all the information about the code has been parsed, it is time to convert it to bytecode. Since I have not yet implemented object creation, methods need to be called in a static context.


int access = Opcodes.ACC_PUBLIC + Opcodes.ACC_STATIC;

The bytecode instruction for static method invocation is called invokestatic. Its operand is a two-byte index into the constant pool that points to a method reference (owner class, method name and method descriptor).

Values from the operand stack are assumed to be the arguments (their number and types must match the method descriptor).
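
A method descriptor is a string of the form (parameter types)return type. The sketch below shows how such a string could be assembled for the few types used in this post. It is only an illustration: the Enkel-to-JVM type mapping is my assumption for this example, and it is not the actual DescriptorFactory code:

```java
class DescriptorDemo {

    // Maps an Enkel type name to a JVM type descriptor (assumed mapping, only a few types).
    static String typeDescriptor(String enkelType) {
        switch (enkelType) {
            case "int": return "I";
            case "void": return "V";
            case "string": return "Ljava/lang/String;";
            case "string[]": return "[Ljava/lang/String;";
            default: throw new IllegalArgumentException("Unsupported type: " + enkelType);
        }
    }

    // Builds "(<parameter descriptors>)<return descriptor>".
    static String methodDescriptor(String returnType, String... parameterTypes) {
        StringBuilder descriptor = new StringBuilder("(");
        for (String parameterType : parameterTypes) {
            descriptor.append(typeDescriptor(parameterType));
        }
        return descriptor.append(')').append(typeDescriptor(returnType)).toString();
    }

    public static void main(String[] args) {
        System.out.println(methodDescriptor("void", "int"));      // (I)V
        System.out.println(methodDescriptor("void", "string[]")); // ([Ljava/lang/String;)V
        System.out.println(methodDescriptor("int", "int", "int")); // (II)I
    }
}
```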



public class MethodGenerator {
    private final ClassWriter classWriter;

    public MethodGenerator(ClassWriter classWriter) {
        this.classWriter = classWriter;
    }

    public void generate(Function function) {
        Scope scope = function.getScope();
        String name = function.getName();
        String description = DescriptorFactory.getMethodDescriptor(function);
        Collection<Statement> instructions = function.getStatements();
        int access = Opcodes.ACC_PUBLIC + Opcodes.ACC_STATIC;
        MethodVisitor mv = classWriter.visitMethod(access, name, description, null, null);
        mv.visitCode();
        StatementGenerator statementScopeGenrator = new StatementGenerator(mv);
        instructions.forEach(instr -> statementScopeGenrator.generate(instr,scope));
        mv.visitInsn(Opcodes.RETURN);
        mv.visitMaxs(-1,-1); //asm calculates those automatically (when the ClassWriter is created with COMPUTE_MAXS or COMPUTE_FRAMES) but the call is required
        mv.visitEnd();
    }
}

Results

The following Enkel code:

First {
    void main (string[] args) {
        var x = 25
        metoda(x)
    }

    void metoda (int param) {
        print param
    }
}

gets compiled into the following bytecode:

kuba@kuba-laptop:~/repos/Enkel-JVM-language$ javap -c First
public class First {
  public static void main(java.lang.String[]);
    Code:
       0: bipush        25 //push value 25 onto the stack
       2: istore_0         //store value from stack into variable at index 0
       3: iload_0          //load variable at index 0 onto the stack
       5: invokestatic  #10 //call method metoda:(I)V
       8: return

  public static void metoda(int);
    Code:
       0: getstatic     #16                 // Field java/lang/System.out:Ljava/io/PrintStream;
       3: iload_0
       4: invokevirtual #20                 // Method "Ljava/io/PrintStream;".println:(I)V
       7: return
}

Creating JVM language [PART 6] - Switching to visitor oriented parsing

Sources

The project can be cloned from github repository.
The revision described in this post is 1fc8131b2752e73776e91084ffeabbfa45fc6307.

Visitor vs Listener

Previously I used the listener pattern to implement the Enkel parser. There is however another way to do that - Visitor. To enable it, specify -visitor on the command line.

I was kind of curious which one would be more suitable for Enkel so I created a small project that exposes the differences. Check out this blog post where you can read full comparison and get sources.

The main benefit is that Visitor methods return a value, whereas Listener methods do not:

  • There is less code to write.
  • It is less bug-prone. There is no need to store the parsing result in a field and rely on a getter.

//Listener
class ClassListener extends EnkelBaseListener {

    private Class parsedClass;

    @Override
    public void enterClassDeclaration(@NotNull EnkelParser.ClassDeclarationContext ctx) {
        String className = ctx.className().getText();
        //do some other stuff
        parsedClass = new Class(className,methods);
    }

    public Class getParsedClass() {
        return parsedClass;
    }
}

//Visitor
public class ClassVisitor extends EnkelBaseVisitor<ClassDeclaration> {

    @Override
    public ClassDeclaration visitClassDeclaration(@NotNull EnkelParser.ClassDeclarationContext ctx) {
        String name = ctx.className().getText();
        //do some other stuff
        return new ClassDeclaration(name, methods);
    }
}

The decision to switch to Visitor pattern was therefore quite obvious.

Antlr 4 - Listener vs Visitor

Sources

The project source code can be cloned from https://github.com/JakubDziworski/AntlrListenerVisitorComparison. There are full examples of Listener and Visitor oriented parser implementations.

SomeLanguage

Let’s say we want to parse “SomeLanguage” with the following grammar:

grammar SomeLanguage ;

classDeclaration : 'class' className '{' (method)* '}';
className : ID ;
method : methodName '{' (instruction)+ '}' ;
methodName : ID ;
instruction : ID ;

ID : [a-zA-Z0-9]+ ;
WS: [ \t\n\r]+ -> skip ;

Sample valid “SomeLanguage” code:

class SomeClass {
    fun1 {
        instruction11
        instruction12
    }
    fun2 {
        instruction21
        instruction22
    }
}

The class consists of zero or more methods. Each method consists of one or more instructions. That’s all.

We’d like to parse the code into a “Class” object:

public class Class {
    private String name;
    private Collection<Method> methods;
}

public class Method {
    private String name;
    private Collection<Instruction> instructions;
}

public class Instruction {
    private String name;
}

Listener vs Visitor

To do that, Antlr4 provides two ways of traversing the syntax tree:

  • Listener (default)
  • Visitor

To generate visitor classes from the grammar file you have to add the -visitor option to the command line. I however use the antlr maven plugin (see the full code on github).

Parsing using Listener

To parse the code into a Class object we could create one big Listener and register it with the parser (parser.addParseListener()). This would however produce one huge, messy class. Instead it is a good idea to register a separate listener for each rule:

public class ListenerOrientedParser implements Parser{

    @Override
    public Class parse(String code) {
        CharStream charStream = new ANTLRInputStream(code);
        SomeLanguageLexer lexer = new SomeLanguageLexer(charStream);
        TokenStream tokens = new CommonTokenStream(lexer);
        SomeLanguageParser parser = new SomeLanguageParser(tokens);

        ClassListener classListener = new ClassListener();
        parser.classDeclaration().enterRule(classListener);
        return classListener.getParsedClass();
    }

    class ClassListener extends SomeLanguageBaseListener {

        private Class parsedClass;

        @Override
        public void enterClassDeclaration(@NotNull SomeLanguageParser.ClassDeclarationContext ctx) {
            String className = ctx.className().getText();
            MethodListener methodListener = new MethodListener();
            ctx.method().forEach(method -> method.enterRule(methodListener));
            Collection<Method> methods = methodListener.getMethods();
            parsedClass = new Class(className,methods);
        }

        public Class getParsedClass() {
            return parsedClass;
        }
    }

    class MethodListener extends SomeLanguageBaseListener {

        private Collection<Method> methods;

        public MethodListener() {
            methods = new ArrayList<>();
        }

        @Override
        public void enterMethod(@NotNull SomeLanguageParser.MethodContext ctx) {
            String methodName = ctx.methodName().getText();
            InstructionListener instructionListener = new InstructionListener();
            ctx.instruction().forEach(instruction -> instruction.enterRule(instructionListener));
            Collection<Instruction> instructions = instructionListener.getInstructions();
            methods.add(new Method(methodName, instructions));
        }

        public Collection<Method> getMethods() {
            return methods;
        }
    }

    class InstructionListener extends SomeLanguageBaseListener {

        private Collection<Instruction> instructions;

        public InstructionListener() {
            instructions = new ArrayList<>();
        }

        @Override
        public void enterInstruction(@NotNull SomeLanguageParser.InstructionContext ctx) {
            String instructionName = ctx.getText();
            instructions.add(new Instruction(instructionName));
        }

        public Collection<Instruction> getInstructions() {
            return instructions;
        }
    }
}

Parsing using Visitor

The Visitor implementation is very similar but has one advantage: Visitor methods return a value, so there is no need to store values in fields.

public class VisitorOrientedParser implements Parser {

    public Class parse(String someLangSourceCode) {
        CharStream charStream = new ANTLRInputStream(someLangSourceCode);
        SomeLanguageLexer lexer = new SomeLanguageLexer(charStream);
        TokenStream tokens = new CommonTokenStream(lexer);
        SomeLanguageParser parser = new SomeLanguageParser(tokens);

        ClassVisitor classVisitor = new ClassVisitor();
        Class traverseResult = classVisitor.visit(parser.classDeclaration());
        return traverseResult;
    }

    private static class ClassVisitor extends SomeLanguageBaseVisitor<Class> {
        @Override
        public Class visitClassDeclaration(@NotNull SomeLanguageParser.ClassDeclarationContext ctx) {
            String className = ctx.className().getText();
            MethodVisitor methodVisitor = new MethodVisitor();
            List<Method> methods = ctx.method()
                    .stream()
                    .map(method -> method.accept(methodVisitor))
                    .collect(toList());
            return new Class(className, methods);
        }
    }

    private static class MethodVisitor extends SomeLanguageBaseVisitor<Method> {
        @Override
        public Method visitMethod(@NotNull SomeLanguageParser.MethodContext ctx) {
            String methodName = ctx.methodName().getText();
            InstructionVisitor instructionVisitor = new InstructionVisitor();
            List<Instruction> instructions = ctx.instruction()
                    .stream()
                    .map(instruction -> instruction.accept(instructionVisitor))
                    .collect(toList());
            return new Method(methodName, instructions);
        }
    }

    private static class InstructionVisitor extends  SomeLanguageBaseVisitor<Instruction> {

        @Override
        public Instruction visitInstruction(@NotNull SomeLanguageParser.InstructionContext ctx) {
            String instructionName = ctx.getText();
            return new Instruction(instructionName);
        }
    }
}

Results

Both implementations output the same result. I personally prefer Visitor since it requires less code and there is no need to store values in fields.

With either parser implementation, the “SomeLanguage” sample code is parsed into the following Class object (shown as JSON):

{
    "name": "SomeClass",
    "methods": [
        {
            "name": "fun1",
            "instructions": [
                {
                    "name": "instruction11"
                },
                {
                    "name": "instruction12"
                }
            ]
        },
        {
            "name": "fun2",
            "instructions": [
                {
                    "name": "instruction21"
                },
                {
                    "name": "instruction22"
                }
            ]
        }
    ]
}

For full code visit https://github.com/JakubDziworski/AntlrListenerVisitorComparison.

Creating JVM language [PART 5] - Adding 'class' scope

Sources

The project can be cloned from github repository.
The revision described in this post is 50e6996a4faf8d5b469d291a029be05f9e6c9520.

Parser rules changes

In the previous post I mentioned a few things I would like to add to the language. The first one is obviously going to be a class scope.

The language parsing rules:

compilationUnit : ( variable | print )* EOF;
variable : VARIABLE ID EQUALS value;

have been changed to:

compilationUnit : classDeclaration EOF ; //root rule - the file consists of exactly one class declaration
classDeclaration : className '{' classBody '}' ;
className : ID ;
classBody :  ( variable | print )* ;

  • The file must consist of one and only one classDeclaration.
  • A class declaration consists of a className followed by a body inside curly brackets.
  • The body is the same thing as it used to be in the prototype - variable declarations or prints.

This is how the parse tree looks after the modifications (you can live-preview the parse tree using the intellij plugin - https://www.youtube.com/watch?v=h60VapD1rOo):

Parse Tree

Compiler changes

Most of the changes involve moving top-level code from ByteCodeGenerator class to CompilationUnit and ClassDeclaration. The logic is as follows:

  1. Compiler grabs parse tree values from SyntaxTreeTraverser.
  2. Compiler instantiates CompilationUnit:

    //Compiler.java
    final CompilationUnit compilationUnit = new SyntaxTreeTraverser().getCompilationUnit(fileAbsolutePath);
    //Check getCompilationUnit() method body on github
    
  3. CompilationUnit instantiates ClassDeclaration (passes class name, and instructions list)
  4. ClassDeclaration executes class specific instructions and loops over ClassScopeInstructions:

     //ClassDeclaration.java
     MethodVisitor mv = classWriter.visitMethod(ACC_PUBLIC + ACC_STATIC, "main", "([Ljava/lang/String;)V", null, null);
     instructions.forEach(classScopeInstruction -> classScopeInstruction.apply(mv));
    

Parse Tree

One additional change is that the output .class file is named after the class declaration, regardless of the input *.enk filename:

String className = compilationUnit.getClassName();
String fileName = className + ".class";
//try-with-resources closes the stream even if the write fails
try (OutputStream os = new FileOutputStream(fileName)) {
    IOUtils.write(byteCode,os);
}
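
The same naming rule can be sketched with the JDK alone, without commons-io. This is not the compiler's code: writeClassFile and the output directory parameter are names invented for this example, and the byte array is a placeholder rather than a valid class file:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ClassFileWriterDemo {

    // Writes the bytes to "<className>.class" inside outputDir and returns the created path.
    // The name comes from the class declaration, not from the source file name.
    static Path writeClassFile(Path outputDir, String className, byte[] byteCode) throws IOException {
        Path target = outputDir.resolve(className + ".class");
        return Files.write(target, byteCode);
    }

    public static void main(String[] args) throws IOException {
        Path outputDir = Files.createTempDirectory("enkel");
        // Placeholder content: only the class-file magic number, not a loadable class
        byte[] placeholder = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        Path written = writeClassFile(outputDir, "First", placeholder);
        System.out.println(written.getFileName()); // First.class
    }
}
```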