TechHui

Hawaiʻi's Technology and New Media Community

Hacking BeanShell with JavaCC - Building a Language

A few months ago Clifton Royston posted an excellent entry titled Recursive descent parsing with BNF grammars. This inspired me to write about JavaCC, my favorite parser generator. Today I finally got around to putting together some examples.

Academic articles on parser generators and meta-compilers can be pretty dry stuff. To keep this interesting, we are going to implement some new language features on top of the popular BeanShell scripting language. BeanShell is basically an optionally typed superset of Java with some scripty features added for convenience. Its a lot easier to start with a working language and hack it than to build a new language from scratch. This approach will also give us instant gratification as we see our new language features in action using the interactive interpreter. Because this is an involved subject, I am going to cover it in a series of posts. In today's post we will add a few simple but useful features. Our extended version of BeanShell will be called BX.

The first language feature we are going to add is the ability to use array-like indexers on lists and Strings. Example:

    List list = new ArrayList();
list.add("cat");
list.add("dog");
print(list[1]); // output: "dog"
print("hello"[0]); // output: 'h'

This is relatively easy because it doesn't require any changes to the grammar. To implement this feature we need to crack open bsh.Reflect, a utility class used heavily in BeanShell. A few additions to the getIndex and setIndex methods will do the trick.

Reflect.getIndex before:
    public static Object getIndex(Object array, int index)
throws ReflectError, UtilTargetError {

try {
Object val = Array.get(array, index);
return Primitive.wrap( val, array.getClass().getComponentType() );
} catch( ArrayIndexOutOfBoundsException e1 ) {
throw new UtilTargetError( e1 );
} catch(Exception e) {
throw new ReflectError("Array access:" + e);
}
}

Reflect.getIndex after:
    public static Object getIndex(Object indexable, int index)
throws ReflectError, UtilTargetError {

try {

if(indexable.getClass().isArray()) {
Object val = Array.get(indexable, index);
return Primitive.wrap( val, indexable.getClass().getComponentType() );
} else if(indexable instanceof List) {
return ((List)indexable).get(index);
} else if(indexable instanceof CharSequence) {
return new Primitive(((CharSequence)indexable).charAt(index));
}

throw new ReflectError(indexable.getClass().getName() + " is " +
"not indexable");
} catch( ArrayIndexOutOfBoundsException e1 ) {
throw new UtilTargetError( e1 );
} catch(Exception e) {
throw new ReflectError("Array access:" + e);
}
}

This will take care of indexable element acess (ex. myArray[1]). To handle indexable element assignment we need to make similar changes to Reflect.setIndex. Take a look at the attached source for details. Now we compile the source using ant with the "compile" target. You can run the interactive interpreter by executing bsh.Interpreter for the console version or bsh.Console for the GUI version. Try creating a few lists and accessing the members using the array syntax. Voilà! Language hacking has never been so easy :-)

Examples:
    print([1,2,3]);
print([[1,2],[3,4]]);
print(["dan","mika"][1]);
print(['a','b','c'].size());

For our next trick we will create a new type of String literal. Our new literal will be surrounded by back quotes. Like C#'s literal string, it will be able to handle any character, including new lines, and will be terminated only by a second back quote.

Examples:
    String story = `She said "Hello" and smiled`;
String multiLine = `This is a very long string
that keeps going and going...`;

This change will require a minor addition to the grammar. The BeanShell grammar is defined in a JJTree file called bsh.jjt. You will find it in the src directory. JJTree is a a tree building preprocessor for the JavaCC (Java Compiler Compiler) parser generator. The JJTree language is a superset of Java that includes pattern recognizing features. The process we will follow is:

1. Add a definition for the new literal type to bsh.jjt.
2. Run JJTree on bsh.jjt to produce bsh.jj
3. Run JavaCC on bsh.jj to produce Parser.java

This can all be accomplished using the "compile" ant target. Our addition to the grammar will occur in the TOKEN section of the grammar defined in bsh.jjt:

Before:
  < STRING_LITERAL:
"\""
( (~["\"","\\","\n","\r"])
| ("\\"
( ["n","t","b","r","f","\\","'","\""]
| ["0"-"7"] ( ["0"-"7"] )?
| ["0"-"3"] ["0"-"7"] ["0"-"7"]
)
)
)*
"\""
>

After:
  < STRING_LITERAL:
(
"\""
( (~["\"","\\","\n","\r"])
| ("\\"
( ["n","t","b","r","f","\\","'","\""]
| ["0"-"7"] ( ["0"-"7"] )?
| ["0"-"3"] ["0"-"7"] ["0"-"7"]
)
)
)*
"\""
)
|
( "`" ( ~["`"] )* "`" )
>

The new pattern is ( "`" ( ~["`"] )* "`" ), which basically means a back quote followed by zero or more characters that are not back quotes, terminated by a second back quote. Our work on this feature is complete. Run the "compile" ant target and give it a spin.

Most scripting languages provide an easy way to create and populate lists and maps. Our next feature is a list expression. I will leave maps to a later post, or as an exercise for the reader.

Example List Expression:
    var nums = [1,2,3,4]; // could also be "List nums = [1,2,3,4];"

To implement this feature we need to know a little more about JJTree. JJTree creates a tree node for each nonterminal in our grammar. We need to define the pattern for a list expression node, insert it in the right place in the grammar, and create the ListExpression class that will actually instantiate the list.

Node Definition (definied in bsh.jjt):
    void ListExpression() #ListExpression :
{ }
{
"[" [ Expression() ( LOOKAHEAD(2) "," Expression() )* ] "]"
}

#ListExpression tells JJTree to insert code to instantiate our custom node class called BSHListExpression. BSHListExpression is a regular java file:
class BSHListExpression extends SimpleNode {

BSHListExpression(int id) {
super(id);
}

public Object eval(CallStack callstack, Interpreter interpreter)
throws EvalError {

ArrayList list = new ArrayList();

int nodeCount = jjtGetNumChildren();
for (int i = 0; i < nodeCount; i++) {
SimpleNode node = ((SimpleNode) jjtGetChild(i));
Object value = node.eval(callstack, interpreter);
list.add(value);
}

return list;
}
}

As you can see, the SimpleNode superclass gives us access to the child expressions we parent. We access child nodes with jjtGetChild(i), eval them with node.eval(callstack, interpreter), and add them to the list. The following excerpt from the JJTree docs expounds on the tree building process:

"Although JavaCC is a top-down parser, JJTree constructs the parse tree from the bottom up. To do this it uses a stack where it pushes nodes after they have been created. When it finds a parent for them, it pops the children from the stack and adds them to the parent, and finally pushes the new parent node itself. The stack is open, which means that you have access to it from within grammar actions: you can push, pop and otherwise manipulate its contents however you feel appropriate."

The final step is to add our ListExpression node to the appropriate place in the grammar. In this case the proper place is in the PrimaryPrefix node along with literals and method invocations:
    void PrimaryPrefix() : { }
{
Literal()
|
"(" Expression() ")"
|
AllocationExpression()
|
LOOKAHEAD( MethodInvocation() )
MethodInvocation()
|
LOOKAHEAD( Type() "." "class" )
Type()
|
AmbiguousName()
|
ListExpression()
}

That's it! Run the compile ant target and fire up the interpreter. A few things to try:
    print([1,2,3]);
print([[1,2],[3,4]]);
print(["dan","mika"][1]);
print(['a','b','c'].size());

The attached source includes a few other features including a <> equivalence operator that calls .equals on its operands and handles nulls on either side. The process of implementing operators is very similar to the process of implementing the ListExpression above. Take a look at the source and let me know if you have any questions.

For more information please visit the JavaCC project page.

At some point I will write a follow up post that delves deeper into JJTree's features and covers the implementation of more advanced language features such as named arguments and extender methods.

bx-.1beta-src.zip
bx-.1beta.jar


Ikayzo - Design • Build • Localize | Web • Desktop • Mobile

Views: 753

Tags: JJTree, Java, JavaCC, parsers

Comment

You need to be a member of TechHui to add comments!

Join TechHui

Comment by Clifton Royston on March 31, 2008 at 8:20am
Neat article, and thanks for the name-drop! -- C

Sponsors

web design, web development, localization

© 2014   Created by Daniel Leuck.

Badges  |  Report an Issue  |  Terms of Service