Grammar-Kit
Grammar-Kit is a JetBrains plugin for generating a parser (and a lexer, if needed) from a grammar specification file that uses a BNF-like syntax. Not only does it generate the parser, but it also generates the PSI classes of the language. This is a huge time-saver, as it allows us to focus on the language syntax and semantics rather than the parsing details.
The BNF File
The grammar specification file is written in a BNF-like syntax. The file has two types of sections: an attributes section and a grammar rules section. The attributes section allows us to customize the generated parser and PSI classes, while the grammar rules section defines the language syntax. Here's a simple example of such a BNF file:
// Attributes
{
  parserClass="generated.MyParser"
}
// Grammar
Root ::= BEGIN (MyStmt SEMI)* END
MyStmt ::= PrintStmt
         | ...
PrintStmt ::= PRINT STRING_LIT
In the attributes section, we specified the fully qualified name of the parser class that will be generated. In the grammar section, we defined the syntax of a simple language that consists of a BEGIN keyword, followed by a sequence of statements each terminated by a semicolon (the SEMI token), and ending with an END keyword. The PrintStmt rule defines a statement that consists of a PRINT keyword followed by a string literal.
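To make the meaning of these rules concrete, here is a minimal hand-written sketch of a recognizer for the same toy grammar. It is not the code Grammar-Kit generates; the names ToyToken and ToyRecognizer are hypothetical and exist only for illustration, and only the PrintStmt alternative of MyStmt is handled because the example elides the others.

```java
import java.util.List;

// Hypothetical token kinds mirroring the token names used in the BNF example.
enum ToyToken { BEGIN, END, SEMI, PRINT, STRING_LIT }

// A hand-written recognizer (illustration only, not Grammar-Kit output) for:
//   Root      ::= BEGIN (MyStmt SEMI)* END
//   MyStmt    ::= PrintStmt
//   PrintStmt ::= PRINT STRING_LIT
final class ToyRecognizer {
    private final List<ToyToken> tokens;
    private int pos = 0;

    ToyRecognizer(List<ToyToken> tokens) { this.tokens = tokens; }

    // Consumes the next token if it has the expected kind.
    private boolean consume(ToyToken expected) {
        if (pos < tokens.size() && tokens.get(pos) == expected) {
            pos++;
            return true;
        }
        return false;
    }

    // PrintStmt ::= PRINT STRING_LIT
    private boolean printStmt() {
        int start = pos;
        if (consume(ToyToken.PRINT) && consume(ToyToken.STRING_LIT)) return true;
        pos = start; // roll back so a failed rule consumes nothing
        return false;
    }

    // MyStmt ::= PrintStmt (the other alternatives are elided in the example)
    private boolean myStmt() { return printStmt(); }

    // Root ::= BEGIN (MyStmt SEMI)* END
    boolean root() {
        if (!consume(ToyToken.BEGIN)) return false;
        while (true) {
            int start = pos;
            if (!(myStmt() && consume(ToyToken.SEMI))) {
                pos = start; // undo a partially matched iteration
                break;
            }
        }
        return consume(ToyToken.END) && pos == tokens.size();
    }

    static boolean accepts(List<ToyToken> tokens) {
        return new ToyRecognizer(tokens).root();
    }

    public static void main(String[] args) {
        // A well-formed program: BEGIN PRINT "..." ; END
        System.out.println(accepts(List.of(
            ToyToken.BEGIN, ToyToken.PRINT, ToyToken.STRING_LIT, ToyToken.SEMI, ToyToken.END)));
        // A malformed program: the PRINT statement has no string literal
        System.out.println(accepts(List.of(ToyToken.BEGIN, ToyToken.PRINT, ToyToken.END)));
    }
}
```

Each rule maps to one method, and the (MyStmt SEMI)* group becomes a loop that stops at the first iteration that fails to match. A generated parser additionally builds the AST and recovers from errors, which this sketch leaves out.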
There are no restrictions on the names of the rules or the tokens, but here I'm adopting the following convention:
- Rule names are in PascalCase (those are intermediate nodes in the AST)
- Token names are in UPPER_SNAKE_CASE (those are leaf nodes in the AST)
Generating the Parser
To generate the parser, we can right-click on the grammar file and select the Generate Parser Code action. This will generate the parser and PSI classes under the src/main/gen directory (as opposed to the src/main/kotlin directory where we have our handwritten code). By default, Grammar-Kit generates the following set of classes:
src/main/gen
└── generated
    ├── psi
    │   ├── impl
    │   │   ├── MyStmtImpl.java
    │   │   └── PrintStmtImpl.java
    │   ├── MyStmt.java
    │   ├── PrintStmt.java
    │   └── Visitor.java
    ├── MyParser.java
    └── GeneratedTypes.java
For PSI, Grammar-Kit generates an interface and an implementation class for each rule in the grammar. For example, the PrintStmt rule will have an interface PrintStmt and an implementation class PrintStmtImpl. The Visitor class is used to traverse the PSI tree, and the GeneratedTypes interface contains the token and rule element types (instances of IElementType). It also contains a nested Factory class with a single method createElement that creates the appropriate PSI element for a given AST node (used by the ParserDefinition.createElement() method to create PSI elements).
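The relationship between element types, the interface/implementation pairs, and the factory can be sketched in a few lines. This is a self-contained model, not the generated code: every Sketch* type below is a hypothetical stand-in for the corresponding IntelliJ Platform or generated type, so that the example compiles without the SDK.

```java
// Hypothetical stand-in for the platform's IElementType.
final class SketchElementType {
    final String debugName;
    SketchElementType(String debugName) { this.debugName = debugName; }
}

// Hypothetical stand-in for an AST node, which carries its element type.
final class SketchAstNode {
    final SketchElementType type;
    SketchAstNode(SketchElementType type) { this.type = type; }
}

// Hypothetical stand-in for the platform's PsiElement.
interface SketchPsiElement {
    SketchAstNode node();
}

// One interface/implementation pair per grammar rule, as with PrintStmt/PrintStmtImpl.
interface SketchPrintStmt extends SketchPsiElement {}

final class SketchPrintStmtImpl implements SketchPrintStmt {
    private final SketchAstNode node;
    SketchPrintStmtImpl(SketchAstNode node) { this.node = node; }
    @Override public SketchAstNode node() { return node; }
}

// Plays the role of the element type holder: constants plus a nested Factory.
interface SketchTypes {
    SketchElementType PRINT_STMT = new SketchElementType("PRINT_STMT"); // rule element type
    SketchElementType PRINT = new SketchElementType("PRINT");           // token type

    final class Factory {
        // Maps the element type of an AST node to the matching PSI implementation.
        static SketchPsiElement createElement(SketchAstNode node) {
            if (node.type == PRINT_STMT) return new SketchPrintStmtImpl(node);
            throw new AssertionError("Unknown element type: " + node.type.debugName);
        }
    }
}

final class SketchFactoryDemo {
    public static void main(String[] args) {
        SketchPsiElement element =
            SketchTypes.Factory.createElement(new SketchAstNode(SketchTypes.PRINT_STMT));
        System.out.println(element instanceof SketchPrintStmt);
    }
}
```

Only rule element types get a PSI class in this model; token types such as PRINT are leaves and never reach the factory's successful branch.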
We can also ask Grammar-Kit to generate a lexer for us by right-clicking on the grammar file and selecting the Generate JFlex Lexer action. This will generate the lexer specification file, but we have to select the location where we want to save it; let's say we save it under src/main/gen/generated as MyLexer.flex (alongside the generated parser and PSI). Now we can right-click on the lexer file and select the Run JFlex Generator action, which will generate the lexer class MyLexer under the src/main/gen/generated directory. So, in addition to the above files, we will have:
src/main/gen
└── generated
    ├── ...
    ├── MyLexer.flex
    └── MyLexer.java
While generating the lexer is convenient, it is not flexible enough for complex languages that require custom lexing rules (like Nim). So, we will keep our handwritten lexer and tokens, and tell Grammar-Kit to use them instead.
We also won't need a visitor class for now, so we'll tell Grammar-Kit not to generate it as well.
A Simple Nim BNF
Let's create a BNF file for Nim that would parse the echo "hello, world" statement that we used in the previous sections.
{
  generate=[tokens="no" visitor="no"]
  parserClass="khaledh.nimjet.parser.NimParser"
  parserImports="static khaledh.nimjet.lexer.NimToken.*"
  elementTypeClass="khaledh.nimjet.parser.NimElementType"
  elementTypeHolderClass="khaledh.nimjet.parser.NimElement"
  psiPackage="khaledh.nimjet.psi"
  psiImplPackage="khaledh.nimjet.psi.impl"
}
Root ::= !<<eof>> NimStmt
NimStmt ::= IDENTIFIER STRING_LIT
In the attributes section, we specified:
- that we don't want Grammar-Kit to generate tokens or a visitor class
- the fully qualified name of the NimParser class that will be generated
- the parser imports that we need, which are the token types from our lexer
- the fully qualified name of the element type class NimElementType that we created previously; Grammar-Kit will use this class to create instances of the AST element types
- the class that will hold the instances of the element types: NimElement
- the package names for the PSI and PSI implementation classes
In the grammar section, we defined the syntax of a simple Nim file that consists of a single statement, which is an identifier followed by a string literal.
Notice the !<<eof>> syntax in the Root rule. The <<...>> syntax is used to invoke an external rule defined in the parser, which in this case is the built-in eof rule that matches the end of the file. The ! operator is a negative predicate: it succeeds only when the rule after it does not match, and it consumes no input. So having !<<eof>> at the beginning of the Root rule means that the rule won't match an empty file. This prevents the parser from generating an error when the file is empty.
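The behavior of a negative predicate is easy to see in a small hand-written model. The sketch below is an assumption-laden illustration rather than Grammar-Kit's implementation: NotEofSketch and its methods are hypothetical names, and tokens are plain strings to keep it self-contained.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Illustrative model of: Root ::= !<<eof>> NimStmt
// (hypothetical class, not the parser Grammar-Kit generates)
final class NotEofSketch {
    private final List<String> tokens;
    private int pos = 0;

    NotEofSketch(List<String> tokens) { this.tokens = tokens; }

    // Counterpart of <<eof>>: true only at the end of the input; consumes nothing.
    private boolean eof() { return pos >= tokens.size(); }

    // Counterpart of the ! operator: succeeds when the rule fails,
    // and always restores the position so that no input is consumed.
    private boolean not(BooleanSupplier rule) {
        int start = pos;
        boolean matched = rule.getAsBoolean();
        pos = start;
        return !matched;
    }

    // Consumes the next token if it equals the expected one.
    private boolean consume(String expected) {
        if (pos < tokens.size() && tokens.get(pos).equals(expected)) {
            pos++;
            return true;
        }
        return false;
    }

    // NimStmt ::= IDENTIFIER STRING_LIT
    private boolean nimStmt() {
        int start = pos;
        if (consume("IDENTIFIER") && consume("STRING_LIT")) return true;
        pos = start; // roll back on failure
        return false;
    }

    // Root ::= !<<eof>> NimStmt
    boolean root() { return not(this::eof) && nimStmt(); }

    static boolean matches(List<String> tokens) {
        return new NotEofSketch(tokens).root();
    }

    public static void main(String[] args) {
        System.out.println(matches(List.of("IDENTIFIER", "STRING_LIT"))); // prints true
        System.out.println(matches(List.of()));                          // prints false
    }
}
```

On an empty token list the predicate fails before NimStmt is ever attempted, so the statement rule never gets a chance to complain about a missing IDENTIFIER.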
Generating the Nim Parser
Now we can right-click on the Nim BNF file and select the Generate Parser Code action to generate the parser classes. We will end up with the following set of classes:
src/main/gen
└── khaledh/nimjet
    ├── parser
    │   ├── NimElement.java
    │   └── NimParser.java
    └── psi
        ├── impl
        │   └── NimStmtImpl.java
        └── NimStmt.java
Our handwritten part of the parser should be structured as follows, under the src/main/kotlin directory:
src/main/kotlin
└── khaledh/nimjet
    ...
    ├── parser
    │   ├── Nim.bnf
    │   ├── NimElementType.kt
    │   └── NimParserDefinition.kt
    └── psi
        └── NimFile.kt
And to complete the organization, we should have the lexer and the various language plugin classes also tidied up as follows:
src/main/kotlin
└── khaledh/nimjet
    ├── lang
    │   ├── NimFileType.kt
    │   ├── NimIcons.kt
    │   └── NimLanguage.kt
    ├── lexer
    │   ├── Nim.flex
    │   ├── NimLexerAdapter.java
    │   ├── NimToken.kt
    │   └── NimTokenType.kt
    ├── parser
    │   ...
    └── psi
        ...
If we test the plugin now, we should have the same functionality as before, but with the parser generated by Grammar-Kit in this case. As a final step, let's automate generating the parser using the Gradle build script, as we did with the lexer.
// build.gradle.kts
...
tasks {
    generateLexer {
        ...
    }
    generateParser {
        sourceFile = file("src/main/kotlin/khaledh/nimjet/parser/Nim.bnf")
        targetRootOutputDir = file("src/main/gen")
        pathToParser = "khaledh/nimjet/parser/NimParser.java"
        pathToPsiRoot = "khaledh/nimjet/psi"
        purgeOldFiles = true
    }
    compileJava {
        dependsOn(generateLexer)
        dependsOn(generateParser)
    }
    compileKotlin {
        dependsOn(generateLexer)
        dependsOn(generateParser)
    }
}
This should take care of generating the parser whenever we build the project, so we don't have to manually run the Grammar-Kit action every time we change the BNF file (unless we need to inspect the generated code, of course).