ATS supports user-defined symbols with custom fixity and precedence. The `patopt` compiler does a dedicated pass to resolve fixity before all other passes.

I tried to build a parser that supports the same thing using Antlr. After several tries, I finally did it, and I would like to share it here.

## Precedence Climbing Parsing

Antlr uses this technique internally [antlrbook]. You can read the details in the blog post by [Norvell], which extends the original algorithm to support postfix and non-assoc operators and also gives it a name, "precedence climbing". The blog is a must-read.

The idea is simple and elegant. Usually, a **traditional** grammar for expressions creates a new nonterminal for each level of precedence, as follows

```
atom: NUMBER | '(' expr ')' | '-' atom;
mult: atom (('*' | '/') atom)*;
expr: mult (('+' | '-') mult)*;
```

But this hard-codes precedence and fixity into the grammar. With precedence climbing, `expr` is instead parameterized by a precedence, written `expr[p]`, and it contains no operators (outside parentheses) whose precedence is below `p`.

Therefore

```
binaryOp: ...
unaryOp: ...
atom
: NUMBER
| '(' expr[0] ')'
| unaryOp expr[q]
;
expr[int p]
: atom (binaryOp expr[q])*
;
```

The value `q` depends on the previous operator's precedence `prio(op)` and fixity `fix(op)`.

- if `fix(op) = infix left-assoc` then `q = prio(op) + 1`
- if `fix(op) = infix right-assoc` then `q = prio(op)`
- if `fix(op) = prefix` (assuming unary) then `q = prio(op)`
- if `fix(op) = postfix` (assuming unary) then `q = N/A`
- if `fix(op) = infix non-assoc` then `q = prio(op) + 1`

(The last two cases, postfix and non-assoc, are not covered by the example above yet.)
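The rules for `q` can be written down as one small function. The following is only a sketch in Java: the `Fixity` enum and the name `nextPrec` are made up for illustration and do not come from the grammar, and the "N/A" case for postfix operators is modelled as an empty `OptionalInt`.

```java
import java.util.OptionalInt;

class NextPrecedence {
    // Hypothetical names for the five fixities in the list above.
    enum Fixity { INFIX_LEFT, INFIX_RIGHT, INFIX_NON, PREFIX, POSTFIX }

    // Returns q for the operand to the right of an operator,
    // or empty when there is no right operand (postfix, "N/A" above).
    static OptionalInt nextPrec(Fixity fix, int prio) {
        switch (fix) {
            case INFIX_LEFT:
            case INFIX_NON:
                return OptionalInt.of(prio + 1); // an equal-precedence operator may not follow on the right
            case INFIX_RIGHT:
            case PREFIX:
                return OptionalInt.of(prio);     // an equal-precedence operator may nest on the right
            default:
                return OptionalInt.empty();      // POSTFIX
        }
    }

    public static void main(String[] args) {
        System.out.println(nextPrec(Fixity.INFIX_LEFT, 5).getAsInt());   // 6
        System.out.println(nextPrec(Fixity.INFIX_RIGHT, 6).getAsInt());  // 6
        System.out.println(nextPrec(Fixity.POSTFIX, 7).isPresent());     // false
    }
}
```

Left-assoc and non-assoc operators share the same `q`; what tells them apart is an extra check after the right operand has been parsed.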

In this case, parsing `1+2*3-4` will result in the following trace, where `a`, `b` and `c` stand for the `q` values computed for `+`, `*` and `-`

```
expr[0]: 1 + expr[a]
expr[a]: 2 * expr[b]
expr[b]: 3, return 3 because '-' is of lower precedence than 'b'
return (2*3) because '-' is of lower precedence than 'a'
expr[0]: 1 + (2*3) - expr[c]
expr[c]: 4, return 4 because of EOF token
return 1 + (2*3) - 4
```

In other words, when the precedence test fails, the current call immediately returns the part it has already matched (as `expr[a]` and `expr[b]` do), and the caller continues with the next iteration of its loop (as `expr[0]` does).
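To make the trace concrete, here is a minimal hand-written sketch of the `expr[p]` loop in Java. It is not the Antlr grammar: the class name `ClimbDemo`, the fixed precedence table and the parenthesized string output are assumptions chosen for illustration, and all four operators are treated as infix left-assoc.

```java
import java.util.Map;

class ClimbDemo {
    // Assumed table for illustration: all four operators are infix left-assoc.
    static final Map<Character, Integer> PRIO = Map.of('+', 1, '-', 1, '*', 2, '/', 2);

    private final String src;
    private int pos = 0;

    ClimbDemo(String src) { this.src = src.replace(" ", ""); }

    // atom: NUMBER | '(' expr[0] ')'
    private String atom() {
        if (pos < src.length() && src.charAt(pos) == '(') {
            pos++;                                   // consume '('
            String inner = expr(0);
            if (pos >= src.length() || src.charAt(pos) != ')')
                throw new IllegalArgumentException("')' expected at " + pos);
            pos++;                                   // consume ')'
            return inner;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("number expected at " + pos);
        return src.substring(start, pos);
    }

    // expr[p]: atom (binaryOp expr[q])*
    private String expr(int p) {
        String left = atom();
        while (pos < src.length()) {
            char op = src.charAt(pos);
            Integer prio = PRIO.get(op);
            if (prio == null || prio < p) break;     // precedence test failed: return the matched part
            pos++;                                   // consume the operator
            String right = expr(prio + 1);           // left-assoc, so q = prio(op) + 1
            left = "(" + left + op + right + ")";
        }
        return left;
    }

    static String parse(String text) {
        ClimbDemo d = new ClimbDemo(text);
        String tree = d.expr(0);
        if (d.pos != d.src.length()) throw new IllegalArgumentException("unexpected input at " + d.pos);
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("1+2*3-4"));   // ((1+(2*3))-4)
    }
}
```

The `break` is the precedence test failure from the trace: the inner calls return `3` and then `(2*3)`, and the outermost loop picks up `-` on its next iteration.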

That's it: a really elegant way of parsing expressions.

## Semantic Predicates in Antlr

For this part, I recommend this blog, Stuff, and this Stack Overflow Q&A. Although they are about Antlr 3, the rationale behind them still holds.

From my understanding, a predicate is a tool for stopping the current matching attempt. Depending on where you put it, though, you get different recovery behaviour. Also, a predicate can only reference tokens before it (using labels); to inspect what comes after it, you have to call `lookahead` methods.

From my experience, **don't put predicates in the middle of a rule** (or please tell me what the semantics of such predicates are; I'm not clear on that). Put them either at the beginning of an alternative (including an embedded sub-alternative or a loop) or at the end of a loop.

- When you put it at the beginning of an alternative, the predicate helps the parser choose among alternatives.
- When you put it at the beginning of a loop, it helps decide whether to enter the loop or stop looping.
- When you put it at the end of a loop, it decides when to stop the loop and return whatever has already been parsed.

## Allowing User-defined Operators

The code speaks for itself.

The first predicate

`{(fmap.get(_input.LT(1).getText()).contains("infix")) && pmap.get(_input.LT(1).getText()) >= $p}?`

ensures that only infix operators enter this rule. Postfix operators will enter the next rule with this predicate

`{(fmap.get(_input.LT(1).getText()).equals("postfix")) && pmap.get(_input.LT(1).getText()) >= $p}?`

As defined, `expr[p]` only matches an expression in which all operators have precedence equal to or higher than `p`, hence the `>= $p` test.

Note that since our predicates are located before the operator token, we need to look ahead using `_input.LT(1)`.

`{!fmap.get($op.text).equals("infix")}? `

This ensures that a non-assoc operator will not chain into something like

`a = b = c // syntax error`

For `aexpr`, the predicate

`{fmap.get(_input.LT(1).getText()).equals("prefix")}?`

ensures that only prefix operators enter this rule.

Finally, `expr[nextp($op.text)]` will recursively enter the `expr[]` rule with a *next precedence* computed according to our definition above.
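The same checks can be mirrored in a hand-written parser, which may make it easier to see what each predicate guards. This is a sketch under assumptions, not the grammar itself: `fmap`, `pmap` and `nextp` are named after the identifiers in the predicates, while the class `UserOpParser`, the `declare` helper, whitespace-separated tokens and the parenthesized output are invented for illustration. The fixity strings (`infixl`, `infixr`, `infix`, `prefix`, `postfix`) are assumed to be the ones stored in `fmap`.

```java
import java.util.HashMap;
import java.util.Map;

class UserOpParser {
    // Tables named after fmap/pmap in the predicates; filled at run time.
    final Map<String, String> fmap = new HashMap<>();
    final Map<String, Integer> pmap = new HashMap<>();
    private String[] toks = new String[0];
    private int pos = 0;

    void declare(String fixity, int prio, String... ops) {
        for (String op : ops) { fmap.put(op, fixity); pmap.put(op, prio); }
    }

    // Plays the role of _input.LT(1).getText(); "" marks end of input.
    private String peek() { return pos < toks.length ? toks[pos] : ""; }

    // Next precedence q: prio + 1 for left-assoc and non-assoc, prio otherwise.
    int nextp(String op) {
        String fix = fmap.get(op);
        int prio = pmap.get(op);
        return (fix.equals("infixl") || fix.equals("infix")) ? prio + 1 : prio;
    }

    private String expr(int p) {
        String left = aexpr();
        while (true) {
            String op = peek();
            String fix = fmap.getOrDefault(op, "");
            if (fix.isEmpty() || pmap.get(op) < p) break;   // not an operator, or precedence below p
            if (fix.contains("infix")) {
                pos++;
                left = "(" + left + " " + op + " " + expr(nextp(op)) + ")";
                if (fix.equals("infix")) break;              // non-assoc: stop the loop, no chaining
            } else if (fix.equals("postfix")) {
                pos++;
                left = "(" + left + " " + op + ")";
            } else {
                break;                                       // a prefix operator cannot continue an expression
            }
        }
        return left;
    }

    private String aexpr() {
        String t = peek();
        if (t.isEmpty()) throw new IllegalArgumentException("unexpected end of input");
        pos++;
        if (t.equals("(")) {
            String inner = expr(0);
            if (!peek().equals(")")) throw new IllegalArgumentException("')' expected");
            pos++;
            return inner;
        }
        if (fmap.getOrDefault(t, "").equals("prefix")) return "(" + t + " " + expr(nextp(t)) + ")";
        if (fmap.containsKey(t) || t.equals(")")) throw new IllegalArgumentException("operand expected, got " + t);
        return t;                                            // any other token is treated as an atom
    }

    String parse(String text) {
        toks = text.trim().split("\\s+");
        pos = 0;
        String tree = expr(0);
        if (pos != toks.length) throw new IllegalArgumentException("syntax error at '" + peek() + "'");
        return tree;
    }

    public static void main(String[] args) {
        UserOpParser p = new UserOpParser();
        p.declare("prefix", 6, "-");
        p.declare("infixr", 6, "^");
        p.declare("infixl", 3, "+");
        p.declare("infixl", 5, "*", "/");
        p.declare("infix", 10, "=");
        p.declare("postfix", 7, "!");
        System.out.println(p.parse("1 ! + - 2 * 3 ^ 5 ^ 6"));
        // ((1 !) + ((- 2) * (3 ^ (5 ^ 6))))
    }
}
```

With these declarations, `a = b = c` stops after `(a = b)` and leaves `= c` unconsumed, so `parse` reports a syntax error, which is the effect the non-assoc predicate is meant to have.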

## Result

The parser will generate the following tree for the code below

```
prefix 6 -;
infixr 6 ^;
infixl 3 + ;
infixl 5 * /;
infix 10 =;
postfix 7 !;
1 ! ! + - 2 * 3 ^ 5 ^ 6 + - - 4 ! = 5 = 6;
```

Please note that I assign precedence 10 to `=` because I would like to show you that it is non-assoc.
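The declaration lines at the top of the sample are what fill the fixity and precedence tables. As a rough sketch of that step, assuming each declaration has the shape `fixity precedence op...;` as in the sample, the tables could be built like this in Java. The class `FixityDecls` and its `declare` method are hypothetical names, not part of the grammar.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FixityDecls {
    final Map<String, String> fmap = new LinkedHashMap<>();   // operator -> fixity keyword
    final Map<String, Integer> pmap = new LinkedHashMap<>();  // operator -> precedence

    // Reads one declaration such as "infixl 5 * /;" and records every listed operator.
    void declare(String line) {
        String body = line.trim();
        if (body.endsWith(";")) body = body.substring(0, body.length() - 1).trim();
        String[] parts = body.split("\\s+");
        if (parts.length < 3) throw new IllegalArgumentException("bad declaration: " + line);
        int prio = Integer.parseInt(parts[1]);
        for (int i = 2; i < parts.length; i++) {
            fmap.put(parts[i], parts[0]);
            pmap.put(parts[i], prio);
        }
    }

    public static void main(String[] args) {
        FixityDecls d = new FixityDecls();
        d.declare("infixl 5 * /;");
        d.declare("infix 10 =;");
        System.out.println(d.fmap);  // {*=infixl, /=infixl, ==infix}
        System.out.println(d.pmap);  // {*=5, /=5, ==10}
    }
}
```

Because the tables are ordinary maps filled while reading the input, the predicates can consult them for operators that were declared only a few lines earlier in the same file.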

Comments are welcome.