Parsing infix notation
A common problem in computer science is that of parsing expressions, that is, reading and interpreting (obtaining the semantic meaning of) strings of a given formal language. This is a key part of the process of compiling or interpreting a programming language, or other languages such as those describing mathematical expressions or regular expressions. This article describes an efficient method of parsing infix notation, the syntax in which mathematical expressions are conventionally written. While mathematical expressions are used as examples, the principles extend to any language using infix operators.
Infix notation is the name given to a syntax where binary operators are placed between their two operands. In mathematical expressions, the operands may be numbers, or they may be expressions themselves; in the infix expression:
3 + 7 ÷ (4 × 5 − 6)
the operator ÷ is applied to (and lies between) the operands 7 and (4 × 5 − 6).
To parse the expression we require also that the operator precedences be defined. This is because infix notation allows an operand surrounded by two operators — such as the 5 in the example above — to be an operand of either neighbouring operator. Here, the expression could be evaluated as:
(4 × 5) − 6,     where × has a higher precidence than −, or as
4 × (5 − 6),     where − has a higher precidence than ×.
It's possible to make the order of evaluation unambiguous using brackets alone, but this can be overly verbose.
Direct evaluation
Infix expressions can be evaluated by scanning through the expression for the highest precidence operator remaining in it. The operands adjacent to this operator must be associated with it, since there is no higher precidence operator in the expression to 'steal' one of the operands. This operator and pair of operands are then replaced in the expression with the result of the operator applied to the operands, and the procedure is repeated until all operators in the expression have been evaluated. This algorithm is complicated by the presence of brackets, which are outside the framework of binary operators and operands, and are often the highest precidence operator. Brackets can be dealt with recursively by evaluating the sub-expression within the brackets as an expression in its own right, and substituting the evaluated value back into the original expression in place of the bracketed sub-expression. As an example, in the expression above:
3 + 7 ÷ (4 × 5 − 6)
the brackets are the highest precidence operator. The bracketed term:
4 × 5 − 6
is therefore evaluated first as a new expression. The highest precidence operator in this expression is ×, so this and its operands are evaluated to give:
20 − 6.
The highest precidence (and only) operator in this expression is − , whose evaluation to:
14
completes the evaluation of the bracketed term. This value is substituted into the original expression to give:
3 + 7 ÷ 14
The ÷, as the remaining operator of highest precidence, evaluates to give:
3 + ½
and the final answer: .
While this algorithm works, it is a slow process to search through the expression for the highest precidence operator at every step. When an expression may be evaluated many times (for example if it defines a user-inputted function to be plotted), another approach may be needed.
Postfix notation
It's possible to rewrite expressions in infix notation in other ways, such as in prefix notation, and in postfix notation (also known as Reverse-Polish notation (RPN)). In postfix notation, a binary operator is placed after both its operands: the expression 3 + 7 in infix notation is written 3 7 + in postfix notation. A compound expression such as (4 × 5) − 6 (in infix) is be written as 4 5 × 6 − in postfix. The expression is read from left-to-right, replacing an operator (when one occurs) with the result of that operator applied to the two previous operands: the first operator in
4 5 × 6 −
is the × operator, which is evaluated with the two previous operands, 4 and 5, to give:
20 6 −
The next operator, −, is then evaluated on 20 and 6 to give the answer: 14.
Postfix notation, while less common than infix notation in written mathematics, is much more explicit in sense that the operands belonging to a particular operator can be expressed without the need for brackets or operator precidence. Rebracketting the infix expression to 4 × (5 − 6) does not introduce brackets into the postfix form, but instead causes a change in token ordering, to 4 5 6 − ×.
This explicitness means that the algorithm for evaluating postfix notation can be made much faster than that for evaluating infix notation. A fast implemention of the postfix evaluation algorithm uses a stack: as operands are read from left-to-right, they are pushed onto the stack. When an operator is reached, its two operands are popped from the stack, the result of the operation is calculated, and the result is pushed back onto the stack. For a correctly formed postfix expression, processing all the tokens in the expression in this way will result in one number — the evaluated expression — being left on the stack.
The algorithm for evaluating an expression in postfix notation can also be used to generate an abstract syntax tree for the expression. In this case, the stack is a stack of trees, on to which single-node trees (operands) are pushed. When an operator is reached in the parsing of the postfix expression, it becomes the root of a new tree, with the two trees popped off the stack (the operands for this operator) becoming the two main 'branches'. This new tree is then pushed onto the stack as before.
The shunting-yard algorithm
Ascertaining that expressions in postfix notation can be evaluated much more quickly than those in infix notation suggests that a fast way to evaluate an infix expression might be to convert it first from infix into postfix notation, then evaluate the postfix version. Dijkstra's shunting-yard algorithm is such an algorithm for converting infix expressions to postfix expressions. The algorithm reads tokens and outputs tokens from left-to-right: the next character to be read/written immidiately follows the previous one. It is is based around a stack, which is used to store brackets, and operators which are to occur later in the output. Note that only the operators and brackets are stored on a stack for later use: the operands are output in postfix notation in exactly the same order as they occur in the infix expression. The most basic form of the shunting-yard algorithm is as follows:
Click to display
For each token in turn in the input infix expression:
  • If the token is an operand, append it to the postfix output.
  • If the token is an operator A then:
    • While there is an operator B of higher or equal precidence than A at the top of the stack, pop B off the stack and append it to the output.
    • Push A onto the stack.
  • If the token is an opening bracket, then push it onto the stack.
  • If the token is a closing bracket:
    • Pop operators off the stack and append them to the output, until the operator at the top of the stack is a opening bracket.
    • Pop the opening bracket off the stack.
When all the tokens have been read:
  • While there are still operator tokens in the stack:
    • Pop the operator on the top of the stack, and append it to the output.
A practical infix-to-postfix algorithm for mathematical expressions
The algorithm as presented above deals only with brackets and left-associative binary infix operators. While mathematical expressions mainly use this notation, non-binary operators are common in maths, and have varying conventions. Factorial (n!) is a unary postfix operator (that is, it occurs after a single argument), while arithmetic negation (, in the context of "−5" rather than "5 − 4") is a unary prefix operator. Functions (with arbitrarily many arguments) are also conventially written in prefix operators, with further brackets, commas and other symbols used to distinguish their arguments. Furthermore, while most common arithmetic operations are left-associative (that is, a − b − c = (a − b) − c), the binary operation of exponentiation is conventionally right-associative a ^ b ^ c = a ^ (b ^ c)). These conventions of syntax can be incorperated into the shunting-yard algorithm, at the expense of reduced clarity.
Click to display
For each token in turn in the input infix expression:
  • If the token is an operand, append it to the postfix output.
  • If the token is a unary postfix operator, append it to the postfix output.
  • If the token is a unary prefix operator, push it on to the stack.
  • If the token is a function token, push it on to the stack.
  • If the token is a function argument separator
    • Pop the top element off the stack and append it to the output, until the top element of the stack is an opening bracket
  • If the token is a binary operator A then:
    • If A is left-associative, while there is an operator B of higher or equal precidence than A at the top of the stack, pop B off the stack and append it to the output.
    • If A is right-associative, while there is an operator B of higher precidence than A at the top of the stack, pop B off the stack and append it to the output.
    • Push A onto the stack.
  • If the token is an opening bracket, then push it onto the stack.
  • If the token is a closing bracket:
    • Pop operators off the stack and append them to the output, until the operator at the top of the stack is a opening bracket.
    • Pop the opening bracket off the stack.
    • If the token at the top of the stack is a function token, pop it and append it to the output.
When all the tokens have been read:
  • While there are still operator tokens in the stack:
    • Pop the operator on the top of the stack, and append it to the output.
The following Java applet evaluates a mathematical expression in infix notation by converting it first to postfix notation using the shunting-yard algorithm, then evaluating the expression in postfix notation. Each step of the shunting-yard algorithm and the postfix evaluation algorithm are shown. The supported operators are +, −, × ÷, ^, parentheses and arithmetic negation, with conventional operator precidence and associativity.
The source code for the applet is available for download, under the GNU General Public License (the implementation of the shunting yard algorithm is in Parser.java).
There should be a Java applet here. If you can read this, then either you are using a browser/OS which does not support applets (e.g. most mobile phones), or you do not have an up-to-date Java browser plugin installed. See http://java.sun.com/ for the latest version.

52 Comments:
Johans Marvin Taboada Villca  -  Jul 22nd 2009, 5:14 PM
"Parsing infix notation" has an error in "Postfix notation" section just at the end of the middle paragraph, "4 * (5 - 6)" shouldn't be written in Postfix as "4 5 6 * -" but as "4 5 6 - *" as you can verify using the applet.
Article was very useful, many thanks.
Chris  -  Sep 9th 2009, 11:00 PM
Thanks for pointing that out! Now fixed.
Johans Marvin Taboada Villca  -  Sep 10th 2009, 6:03 AM
You're welcome.
Vipul Malhotra  -  Sep 23rd 2009, 5:58 PM
Your calculator saved my life!
Thanks a lot for making this out. Other calculators on the net seem to produce wrong answers. Anyways, your calculator saved me from losing marks in my computer exam.
Thanks a lot!
Chris2  -  Oct 12th 2009, 8:20 PM
How do you determine if an operator is unary - or unary +? And how exactly do you handle it when you evaluate the postfix to get the final value?

Cheers!
Chris  -  Oct 22nd 2009, 1:06 PM
I presume you mean distinguishing unary '−' from binary '−'?

My applet does this at the tokenise stage: if a '-' character is read, and the previous token read is a number or a closing parenthesis, the meaning is taken to be a binary −, whereas if the previous token is an operator (or if there is no previous token), the meaning is taken to be unary −. When evaluating the expression, a unary − token pops one number off the token stack, multiplies it by -1, and pushes it back on the stack.
Paulo  -  Nov 11th 2009, 5:23 PM
Excellent ...helped me so much.... I am writing a compiler..... Thanks
Sara  -  Nov 16th 2009, 5:11 PM
the calculator does not work with expresions as this:
)78*1
(1+3))
Sean  -  Nov 19th 2009, 8:12 PM
Thanks for the great help! My professor recommended using this site to make sure our postfix was right.

Great tool.
CompSci Student  -  Dec 23rd 2009, 12:34 AM
)78*1
(1+3))

have mismatching parentheses, so will not work. It just does not filter these inputs out.
ARUN  -  Apr 20th 2010, 10:12 AM
Tihs applet is very useful in understanding polish noataion
Thank you Sir
cincoutprabu  -  May 17th 2010, 6:15 AM
Hi chris, no variables and functions are allowed in the input. You can enable it so that everybody will use this as a validation tool for infix-to-postfix conversion. However, you can turn off evaluation if variables and functions are used in the input expression.
Chris  -  May 29th 2010, 11:29 PM
cincoutprabu: Good suggestion. I'll think about it, but it's not quite trivial to implement. Specifically, deciding whether an arbitrary string is a variable or a function in the tokenise step requires context, as with the binary/unary minus earlier in the comments. Obviously in a real parser, symbol tables would take care of this.
cincoutprabu  -  Jul 11th 2010, 6:25 PM
Hi chris, I have posted a silverlight widget for live demo similar to the applet in this page. It can handle variables also. Will be useful, for other beginners and students !!
cincoutprabu  -  Jul 11th 2010, 6:27 PM
Hi chris, I have posted a silverlight widget for live demo similar to the applet in this page. It can handle variables also. Will be useful, for other beginners and students !! Widget is available here: http://www.codeding.com/?article=11
Chris  -  Jul 19th 2010, 8:48 PM
cincoutprabu: hey, that's cool
kyro  -  Jul 20th 2010, 1:36 PM
thanks alot, this have helped alot
Charlie  -  Aug 11th 2010, 2:39 PM
In the case of 4 × (5 − 6) => 4 5 6 − × for example, is it truly a stack if you have to pop and buffer or skip the 4 to evaluate 5 6 - before evaluating the x? You've got associated tokens at opposite ends of the stack...
Chris  -  Aug 13th 2010, 11:12 PM
Charlie: The in the evaluation of the postfix expression, the stack isn't initially filled with the postfix expression - it's initially empty. In your example, the 4, then 5, then 6 are pushed onto the token stack in turn, then (in response to the -) the 5 and 6 are popped and -1 pushed, and (in response to the *) the -1 and 4 are popped and -4 pushed. The column headers in the java applet indicate what is a stack.
James  -  Nov 4th 2010, 3:57 PM
if we input -(3+2)-1 this algorithm answer 6.0
truthful answer is -6.0 ! ! !
Chris  -  Nov 10th 2010, 8:43 AM
James: paste your expression in: it gives -6!
Bri  -  Nov 16th 2010, 7:41 PM
Can the Shunting Yard algorithm be extended to handle ternary operators, such as conditional operators? In C, the syntax for a conditional would be something like this:

a ? b : c
Chris  -  Nov 18th 2010, 6:42 PM
Bri: Yes, I think it should be possible, but it may be quite a major extension. The C conditional only ever executes one of "b" or "c", which will not be the case if the infix/ternary operators are just converted to postfix at compile time and the postfix expression evaluated at run time.
Bri  -  Nov 19th 2010, 9:09 PM
Why can't it just be converted to postfix at compile time and then evaluated at run-time? It should operate like a function:

conditional(a, b, c) {
if (a == 0) then return c;
else return b;
}

You can then get it to work in the parser within an expression:

3 + cond(1, 4, 5)

(would evalute to 7 in this case)

I was able to get it to work as a function, but it would be nice to be able to build it into the parser. Wouldn't the above be equivalent to:

3 + (1 ? 4 : 5)

Or, in postfix:

3145?+

francis  -  Nov 21st 2010, 12:02 AM
i mean that shuting-yard get a numbers with the sign implicit in them.

because :
IN: 2*(-1)
OUT: 2, 1, -, *,

you disting between unary minus and binary minus.!!
rols  -  Nov 21st 2010, 12:42 PM
Thanks for this - it's a great reduction of the infix parsing algorithm and saved me a whole load of time working it out.

One comment. You presume that a postfix operator is always higher precedence than what's on stack. In reality that's probably true but if I close my eyes and think really hard I could imagine a world in which something like x ^ y! wanted the power to come before the factorial. You could tweak your algorithm for unary postfix operator to say that if you get one, instead of appending it directly to the output, you first pop off anything on the stack which is of higher precedence, then you append it to the output.

If you did that, in the very constructed example I have, instead of

x ^ y! --> x y ! ^

it would be

x ^ y! --> x y ^ !

I don't know if this matters in the real world ... there may be a case where you do want a postfix operator not to be all-powerful

Chris  -  Nov 22nd 2010, 11:21 AM
Bri: That's not quite true in C, since if one executes
conditional(a,++b,++c)
both b and c are incremented, whereas with
a?(++b):(++c)
only one of b or c is incremented, depending on the value of a.

However, if one doesn't have operators like ++ which alter the state of the variables, the functional form is equivalent as you say.

I can't see a way to add it to the parser without some slightly messy code. One way would be to treat ? as a unary postfix operator in the infix->postfix conversion step, and as a three-arg function during postfix evaluation, and to treat : as a binary operator in infix->postfix conversion, but remove it from the postfix queue before postfix evaluation. (but see rols's comment above about postfix precedence)
Chris  -  Nov 22nd 2010, 11:37 AM
rols: Thanks! I noticed this myself a couple of days ago when thinking about Bri's comment, but didn't have time to think about a solution. I will add yours to the page.

I'm sure there are instances (perhaps outside of mathematical notation) where one might want a unary postfix operator to have a low precedence.
Bri  -  Nov 22nd 2010, 2:36 PM
You have to look at both associativity and precedence to see whether an operator on the stack gets popped to the output before the current operator is pushed, according to the following rules (assume o1 is the current operator, and o2 is an operator at the top of the stack):

* while there is an operator token, o2, at the top of the stack, and
either o1 is left-associative and its precedence is less than or equal to that of o2,
or o1 is right-associative and its precedence is less than that of o2,
pop o2 off the stack, onto the output queue;

* push o1 onto the stack.

Chris, I don't have any ++ operators or anything like that in my parser so I'm not concerned about that. It would be fine if it acted like the function form. That said, I assume you can add ++ to a parameter of a function as well, so I imagine it acts similar to a function either way doesn't it?

cond(a,++b,++c)

What happens in this case in C?

Chris  -  Nov 22nd 2010, 10:13 PM
Bri: in C

int a=1, b=1, c=1;
a?++b:++c; // leaves a=1, b=2, c=1

whereas:

int a=1, b=1, c=1;
cond(a,++b,++c); // leaves a=1, b=2, c=2
Calmarius  -  Nov 24th 2010, 7:01 PM
In order to evaluate expressions containing the conditional operator we need to add THEN,ELSE and END points to the resulting postfix form.

Infix to postfix conversion:

Several modifications needed:

- Add a "conditional operator counter" to algorithm it will be 0 at start.
- Treat '?' as a right associative low precedence operator (multiple ?-s can stack on each other)
- Treat ':' as a left associative with the same precedence as '?'
- ':' operator will stack on '?'
- The ':?' pair can be popped from the stack only together.
- When a lower precedence operator trying to pop the '?' operator alone issue a parse error (': expected').

Now about the conditional operator counter:

- When you push a '?' on the stack increase the counter and put a THEN(counter) token to the output.
- When you push a ':' on the stack put and ELSE(counter) token to the output.
- When you pop a ':?' pair from the stack put and END(counter) token to the output and decrease the coutner.
- If COC is nonzero when the parsing finishes issue a parse error.

Evaluating the postfix form.

- When encountering a THEN(x) token...
- if there is true (nonzero) on the operand stack continue evaluating
- if there is false (zero) on the operand stack skip after the next ELSE(x) token and continue evaluation.
- When you encounter and ELSE(x) token (you came from the THEN branch) skip after the next END(x) token and continue evaluation.
- When you encounter an END(x) token ignore it.

Now let's try it. Let's see the following expression:

a<b ? x++ : y++

Start parsing:

INPUT:a<b ? x++ : y++
OUTPUT:
STACK:
COC:0

You encounter the first ? here, stack already popped:

INPUT: ? x++ : y++
OUTPUT:a b <
STACK:
COC:0

So, increase the COC, add a THEN(COC) token (for short '?COC') then push the '?' on the stack.

INPUT: x++ : y++
OUTPUT:a b < ?1
STACK: ?
COC:1

Parsing continues as usual then you encounter the first ':':

INPUT: : y++
OUTPUT:a b < ?1 x++
STACK: ?
COC:1

At this point add an ELSE(COC) (':COC' for short) token and stack the ':' on the '?'.

INPUT: y++
OUTPUT:a b < ?1 x++ :1
STACK: :?
COC:1

After we exhausted the input we need to pop the ':?':

INPUT:
OUTPUT:a b < ?1 x++ :1 y++
STACK: :?
COC:1

Add an END(COC) ('eCOC' for short) then decrease the COC.

INPUT:
OUTPUT:a b < ?1 x++ :1 y++ e1
STACK:
COC:0

Now let's play with a more complex expression a nested conditional:

a<b ? c<d ? x : y : e<f ? z : w

Conditional operator is right associative so we should parse it as: a<b ? (c<d ? x : y) : (e<f ? z : w)

So start parsing now:

INPUT: a<b ? c<d ? x : y : e<f ? z : w
OUTPUT:
STACK:
COC: 0

After encountering the first '?':

INPUT: c<d ? x : y : e<f ? z : w
OUTPUT:a b < ?1
STACK: ?
COC: 1

After encountering the second '?':

INPUT: x : y : e<f ? z : w
OUTPUT:a b < ?1 c d < ?2
STACK: ??
COC: 2

After encountering the first ':':

INPUT: y : e<f ? z : w
OUTPUT:a b < ?1 c d < ?2 x :2
STACK: :??
COC: 2

After encountering the second ':'. Being left associative it will pop the ':?' out first (triggering its effect too):

INPUT: e<f ? z : w
OUTPUT:a b < ?1 c d < ?2 x :2 y e2 :1
STACK: :?
COC: 1

After encountering the third '?' (being right associative it will stack on the ':?'):

INPUT: z : w
OUTPUT:a b < ?1 c d < ?2 x :2 y e2 :1 e f < ?2
STACK: ?:?
COC: 2

Then the last colon:

INPUT: w
OUTPUT:a b < ?1 c d < ?2 x :2 y e2 :1 e f < ?2 z :2
STACK: :?:?
COC: 2

And the final clean up:

INPUT:
OUTPUT:a b < ?1 c d < ?2 x :2 y e2 :1 e f < ?2 z :2 w e2 e1
STACK:
COC: 0

Those THEN tokens can be easily compiled to JNZ instructions. ELSE tokens can be converted to JMP (to end) instructions in assembly by a compiler. END tokens are removed they are for the sake to know where is the end.
castamir  -  Nov 26th 2010, 4:55 PM
input:
+5(5*6)

outup:
35

well... +5(5*6) is not a correct input but your applet count it instead any error msg
Chris  -  Nov 27th 2010, 12:25 PM
That's correct - the shunting yard algorithm converts valid infix to valid postfix, but doesn't determine if the input is valid infix.
Nieko Borela  -  Dec 27th 2010, 12:28 PM
sir can this solve the eq:
2(5)?
John  -  Apr 20th 2011, 11:54 AM
Very good applet. Thanks for your help. I am currently writing a compiler and used this article (especially the shunting yard algorithm) in my program.
Stuart  -  May 3rd 2011, 4:35 AM
Great stuff, very illuminating!! Thank you.
"A practical infix-to-prefix algorithm..." heading has an accidental error, should read...to-postfix ?
Chris  -  May 7th 2011, 12:49 AM
Thanks Stuart - now fixed.
Stuart  -  Jun 11th 2011, 6:17 AM
Have I got this right ..1 / 2 * 4 should evaluate to 1 / 8 right ?
but the applet uses half of four, ie answers 2 ?
Stuart  -  Jun 11th 2011, 6:18 AM
Have I got this right ..1 / 2 * 4 should evaluate to 1 / 8 right ?
but the applet uses half of four, ie answers 2 ?
Stuart  -  Jun 12th 2011, 5:00 AM
arrgh! :) was mistaken sorry..the curse of BOMDAS
Josh  -  Jul 17th 2011, 12:57 PM
This was extremely helpful. Thanks so much for taking the time to write this tutorial and the applet.
Rod  -  Sep 21st 2011, 9:45 AM
Really helpful, thanks! But how do you do the tokenisation proccess? Thanks.
Mark  -  Oct 19th 2011, 3:43 PM
This is awesome! Excellent work... definitely bookmarking this page for future reference.
Melike  -  Nov 30th 2011, 1:01 PM
It's great, thank you for your effort !
CClauson  -  Jan 20th 2012, 6:22 AM
Thanks for this article.

Looking at the algorithm with prefix/postfix, logically it seems like there's a problem. If unary postfix operators are appended directly to the output stream, then their precedence is never looked at. I believe that with the algorithm as described here, postfix operators will always have the highest precedence of all operators.

Thanks
Teddy  -  Feb 5th 2012, 10:24 PM
Great job, tho a bit more polishing needs to be done.
Try -(3*(4+2)
While shouln't work as is wrong, it actually works :)
Chris  -  Feb 24th 2012, 12:42 PM
CClauson - good point. I'll look into this.

Teddy - OK, but it does report the error earlier in the program...
stuart  -  Jun 3rd 2012, 2:29 PM
I have put a freebie small pic-based interpreter at www.hydrazoic.info, it uses the shunting yard algorithm and unary prefix operators for -, sin,cos...seems to work ok so far. :)
censor  -  Sep 22nd 2012, 12:18 AM
No detect this error:
example: 45(*5+2)
Solo  -  Nov 13th 2012, 3:44 PM
How can I implement sin, cos, tan?
Alex  -  Feb 24th 2013, 4:41 PM
Great article, helped me a lot.Thanks.
Tristan  -  May 24th 2014, 4:15 PM
3+4×2÷(1-5)^2^3 produces a different result to the online calculators I have tried. The applet returns 3.000122
The other calculators seem to convert the infix to 3+4×2÷((1-5)^2)^3 = 3.0019531
Is shunting-yard incorrectly converting the infix or are the other calculators wrong?
Example
Try entering 3+4*2/(1-5)^2^3 in http://web2.0calc.com/
Add a comment:
Name:
E-mail (optional, will not be published):
Comment:
Anti-spam measure: