While just translating code from my new language to assembly is cool and useful, what really intrigues me most about compilers is optimization. Ideally, I'd like my compiler to not just produce binaries, but produce fast binaries. Additionally, I'd like to do this in an extensible way, making it easy to write new optimizations and to have them integrate well with the existing passes.
The most powerful and useful way to integrate analyses and transforms that I have seen is the technique of Lerner, Grove, and Chambers, which interleaves various transforms so that each one can benefit from the simplifications made by the rest. Much of my plan for implementing it follows the strategy taken in Hoopl. In more detail, each optimization pass is made up of two functions: an analysis function and a rewrite function. The rewrite function takes the current dataflow state and transforms an instruction into a simpler form. The analysis function just steps the dataflow state forward. To pass information between optimizations, every rewrite is run on an instruction and then al the analyses are done, making all analyses use the maximally simplified instruction.
As an example as to why this is powerful, consider the interaction between constant propagation and constant folding on the following pseudocode:
x := 5 y := x + 2 z := x + y
If constant propagation is run, then this will result in
x := 5 y := 5 + 2 z := 5 + y
which constant folding will reduce to
x := 5 y := 7 z := 5 + y
This is definitely suboptimal, as
z should have been optimized to 12. Switching the order of the two passes does not help, actually making the code worse, as constant folding never simplifies anything.
In contrast, when the two optimizations are interleaved, they produce optimal code. Before constant propagation can analyse
y := 5 + 2, constant folding rewrites it to
y := 7. Constant propagation recognises this new form, substituting to make
z := 5 + 7, which is then folded into 12.
While having a way to optimize code is important, it needs a representation of the code that is easy to analyse to be effective. A simple AST of my language would work, but would be not very amenable to optimization and would not be very general. To defer the questions of register allocation and to make optimization opportunities more obvious, I will use an SSA form representation. This has numerous advantages which I will leave out here, but they are substantial enough that most modern optimizers (GCC and LLVM are notable examples) use it.
Beyond just using an SSA form language, I am using an idea inspired by Cliff Click's graph-based intermediate representation, used in Java's HotSpot VM and in libFIRM. Instead of just having linear sequences of instructions, only side-effecting instructions are put into control flow, obviating the need to rearrange instructions carefully. This idea of separating out the pure computations is exactly what makes Haskell so highly optimized - GHC can do many transformations by knowing that order of computation does not matter.
I deviate from Click in one crucial way, however. His representation treats the world as a state machine, having some "world" object that it passed in and out of every operation that needs too modify the environment. GHC has something similar for its IO representation, using
State# RealWorld as this object. This has some unfortunate consequences, though. First of all, infinite loops are no longer well formed. These result in a circular data dependency, not tied to any exit nodes which are used to detect dead code. Secondly, there is nothing stopping some badly designed optimization from "duplicating" the world, trying to do I/O and then roll it back with another world object.
To remedy these issues, I use a continuation passing style implementation of the world. Instead of having each operation take a value from its predecessor, each operation holds a continuation pointing at its successor. This prevents any duplication of world state, as an operation can only point to one successor, and it solves the problem of infinite loops by looking at thing reachable from the start, not the end. Also, dead code analysis is made much simpler, as something is live iff it is reachable from the start node.
If statements can be just selecting between two values based on a boolean, with the control flow application involving continuations.
Doing my best with ascii diagrams, here are a few examples of my IR.
x := 13 y := 12 print(x + y)
[Start] [Const 13] | ^ V | [Print] -> [Add] -> [Const 12] | V [Return]
x := true if x: y := 5 else: y := 7 print(y)
[Start] | | >--> [Const true] V | [ If ] | | | | >--> [Const 5] V V | [ Phi ] --> [Const 7] | ^ V | [Print] | V [Return]
and is optimized to (assuming x was actually variable)
[Start] | | [Const true] | ^ | | >---> [Const 5] V | | [Print] -> [ If ] ---> [Const 7] | V [Return]
x := 0 loop forever: print(x) x := x + 1
[Start] | V-----------------------< | | >--> [Const 0] | V V | | [ Phi ]--V [Const 1] | | ^ | ^ | | | >----V | | | +------ [Add] | V | | [Print] --------------------^