Wednesday, December 17, 2014

Backpropagation, memory, and other optimization troubles

In my previous post, I described an intermediate representation that made a number of optimization problems much easier. However, it still has some complications.

Memory

Naively, I could just represent memory operations as side effects, modifying the world. However, this is suboptimal for exactly the reasons that I chose to keep operations like addition out of the normal flow - it hinders optimization. As such, I need some other representation of memory. libFIRM takes the approach of having a global "Store" that it passes in and out of every write and into every read. This has several significant advantages - it naturally expresses that reads can be reordered and that memory operations are independent of input and output. Moreover, if this is extended with a set of separate Stores that can be split and merged, it provides a perfect model for aliasing - a Store can be split into two nonaliasing pieces, and operations on one won't affect the other. However, this has the same issue as the state machine representation of control flow - there is nothing preventing all of the memory from being duplicated.
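
To make this concrete, here is a minimal sketch of the Store-threading model in Python. The node shapes (Store, Write, Load) and their fields are invented here for illustration, not libFIRM's actual API; the point is only that every write consumes one Store and produces a new one, while a read consumes a Store without producing anything but a value, so reads of the same Store are visibly independent.

# Hypothetical IR-node sketch of the Store-threading model.
class Store:
    """An immutable snapshot of memory, passed around as an ordinary value."""
    def __init__(self, producer=None):
        self.producer = producer   # the Write (or entry point) that made it

class Write:
    """A write consumes one Store and produces a new one."""
    def __init__(self, store, addr, value):
        self.store, self.addr, self.value = store, addr, value
        self.result = Store(self)  # the memory state after this write

class Load:
    """A read consumes a Store but produces only a value, so two Loads
    of the same Store are independent and can be freely reordered."""
    def __init__(self, store, addr):
        self.store, self.addr = store, addr

# Example: these two reads both use `entry` and neither depends on the
# other, so the optimizer may reorder or merge them.
entry = Store()
a = Load(entry, addr=0)
b = Load(entry, addr=4)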

One idea I had to get around this is to make every write to memory produce a read-only Store and take a "memory continuation" to the next write. This circumvents the issue of duplicated memory, but comes with its own problems: merging separate Stores becomes inexpressible, and I would need a new MemoryPhi node to handle conditionals in the memory pipeline. Given these problems, I am currently leaning towards the state-based approach, but haven't fully decided yet.
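
For contrast, a rough sketch of the continuation-based alternative, again with invented names: each write exposes a read-only snapshot for reads and names exactly one successor write, so the memory timeline is a linear chain that cannot be duplicated.

# Hypothetical sketch of the memory-continuation alternative.
class ROStore:
    """A read-only snapshot of memory produced by a write; reads hang
    off this, but no further writes can be applied to it."""
    pass

class WriteK:
    """A write in continuation style: it yields a read-only Store and
    points at exactly one successor write (the memory continuation)."""
    def __init__(self, addr, value, next_write=None):
        self.addr, self.value = addr, value
        self.snapshot = ROStore()  # read-only state for reads to consume
        self.next = next_write     # sole successor; set once it exists

# Because each write names exactly one successor, memory cannot fork
# and be duplicated. The flip side, as noted above, is that the two
# arms of a conditional can only rejoin through a MemoryPhi-like node.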

Backpropagation

In a simple list of instructions, it is easy to do the sort of dataflow analysis I described in the previous post. However, with my new IR, it becomes harder, involving a lot of going "backwards". This is because just following the control flow chain will not result in every node being analysed, as many operations sit outside the control flow. To remedy this, the optimizer has to be able to take a node whose inputs haven't been analysed, analyse the inputs, and then do another analysis of the use. This is tedious, but it also makes things a bit more consistent - with the switch to continuation-based control flow, analysing later instructions already involves analysing the arguments of earlier instructions.
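
A minimal sketch of what this could look like, assuming each node carries a generic inputs list and a per-node simplify transfer function (both names invented here): analysing a node first recursively analyses its inputs, memoising results so shared subgraphs are visited only once.

# Hypothetical sketch of demand-driven ("backwards") analysis.
def analyse(node, facts, cache=None):
    """Return the most simplified form of `node` under `facts`,
    analysing any not-yet-analysed inputs first."""
    if cache is None:
        cache = {}
    if node in cache:
        return cache[node]
    # Go "backwards": simplify the inputs before the node itself.
    inputs = [analyse(i, facts, cache) for i in node.inputs]
    cache[node] = node.simplify(inputs, facts)  # per-node transfer function
    return cache[node]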

The other place where this sort of backward analysis becomes crucial is in situations like the following:

if x == 5:
  y := x
else:
  y := 5

This should be optimized to y := 5. However, to do this, the optimizer has to know that x is 5 when analysing y := x. This is more difficult than it looks, because it has to extract this information from the result of a comparison. It has to go backward in the tree, realizing that since x == 5 is true, x must be 5.
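
Here is a small sketch of how that extraction might work, with node shapes invented for illustration: on entering the true branch of a conditional, the analysis inspects the condition and, if it is an equality test against a constant, records the implied value as a fact so that later uses can be folded.

from collections import namedtuple

Cmp = namedtuple("Cmp", "op lhs rhs")       # e.g. Cmp("==", "x", 5)

def facts_from_true_branch(cond, facts):
    """In the branch where `var == const` holds, record var's value."""
    if cond.op == "==" and isinstance(cond.rhs, int):
        facts = dict(facts)
        facts[cond.lhs] = cond.rhs          # x == 5 is true, so x is 5
    return facts

def fold_var(name, facts):
    """A variable use folds to a constant once a fact pins it down."""
    return facts.get(name, name)

facts = facts_from_true_branch(Cmp("==", "x", 5), {})
assert fold_var("x", facts) == 5            # y := x simplifies to y := 5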

In general, the question the optimizer is asking seems to be: "Here is a node. Here are some facts I know to be true. What is the most simplified form of every node that this node depends upon?"

Allocation and random numbers

Ideally, the optimizer should be able to remove unused allocations and reorder allocations. However, I don't see a way to take them out of the control flow, as they behave like side effects: two runs of an allocation function should not return the same result, and so allocation cannot be a pure function. The same issue shows up with random numbers - unless the programmer cares about replayability, the compiler should be able to reorder and remove calls to a random number generator. The problem in both cases is the same: we have a function that is not deterministic (calling it twice returns two different things) yet is not side-effecting. I honestly have no idea how to deal with this.
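
To illustrate the hazard, here is a tiny sketch using Python's random module purely as a stand-in: if the optimizer treated the generator as a pure function, common-subexpression elimination would merge the two calls and observably change the program, even though neither call has a side effect the rest of the program can observe.

import random

def original():
    a = random.random()
    b = random.random()  # an independent second draw
    return a, b

def after_naive_cse():
    # If the generator were treated as pure, CSE would rewrite the two
    # calls into one. Now a == b on every run, which is observably
    # different behaviour, yet no "side effect" was touched.
    a = random.random()
    b = a
    return a, b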
