Recently I've been looking at Michael Snoyman's conduit package. Now, conduit is a pretty impressive library, but I was skeptical about its performance. If you look at its source, you see almost every function recursively going down the stack of conduit operations and creating a new one. Now only that, but deforestation is scarce, so it is hard to justify constantly rebuilding these trees.

To fix this, the first thing I tried was using the free monad implementation described here. Unfortunately, this has a large problem: anything besides monad operations is not easy. Functions like `pipe`

, which composes pipes, require building a structure and then traversing it.

After trying a number of similar methods (combining church encoding with Scott encoding, etc.), all of which were equally unsuccessful, I came upon the solution: Stream Fusion.

## Stream Fusion

Stream Fusion (this paper is very readable, by the way) is a techneque for removing unnecessary list or array traversals by using an inversion of control. The key definitions are as follows:

```
data Step s a = Skip s | Yield s a | Done
data Stream a = forall s . Stream (s -> Step s a) s
stream :: [a] -> Stream a
stream {-list-} = Stream next {-list-} where
next [] = Done
next (a : as) = Yield as a
unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0 where
go s = case next s of
Done -> []
Skip s -> go s
Yield s a -> a : go s
```

The library on hackage modifies these functions lightly to force constructor specialization to happen, but this is irrelevant.
Next we have a library full of functions on stream, like so:

```
mapStream :: (a -> b) -> Stream a -> Stream b
mapStream f (Stream next s) = Stream next' s where
next' s = case next s of
Done -> Done
Yield s' a -> Yield s' (f a)
Skip s' -> Skip s'
filterStream :: (a -> Bool) -> Stream a -> Stream a
filterStream p (Stream next s) = Stream next' s where
next' s = case next s of
Done -> Done
Yield s' a
| p a -> Yield s' a
| otherwise -> Skip s'
Skip s' -> Skip s'
```

The `filterStream`

function here is especially interesting, as it shows how the `Skip`

constructor can be useful - it allows `filterStream`

to be non-recursive. This helps GHC's optimizer immensely.
Next, we implement a list library on top of this stream library:

```
map :: (a -> b) -> [a] -> [b]
map f = unstream . mapStream f . stream
filter :: (a -> Bool) -> [a] -> [a]
filter p = unstream . filterStream p . stream
```

Note that if we inline these definitions, we get things like this:
map f . filter g . map h =
unstream . mapStream f .
stream . unstream .
filterStream g .
stream . unstream .
mapStream h . stream

Notice how `stream . unstream`

shows up a lot? Well, at least approximately, streams and lists are isomorphic, so we can add this rule to our library:
{-# RULES
"stream/unstream" forall s . stream (unstream s) = s
#-}

With this simplification, we now have only one `stream`

and one `unstream`

in out pipeline. This means that we now have one tight loop, instead of three, increasing performance.
## Applying This to Conduits

So far, our discussion has only applied to lists. How do we do something like this for, say, a conduit? The answer lies near the end of the paper, in section 9.3, wchich brings up the example of a binary tree:

```
data Tree a b = Leaf a | Branch b (Tree a) (Tree a)
```

To apply stream fusion to this type, we modify the `Step`

type:
data Step a b s = Leaf_s a | Branch_s b s s | Skip s
data Stream a b = forall s . Stream (s -> Step a b s) s

Moving on to conduits, we have what follows.
data Step l i o u m r s =
HaveOutput s (m ()) o
| NeedInput (i -> s) (u -> s)
| Done r
| PipeM (m s)
| Leftover s l
| Skip s
data Stream l i o u m r = forall s . Stream (s -> Step l i o u m r s) s

The full code is available here.
## Performance

As an optimization library, benchmarks are pretty improtant. I've only done preliminary testing, but here are the results. Especially for something so quickly thrown together by someone who doesn't understand the nuances of SpecConstr, I'd say the results are very promising.

## No comments:

## Post a Comment