Mon 25 May 2020

Filed under Posts

Tags idris

I've just released a new version of Idris 2, which is a significant release in that it's the first one which is able to compile itself. Thus, it's the first one where we can really see how the performance of Idris 2 improves on the performance of Idris 1.

A question people sometimes ask when discussing Idris on various parts of the Internet is "Why is Idris 2 so much faster than Idris 1?" Sometimes it involves the phrase "the author claims that..." so here are some numbers, as measured today on my Dell XPS 13:

  • Idris 1 building Idris 2 (bootstrap version): 743s
  • Idris 2 building Idris 2 (with the Chez Scheme runtime): 76s
    • Update (2020-05-27): It turns out there was an error in processing imports that made it take too long when importing modules. Combined with some other small improvements, the new time is: 48.5s

This is for type checking and generating/optimising intermediate code only, and doesn't include generating and compiling the C/Chez Scheme output. The bootstrap version is a slightly earlier version, and a little bit smaller, but they are similar enough to be able to make a useful comparison, and certainly give you a feel of how much of an improvement Idris 2 is. That's nearly an order of magnitude!

I wouldn't necessarily say the result is fast in an absolute sense [1] and there are no doubt many improvements still to be found, but it is still interesting to try to understand why the difference. Idris 1 is implemented in Haskell, but that has little (if anything) to do with the difference. Rather, it's a combination of:

  • my improved understanding of what it takes to implement a dependently typed language (or, indeed, any language)
  • insights gained from other people's experiments, most notably Andras Kovacs' work on smalltt
  • Lots of profiling, via gprof while bootstrapping, and the Chez Scheme profiling tools.

A lot of the improvements are down to avoiding make bad decisions in the first place! This is easy to say in hindsight, but Idris 1 was originally a vehicle for experimenting with some ideas in dependently typed language design that were hard to do just by modifying other systems and so I was more interested in having a system sooner, rather than having a better system. Unfortunately, some of those early decisions proved hard to undo as the system grew and as it gained users.

The rest of this post summarises some of the most significant internal differences.

The global context

The global context stores all of the information that might be needed for typechecking: function and data types and definitions, among other things. We need function definitions as well as types, even for imported modules, since we may need to do evaluation at the type level during type checking.

The big difference between Idris 1 and Idris 2 is that Idris 2 represents the context as an array, with constant time access and update. Yes, it's mutable! We recognise that there's going to be a lot of updates: mainly solving unification problems as type checking proceeds, but also during interface resolution. And, we only ever have one context (ignoring ambiguity resolution, but I won't go into that here). So, having a single mutable context with constant time access has been valuable. Yes, it's a bit ugly to a pure functional programmer, but I'm not going to let that get in the way of performance! At least the types are precise about where updates might happen.

This turns out to have another benefit, which is that on loading an imported module, we don't need to add definitions to the context until they are actually referenced. Most names from imports are never referenced - at least not all in the same file - so Idris 2 lazily decodes definitions and mutates the context when they are needed. In fact profiling Idris 1 suggests that it spends a huge amount of time unnecessarily loading definitions.

Unification

Unification is an important part of type checking a dependently typed language, to resolve implicit arguments. For example, if you have a function

append : Vect n a -> Vect m a -> Vect (n + m) a

and an application

append [1,2,3] [4,5,6,7]

...then the typechecker needs to infer that n = 3, m = 4 and a = Integer. Idris 2 implements dynamic pattern unification. In Idris 1, metavariables (that is, the implicits that need to be solved) are stored locally to a term, so solving them requires searching through the term and substituting the definition, which can be expensive. Idris 2, on the other hand, stores metavariables globally, in the mutable global context, following roughly the scheme described in Ulf Norell's thesis.

Furthermore, we don't substitute the solution of a metavariable in a term unless we really really need to (in practice, that's only while checking linear binders are used appropriately, so fairly rare). This preserves sharing as far as possible, which can be important in terms with dependent types. For example (with apologies for using a vector again):

[1,2,3]

elaborates into

((::) {a=Integer} {n=S (S Z))} 1
  ((::) {a = Integer} {n=S Z} 2
    ((::3) {a = Integer} {n=Z} (Nil {a = Integer})))

That Z appears three times, the subexpression S Z appears twice, and this gets worse as the vector grows. So sharing is important! This is the main insight gained from Andras Kovacs' work on smalltt.

Compile-time evaluation

Type checking a dependently typed language involves evaluating expressions, possibly in the presence of unknown variables, to check whether two terms are equal or to infer values for metavariables. For example, we might need to solve for ?meta in the following:

plus (S (S x)) (S (S y)) = S ?meta

Idris 1 does this by evaluating the terms, leading to the equation

S (S (plus x (S (S y)))

...then deciding ?meta = S (plus (x (S (S y)))). This is fine for small terms, but what if the expression is

plus 100 100 = S ?meta

...or something even larger? We don't need to evaluate fully to normal form to establish that ?meta = plus 99 100. Idris 2 therefore reduces to head normal form during type checking, which it probably should have done all along! Cons should not evaluate its arguments.

Normally, you won't notice this. But, sometimes, in Idris 1, compile time evaluation really hurts performance, and a dependently typed language needs to recognise that compile time evaluation is going to happen a lot!

Chez Scheme runtime

Idris 2 benefits from a robust, well-engineered and optimised run time system, by compiling to Chez Scheme. One might expect that Chez Scheme would be slower than the C code generated by Idris 1, since it is dynamically typed, but this ignores two important points:

  • It is possible to turn off the dynamic type checks in Chez Scheme. So, the run time never checks if a constructor argument is within bounds of the vector where we store it, and never checks whether a primitive is the right type.
  • Even more importantly, there has never been significant effort put into the Idris 1 run time. Generating good C code corresponding to a functional program which might have lots of small (and higher order) definitions is extremely challenging, so it is better to take advantage of someone else's work here.

Abandoning tactic-based elaboration

Idris 1 used a tactic-based elaborator, much as you'd see in Coq proofs. This made implementing new high level features relatively easy, using an elaboration monad which was, in a sense, an embedded domain specific language for writing new tactics. This is usable directly by elaborator reflection meaning that users could try extending Idris in Idris itself. While this is a fun way to work on the language, it took a lot of effort to get it to perform even as well as it did (which, as you see above, is not that well). Idris 2 takes a much simpler approach, elaborating syntax directly into the core representation.

Non-reasons

One performance problem in Idris 1 was due to space leaks in the Haskell caused by using lazy data structures where they weren't needed. Despite our best efforts, we were never able to resolve these fully. No doubt it would have been better to consider where strict data structures would have been better. But, we didn't. This isn't to say that Idris has proved a better implementation choice because it's strict, just to say that we needed to put more thought, earlier, into where we needed strict data in Haskell.

[1]It's about 45,000 lines of Idris 2 code. No doubt a Go programmer, or even a Pascal programmer in the 1970s, would scoff at the idea that a mere 45,000 lines would take 76 seconds to type check. But, those 45,000 lines are pulling a lot of weight, and the type checker is doing a lot of work! For example, the fact that it typechecks means that all of the functions cover all possible inputs, and all expressions throughout all phases are guaranteed to be scope correct, as is unification. This is a major source of bugs in a type checker and compiler.

Thu 27 February 2020

Filed under Posts

Tags idris paper

A paper on Idris 2 and QTT

Read More

Fri 10 January 2020

Filed under Posts

Tags idris

A discussion on erasure and linearity in Idris 2

Read More

© Edwin Brady Powered by Pelican and Twitter Bootstrap. Icons by Font Awesome and Font Awesome More