Friday, January 25, 2013

Using Compiler Bugs for Fun and Profit: Introducing Cartesian Closed Constraints

Anyway, implicit parameters are not idiomatic Haskell these days, and unlikely to make it into a standard, and once you start to involve unsafeCoerce then you're clearly off the reservation.
You can't have an implicit parameter in the context of a class or instance declaration. For example, both these declarations are illegal: class (?x::Int) => C a where ... instance (?x::a) => Foo [a] where ... Reason: exactly which implicit parameter you pick up depends on exactly where you invoke a function. But the ``invocation'' of instance declarations is done behind the scenes by the compiler, so it's hard to figure out exactly where it is done. Easiest thing is to outlaw the offending types.
--The Glorious Glasgow Haskell Compilation System User's Guide, Version 7.4.2

Modern Haskell allows you to pass typeclass instances between functions by wrapping them with a GADT. But this is very limited. Wouldn't it be nice if there were a simple way to produce a typeclass instance from a record? This would only be useful if we could have multiple instances of the same typeclass/type combination, and everyone knows that is not permitted. Right? Almost a year ago I started this blog with a post on how to generate "First Class Instances" but the techniques were hackish and used unsafeCoerce in an unsafe way (okay, only as unsafe as GeneralizedNewtypeDeriving).

What would you say if I showed you a way to build arbitrary instances of typeclasses from dictionaries locally? A way that didn't use unsafeCoerce? A way that except for violating the "single instance" assumption, appears to be totally safe?

First thing is first. Haskell posts need to begin with the requisite language pragmas.

yes, you got that right, today we will be using SafeHaskell.

For an example, we will define differing ordering relations on a type. The techniques presented in this post will not allow one to reliably "shadow" an instance declaration in scope, so we define a newtype to work with

Since we plan to be working with ordering relations, lets define a record variant of the Ord type class. Actually, a single newtype will do.

Now here is the trick. The GHC docs say we can't use ImplicitParams in an instance declaration. Turns out the docs are wrong.

Now we can test it. Sometimes I want to compare strings by looking at length. Other times I want to alphabetize them.

I have tried this code in several versions of GHC. Just now I tried it on GHC 7.4.2 and 7.6.1. In each case I ran it both in GHCi and compiled at varios optimization levels. It always compilers with out error and produces the expected result, namely "True False."

This isn't scary

I noticed this behavior a long time ago, and other than showing it to some people at the Oregon Programming Language Summer School, I have kept pretty quite about it. The truth is I'm terrified that people will want to fix it. Don't. The compiler bug here is doing the right thing.

The common assumption of the "single global instance" simply does not hold anyways. At least, it doesn't hold in a way that can be used to ensure the kind of invariants that things like Data.Set and Data.Map assume. And, this should be obvious. Imagine a module (call it M1) that exports a type. Then two other modules (M2 and M3) each import that type, and declare the same (orphan) instance for that type. A forth module (M4) imports both instances. Obviously, M4 can't use either instance directly. But, this restriction does not prevent violating module invariants. If M2 exposes a Set or our type, and M3 exposes a function to modify Sets of our type, composing these two functions will have unpredictable module boundary breaking behavior: even though all instances obeyed the typeclass laws.

Try it. It is not that hard to break Data.Set and similar modules. Of course, using things like GADTs we can push the invariant breaking much further. The simple fact is that modules that rely on the single instance property for safety are fundamentally broken.

A Categorical Use Case

I have made several bold claims in this post. Here is another one. The ImplicitParams trick is useful and should be made official behavior.

Edward Kmett has written some pretty cool category theory packages, and has a very nice package for using GHC's support for ConstraintKinds. These packages don't quite work for what I want to show here, but they come close so I will use similar ideas.

After enabling some extensions (we are no longer in SafeHaskell land as I wrote this up quickly) and hiding a bunch of the Prelude, we can define some useful typeclasses for working with categories. CCCs show up a lot, as they provide a categorical version of the simply typed lambda calculus. Thus, a good example of a CCC is the category Hask. We could even write a program that took Haskell like lambda calculus syntax (perhaps generated with a TemplateHaskell or a finally tagless interface) and produced a Haskell term polymorphic in the CCC.

Put aside the category stuff, and think about the algebra of Constraints in Haskell. With current versions of GHC we can develop a rich set of tools for talking about constraints that last type class encodes as a constraint the judgment that one constraint implies another. That is, it internalizes the entailment relation.

We would actually like some instances of this relation, and that is where the trick in this post becomes useful. some more stuff that is not very interesting and we can begin to define some other standard connectives. For example, Haskell allows you to combine Constraints using tuple notation, but this isn't really considered a constructor in GHC. So, we define our own "product" constraint

What is the point of all this? Well, Edward Kmett noted that the judgment form of type class entailment (that is, the type version) can form a category using the pairs above though, we can make this into a Symmetric Monoidal Category and for that matter a CCC

Is this useful? Possibly. But it is super cool regardless. I strongly think that this should be made the official behavior and kept in the language. Having tons of categorical structure show up around one of your languages core features (in this case, typeclasses) is usually going to be a good thing. Let's let people see if we can make this useful.

The GHC docs give the justification that the behavior of what instance would be picked would be unclear. I don't think this is true. We already have both local evidence and implicit parameters which present exactly the same problems. It is time to document this behavior, and start using it for good.


  1. The category of Constraint is thin, is it not? i.e. at most one morphism between objects.

    I was wondering whether and how the Constraint kind was inhabited. The answer is yes, in the standard category theory way: the terminal object of Constraint is (), and morphisms from () to any constraint give "values" of that constraint. These, of course, are simply instances of the constraint.

    1. No. If you allow constructions like this it is not thin.

  2. What do we do, under this proposal, to fix Data.Set? Or rather, what do we do to fix Data.Set without enforcing the single-instance assumption. The especially challenging case is Data.Set.union and friends (things like member and insert are easy).

    A while ago I had the crazy idea that the Set type constructor should take a parameter indicating which instance, but that requires that we be able to statically enumerate instances.

    Another option may be some terrifying heroics ala SML's exn (which, naturally, would limit us to creating instances inside IO or ST) and the use of a "data Equal a b where Refl :: Equal a a" to do dynamic testing of instance equality.

    1. The right way to do it is with modules: the module includes has an abstract type of Sets, and operations to perform unions, insertions, etc. Haskell doesn't have real modules, but you can lock up a type variable to enforce safety with a higher rank type or an existential. Something like Data.Reflection could probably do it.

      Other than that your only choice is dynamic checks plus side side effects and or black magic. This is unacceptable to me.

    2. If you existentialize the ordering used by Data.Set, you preclude any efficient merge or difference operations, no? Or at least, mandate that they use dynamic checks and fall back to inefficient operations on mismatch.

      It would seem to me that the Ord instance must be an overt parameter in the type constructor?

    3. You can do the same thing that the ST monad does to encode the Ording as a faked dependent type, or equivalently, using an existential/module. This is limited for obvious reasons, but handles many use cases like you want to store a handful of different sets of the same type in a data structure each with the same ordering relation and occasionally perform merges and unions on them.

      Other than that you can have runtime checks/inefficient fallbacks.

      Note that the efficiency issue is probably overrated. We already have unsafe functions in the standard libraries like mapMonotonic, and other languages get by without efficient merge operations (see the Scala collections libraries). Most use cases I think a proper module based design could handle. I will probably make a post about this soon. The key thing you need to make this work well is manifest types, but I know how to fake manifest types in Haskell.

  3. I think the fact that you can define two instances for the same class+type in M2+M3 and then import them in M4 is a bug in GHC.

    In particular, The Haskell 2010 report says (section 4.3.2):

    "A type may not be declared as an instance of a particular class more than once in the program."