This blog’s been pretty neglected lately, so I thought I’d post about progress on Pyston, since that’s where most of my time has been going.
The two main goals for Pyston are performance and language compatibility. For the 0.1 launch (the original blog post) I focused on the main performance techniques, and recently I’ve switched gears back to working on compatibility. There are two schools of thought on the relative importance of the two areas: one says that performance doesn’t matter if you can’t apply it to your programs (i.e. compatibility > performance), and the other says that for an alternative implementation, if your performance is poor it doesn’t matter what your compatibility story is (i.e. performance > compatibility). Of course the goal is to have both, so I think it’s mostly a useless argument: at some point someone will come up with an implementation that is both performant and compatible, and prove that you need both.
Regardless, I’ve been working on compatibility lately, because Pyston can’t yet run an interesting enough set of benchmarks to do much further optimization work. A couple of big things have landed in the past week or so.
Exceptions
This took me a surprising amount of time, but I’m happy to say that basic support is now in Pyston. Like many things in Python, exception handling can be quite sophisticated; for now Pyston only has the basics, and we’ll add more features as they’re required. You can raise and catch exceptions, but finally blocks (and with blocks) aren’t supported yet. sys.exc_info isn’t supported yet either; it turns out that the rules around it are pretty gnarly.
I wrote some previous blog posts about the exception handling work. In the end, I decided to go (for now) with C++ exceptions through-and-through. This means that we can use native C++ constructs like “throw” and “catch”, which is the easiest way to interface between runtime exceptions and a C++ stdlib. I ended up using libgcc instead of libunwind, since 1) LLVM’s JIT already has libgcc __register_frame support (though I did manage to get the analogous _U_dyn_register support working for libunwind), and 2) libgcc comes with an exception manager, whereas libunwind would just be the base on top of which we would have to write our own. It actually more-or-less worked all along — even while writing those previous blog posts about how to implement exceptions, it would have worked to simply “throw” and “catch” exceptions from the C++ side.
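To make the “throw and catch” point concrete, here is a minimal C++ sketch. It assumes a hypothetical `Box` exception object and a hypothetical `PythonException` wrapper type; neither is Pyston’s real API. Runtime code raises with a plain `throw`, errors coming out of the C++ stdlib are caught natively and translated, and the function standing in for JIT’d code catches with an ordinary `catch`:

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a Python exception object (not Pyston's real type).
struct Box {
    std::string cls_name;  // e.g. "ValueError"
    std::string message;
};

// What actually gets thrown: a thin wrapper around the Python-level object.
// A real runtime would carry a GC-managed pointer; shared_ptr keeps this self-contained.
struct PythonException {
    std::shared_ptr<Box> value;
};

// Raising a Python exception from runtime code is just a C++ throw.
[[noreturn]] void raiseExc(std::shared_ptr<Box> exc) {
    throw PythonException{std::move(exc)};
}

// A runtime helper built on the C++ stdlib, roughly like int(s): stdlib
// errors are caught natively and re-raised as Python exceptions.
long parseInt(const std::string& s) {
    try {
        std::size_t pos = 0;
        long v = std::stol(s, &pos);
        if (pos != s.size()) throw std::invalid_argument(s);  // trailing junk
        return v;
    } catch (const std::exception&) {
        raiseExc(std::make_shared<Box>(Box{"ValueError", "invalid literal for int(): " + s}));
    }
}

// Roughly what compiled code for
//     try: x = int(s)
//     except: x = -1
// reduces to when both sides use native C++ exceptions.
long compiledTryExcept(const std::string& s) {
    try {
        return parseInt(s);
    } catch (const PythonException&) {
        return -1;
    }
}
```

Because everything is a native C++ exception, the unwinder (libgcc here) does all of the stack walking; nothing Python-specific is needed to get the exception from the raise site to the handler.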
This approach seems functional for now, but I think in the long term we’ll have to move away from it:
- Using _U_dyn_register, and I assume __register_frame as well, results in a linked list of exception handling information. This means that raising an exception has overhead linear in the number of JIT’d functions… not good. Static code avoids this problem by using a sorted table that can be binary-searched, for logarithmic overhead; I think a similar approach could work for dynamic info, but it may require changes to the unwind API. This may be possible with libgcc, but the GPL licensing makes me want to avoid that kind of thinking.
- C++ exception handling and Python exception handling look superficially similar, but the exact semantics of how an exception propagates end up being quite different. Different enough, unfortunately, that we can’t use C++’s propagation logic and instead have to layer our own on top. To do this, we construct every Python “except” block as a C++ catch block that catches everything; inside that catch block we do the Python-level evaluation. I haven’t profiled it, but I would imagine that this leads to a fair amount of overhead, since we’re constantly creating C++ exception objects (i.e. the wrapper that the value you “throw” gets placed in) and then deallocating them, and doing a whole bunch of bookkeeping in the C++ handler that ends up being completely redundant. So in the long term I think we’ll want to write our own exception manager.
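The first concern above suggests a tree-like structure for dynamically registered unwind info. A rough sketch of that idea follows, assuming a hypothetical registry keyed by each JIT’d function’s start address. It is not the libgcc or libunwind interface, and plugging it into a real unwinder would need the unwind API changes mentioned above. `std::map` is a balanced search tree, so finding the function that contains a given program counter is logarithmic rather than a walk over a linked list:

```cpp
#include <cstdint>
#include <map>

// Hypothetical per-function record; a real one would point at the
// function's .eh_frame / FDE data.
struct UnwindInfo {
    std::uintptr_t start;
    std::uintptr_t end;  // one past the last byte of the function
    const void* eh_frame;
};

class JitUnwindRegistry {
public:
    void registerFunction(const UnwindInfo& info) { by_start_[info.start] = info; }
    void deregisterFunction(std::uintptr_t start) { by_start_.erase(start); }

    // Find the JIT'd function containing pc in O(log n), or nullptr if none does.
    const UnwindInfo* lookup(std::uintptr_t pc) const {
        auto it = by_start_.upper_bound(pc);  // first entry with start > pc
        if (it == by_start_.begin()) return nullptr;
        --it;                                 // last entry with start <= pc
        return pc < it->second.end ? &it->second : nullptr;
    }

private:
    std::map<std::uintptr_t, UnwindInfo> by_start_;  // balanced tree keyed by start
};
```

Registration and deregistration are also logarithmic, which matters for a JIT that keeps adding and freeing code.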
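The second concern describes lowering each Python “except” into a C++ handler that catches everything and then does the Python-level matching itself. The sketch below assumes every Python exception is thrown as a single hypothetical `PythonException` wrapper type, so catching that one type plays the role of the catch-all; the names and the tiny class model are illustrative, not Pyston’s. The matching has to happen at runtime because an except clause names a class that is only known when the clause is evaluated, which C++’s static `catch (Type&)` matching cannot express:

```cpp
#include <memory>
#include <string>

// Hypothetical minimal class model with single inheritance.
struct PyClass {
    std::string name;
    const PyClass* base;  // nullptr for the root class
};

struct PyExc {
    const PyClass* cls;
};

// The C++ value that is actually thrown for every Python exception.
struct PythonException {
    std::shared_ptr<PyExc> value;
};

// Python-level subclass test, done by the runtime rather than by C++'s type matching.
bool isSubclass(const PyClass* c, const PyClass* parent) {
    for (; c != nullptr; c = c->base)
        if (c == parent) return true;
    return false;
}

// Sketch of how
//     try:
//         body()
//     except KeyError:
//         return 1
//     except LookupError:
//         return 2
// could be lowered: one C++ handler catches every Python exception, and the
// except clauses are then tried in order using Python's own rules.
template <typename Body>
int lowerTryExcept(Body body, const PyClass* key_error, const PyClass* lookup_error) {
    try {
        body();
        return 0;
    } catch (const PythonException& e) {
        if (isSubclass(e.value->cls, key_error)) return 1;
        if (isSubclass(e.value->cls, lookup_error)) return 2;
        throw;  // no clause matched: rethrow the same exception so it keeps propagating
    }
}
```

An exception that no clause matches still pays for being caught and rethrown at every such frame, which is part of the redundant work described above.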
Inheritance
Exceptions were hard because implementing them efficiently amounts to black magic (stack introspection). Inheritance is hard because Python’s inheritance model is very sophisticated and pervasive.
On the surface it seems straightforward, especially if you ignore multiple inheritance for now: when looking up an attribute on a type, extend the search through the base classes of that type. Setting up the attributes that a base class expects is typically handled by calling the base class’s __init__, and doesn’t need special treatment from the runtime.
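Ignoring multiple inheritance (which needs Python’s method resolution order), the lookup described above is a walk up the chain of base classes. A minimal sketch, with a hypothetical `TypeObj` and a placeholder `Value` type; a real lookup would also consult the instance’s own `__dict__` and handle descriptors:

```cpp
#include <string>
#include <unordered_map>

// Placeholder for a reference to a Python object.
struct Value {
    long v;
};

// Hypothetical type object: its own attributes plus at most one base class.
struct TypeObj {
    std::unordered_map<std::string, Value> attrs;
    const TypeObj* base = nullptr;
};

// Look up an attribute on a type, extending the search through its bases.
// The first class in the chain that defines the name wins; nullptr if none does.
const Value* typeLookup(const TypeObj* type, const std::string& name) {
    for (const TypeObj* t = type; t != nullptr; t = t->base) {
        auto it = t->attrs.find(name);
        if (it != t->attrs.end()) return &it->second;
    }
    return nullptr;
}
```

A subclass that defines the same name shadows its base, which is exactly the override behavior Python code expects.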
The thing that makes this tricky is that in Python, you can inherit from (almost) any built-in class, including things like int. If you think about it, an int has no member that corresponds to the “value” of that int; how would that even be exposed to Python code? Instead, an int has a C-level attribute, and the int class methods know how to access it. This means, among other things, that ints have a different C-level shape than anything else; in general, every Python class is free to give its instances a different memory layout.
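A sketch of what “a C-level attribute” means, assuming a hypothetical object model in which every object starts with a pointer to its class. The int’s value lives in a plain C++ field that has no Python-level name, and int’s methods simply assume that layout:

```cpp
#include <cstdint>

struct BoxedClass {
    const char* name;
};

// Hypothetical object header: every object starts with a pointer to its class.
struct Box {
    BoxedClass* cls;
};

// An int stores its value in a C-level field. There is no Python-visible
// attribute for it; only int's own methods know to read `n`.
struct BoxedInt : Box {
    std::int64_t n;
};

BoxedClass int_cls{"int"};

// Roughly what int.__add__ could look like in the runtime. It only works
// because it can assume both arguments really have the BoxedInt layout.
BoxedInt* intAdd(const BoxedInt* lhs, const BoxedInt* rhs) {
    return new BoxedInt{{&int_cls}, lhs->n + rhs->n};
}
```

Anything holding a generic `Box*` has to know it is really a `BoxedInt` before reading the value, which is why a subclass of int must keep this layout intact.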
That might not be too bad if you could simply copy the memory layout of the parent when creating a subclass. But if you subclass int, you gain the ability to put custom attributes on your subclass instances. In other words, normal ints in Python don’t have a __dict__, but if you subclass int, you will (typically) end up with instances that do have a __dict__ in addition to the “int” C shape. One way to implement this would be to put a __dict__ on every object, but this would bloat the size of ints and other primitives that don’t need it. Instead, we have to dynamically extend the memory layout of the base class. It’s not all that bad once the systems are in place, but it means that we now get to have fun things like “placement new” in the Pyston codebase.
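The layout extension can be sketched roughly as follows, assuming a hypothetical class record that stores an instance size and a dict offset (CPython uses a comparable idea with `tp_basicsize` and `tp_dictoffset`). Plain ints get exactly the int layout; an int subclass gets the same prefix with a `__dict__` slot appended after it, and instances are assembled in raw memory with placement new:

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <string>
#include <unordered_map>

// Placeholder for a real __dict__ implementation.
using AttrDict = std::unordered_map<std::string, std::int64_t>;

// Hypothetical class record: how big instances are and where (if anywhere)
// their __dict__ lives.
struct BoxedClass {
    std::size_t instance_size;
    std::ptrdiff_t dict_offset;  // -1 means "no __dict__"
};

struct Box { BoxedClass* cls; };
struct BoxedInt : Box { std::int64_t n; };

// Plain ints: exactly the BoxedInt layout, no __dict__.
BoxedClass int_cls{sizeof(BoxedInt), -1};

// A subclass of int: the parent's layout with an aligned dict slot appended.
BoxedClass makeIntSubclass() {
    constexpr std::size_t align = alignof(AttrDict);
    std::size_t off = (sizeof(BoxedInt) + align - 1) / align * align;
    return BoxedClass{off + sizeof(AttrDict), static_cast<std::ptrdiff_t>(off)};
}

// Allocate raw memory of the class's size, then construct each piece in place:
// the inherited int part at offset 0 and, if present, the dict at its offset.
BoxedInt* newInstance(BoxedClass* cls, std::int64_t value) {
    void* mem = ::operator new(cls->instance_size);
    BoxedInt* obj = new (mem) BoxedInt{{cls}, value};
    if (cls->dict_offset >= 0)
        new (static_cast<char*>(mem) + cls->dict_offset) AttrDict();
    return obj;
}

AttrDict* getDict(BoxedInt* obj) {
    if (obj->cls->dict_offset < 0) return nullptr;
    char* p = reinterpret_cast<char*>(obj) + obj->cls->dict_offset;
    return std::launder(reinterpret_cast<AttrDict*>(p));
}

// Destroy the in-place pieces, then release the raw memory.
void freeInstance(BoxedInt* obj) {
    if (AttrDict* d = getDict(obj)) d->~AttrDict();
    obj->~BoxedInt();
    ::operator delete(obj);
}
```

Because the int part stays at offset 0, int’s own methods keep working on subclass instances unchanged; only code that needs the `__dict__` consults the offset.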
So we now have basic inheritance functionality: you can subclass the types that have been updated so far (object, int, and Exception), and the framework is in place to fix the rest. There’s definitely a lot still missing; inheritance is a deep part of Python’s object model, and pretty much all code needs to be made subclass-aware.
Tracebacks
Tracebacks turned out not to be too difficult to implement, except that, yet again, Python offers a tremendous amount of functionality and sophistication around them. So, like the other new features, there’s only basic traceback support: if an exception goes uncaught, the top-level handler prints out a stack trace. This functionality isn’t exposed at the Python level, since there’s quite a bit more to the API than simply collecting tracebacks, but for now we have the low-level code to actually generate the traceback information. Getting stack traces from the top-level handler is pretty nice for debugging, as well.
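One way to gather the low-level traceback data is to record each frame as the exception unwinds through it. The sketch below is a hypothetical illustration of that idea, not necessarily how Pyston does it (a runtime could instead walk the native stack with the unwinder). A guard object at the top of each compiled function records its frame from its destructor when an exception is propagating, detected with C++17’s `std::uncaught_exceptions`, and the top-level handler prints the frames in CPython’s “most recent call last” format. For simplicity the frames go into a thread-local list that the handler clears, rather than being attached to the exception object:

```cpp
#include <exception>
#include <sstream>
#include <string>
#include <vector>

struct FrameInfo {
    std::string file;
    int line;  // updated by compiled code as execution moves through the function
    std::string func;
};

// Frames the current exception has passed through, innermost first.
thread_local std::vector<FrameInfo> g_traceback;

// Hypothetical guard placed at the top of each compiled function. Its
// destructor runs during unwinding, which is when it records the frame.
struct FrameGuard {
    FrameInfo info;
    int uncaught_at_entry = std::uncaught_exceptions();
    ~FrameGuard() {
        if (std::uncaught_exceptions() > uncaught_at_entry)
            g_traceback.push_back(info);
    }
};

// What a top-level handler might print, outermost frame first.
std::string formatTraceback(const std::vector<FrameInfo>& frames,
                            const std::string& exc_type, const std::string& msg) {
    std::ostringstream out;
    out << "Traceback (most recent call last):\n";
    for (auto it = frames.rbegin(); it != frames.rend(); ++it)
        out << "  File \"" << it->file << "\", line " << it->line
            << ", in " << it->func << "\n";
    out << exc_type << ": " << msg << "\n";
    return out.str();
}
```

For example, if `outer` calls `inner` and `inner` raises, the frames are recorded innermost first, and the handler reverses them so that `outer` is printed first and `inner` last, matching CPython’s ordering.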
So all in all, Pyston now supports a number of new language features, though at a pretty rudimentary level. The work I just described implements the core of these features, but they all have large surface areas and will take some time to implement fully.