Progress on implementing exceptions for Pyston

I have to say, implementing exceptions for Pyston has been considerably harder than any of the other features I’ve implemented, including what seem like hard techniques such as inline caches and on-stack-replacement. I’d say it’s hard because 1) coming up with an IR representation of exceptions is difficult, and 2) actually implementing the exception-handling is difficult and full of black magic.

Problem #1 is surprisingly difficult and something I think of when estimating how long exceptions would take to implement. The question is how to represent exception-handling in a compiler mid-end; in the front-end and back-end, the representation structure is more-or-less fixed as “apply this exception handling behavior to this range of code”. In the front-end, this takes the form of try-except blocks (“apply this except block to this range of statements”), and in the back-end it takes the form of DWARF unwinding information (“apply this handler to this range of instructions”). In the middle-end, this kind of representation can be bad, since it can inhibit data-flow optimizations since control flow is no longer directly represented. The two approaches I’ve seen to this are to either use the range approach (ex how CPython or JavaScriptCore handle it), or to create an “every statement inside a ‘try’ block can branch to the except block” representation. The latter can be more expensive to create and analyze, but can more-precisely represent the behavior of the program.

I debated between these for a while, and in the end I decided to go with the latter approach, of creating a verbose IR where every statement can potentially branch to an except handler. It feels bloated to me, and I don’t like the idea that code is more inefficient if directly within a try block than if called from a try block, but it seems like the only option.

Once the IR is created, there are a couple options for how to implement it. One option is “setjmp-longjmp” exception handling, where we keep a stack of exception handlers in the form of setjmp targets, and when an exception is thrown we longjmp to it. This feels somewhat straightforward, though it incurs a fair amount of cost for every try-block entered. Another option is to use checked-status-code exception handling, where we keep a global “exception thrown” status flag that gets checked after every function call. We can also use the return value of the function to signify an exception, often by returning a NULL value, though that feels tough to extend to non-pointer return values.

The last option is to use some form of unwind-based exception handling, where we pay “zero cost” on entering a try block, but when an exception is thrown we have to reconstruct the call chain and locate exception handling parameters. This makes the non-throwing case much faster — supposedly “zero cost”, though in practice I’m finding I can’t reach that for various practical reasons, mostly around how a try-block disables some important optimizations such as inline caches. I think this pessimization is implementation-specific, and with some changes to LLVM we could allow invoking a patchpoint (what would be required to support inline caches in a try block), but for now they have to be disabled and things end up being slower. This probably makes try blocks a fair bit slower than if we used either of the other mechanisms, though it has an advantage over status-checked exceptions that it only affects code in try blocks. So in addition to presuming that exceptions will be rare, the efficiency of this method relies on the presumption that try blocks are rare as well.

Implementing stack unwinding requires the cooperation of a number of different pieces of code. First, the compiler has to emit DWARF information that describes the layout and stack-manipulation functionality of a function. Then, an unwinding library needs to locate and parse the DWARF information to produce concrete maps from instructions to language data structures. Then, a language-specific runtime function uses the unwinding library to produce language-appropriate unwinding behavior. For example, Pyston’s conservative GC uses libunwind to examine the stack, and simply scans the entire stack for possible pointers.

For exception handling, my goal is to reuse LLVM’s existing DWARF-emission support (not much choice there), and to use libunwind as the unwinding library. I’m still undecided about whether or not to create a custom Python unwinder or to use the existing production-quality C++ unwinders; the C++ unwinders bring the benefits of being production-quality already, but may have some quirks when it comes to language-specific behavior such as throwing exceptions from exception handlers.

Regardless of how the unwind information will ultimately get consumed, we first have to figure out how to get the unwind information to the stack unwinder. LLVM helpfully emits full-blown .eh_frame sections, similar to what you would find in a normal .o file; the problem is that the unwinder won’t know where to find those unless you tell it. It took me a while to figure out how to register .eh_frame sections with libunwind, despite the existence of documentation for that feature; the trick is you need to use a REMOTE_TABLE format instead of just TABLE. I may move away from using libunwind, though, and start using libgcc which has both an unwinder and a C++ exception manager; or I might use the LLVM libc++ instead.

Hopefully I can get this all worked out this week, since I’m getting pretty tired already of digging through libunwind and libgcc.

Progress on implementing exceptions for Pyston

Share this:

Leave a comment Cancel reply