Stack vs Register bytecodes for Python

There seems to be a consensus that register bytecodes are superior to stack bytecodes.  I don’t quite know how to cite “common knowledge”, but doing a google search for “Python register VM” or “stack vs register vm” supports the fact that many people believe this.  There was a comment on this blog to this effect as well.

Anyway, regardless of whether it truly is something that everyone believes or not, I thought I’d add my two cents.  Pyston uses a register bytecode for Python, and I wouldn’t say it’s as great as people claim.

Lifetime management for refcounting

Why?  One of the commonly-cited reasons that register bytecodes are better is that they don’t need explicit push/pop instructions.  I’m not quite sure I agree that you don’t need push instructions — you still need an equivalent “load immediate into register”.  But the more interesting one (at least for this blog post) is pop.

The problem is that in a reference-counted VM, we need to explicitly kill registers.  While the Python community has made great strides to support deferred destruction, there is still code (especially legacy code) that relies on immediate destruction.  In Pyston, we’ve found that it’s not good enough to just decref a register the next time it is set: we need to decref a register the last time it is used.  This means that we had to add explicit “kill flags” to our instructions that say which registers should be killed as a result of the instruction.  In certain cases we need to add explicit “kill instructions” whose only purpose is to kill a register.

In the end it’s certainly manageable.  But because we use a register bytecode, we need to add explicit lifetime management, whereas in a stack bytecode you get that for free.

 

I don’t think it’s a huge deal either way, because I don’t think interpretation overhead is the main factor in Python performance, and a JIT can smooth over the differences anyway.  But the lifetime-management aspect was a surprise to me and I thought I’d mention it.

2 responses to “Stack vs Register bytecodes for Python”

  1. The consensus is one which is well supported in the literature with the most extensive research having been conducted within the context of the Java VM. The best reference I know of is Yunhe et al.

    Click to access p153-yunhe.pdf

    One caveat here is that to truly realize the benefit an interpreter needs to be optimizing and transforming the AST (something I alluded to in my previous comment). Now, it is certainly true that Python bytecode is a very different beast to Java bytecode. However, a few years ago Victor Stinner did investigate the use of a register based VM for CPython

    http://faster-cpython.readthedocs.io/registervm.html

    although the project appears to be dormant the initial results were very promising. One point made by both pieces of work surround the use of computed goto’s within the context of a threaded interpreter to reduce the number of branch mispredictions. However, with modern CPUs — which have dramatically improved branch predictors compared with those of prior generators — this statement is no longer as true as it once was, see Rohou et al.

    https://hal.inria.fr/hal-00911146/document

    Like

Leave a comment