I try not to read too much about Pyston on HN/Reddit, since while there are certainly some smart and reasonable people on there, there also seem to be quite a few people with axes to grind (*cough cough* Python 3). But I noticed some recurring themes in the comments on our announcement about Pyston’s future, so I wanted to talk about some of them. I’m not really aiming to change anyone’s mind, but since I haven’t really talked through our motivations and decisions for the project, I wanted to make sure to put them out there.
Why we built a JIT
Let’s go back to 2013 when we decided to do the project: CPU usage at Dropbox was an increasingly large concern. Despite the common wisdom that “Python is IO-bound”, requests to the Dropbox website were spending around 90% of their time on the webserver CPU, and we were buying racks of webservers at a worrying pace.
At a technical level, the situation was tricky, because the CPU time was spread around in many areas, with the hottest areas accounting for a small (single-digit?) percentage of the entire request. This meant that potential solutions would have to apply to large portions of the codebase, as opposed to something like trying to Cython-ize a small number of functions. And unfortunately, PyPy was not, and still is not, close to the level of compatibility needed to run a multi-million-LOC codebase like Dropbox’s, especially with our heavy use of extension modules.
So, we thought (and I still believe) that Dropbox’s use-case falls into a pretty wide gap in the Python-performance ecosystem, of people who want better performance but who are unable or unwilling to sacrifice the ecosystem that led them to choose Python in the first place. Our overall strategy has been to target the gap in the market, rather than trying to compete head-to-head with existing solutions.
And yes, I was excited to have an opportunity to tackle this sort of problem. I think I did as good a job as I could to discount that, but it’s impossible to know what effect it actually had.
Why we started from scratch
Another common complaint is that we should have at least started with PyPy or CPython’s codebase.
For PyPy, it would have been tricky, since Dropbox’s needs are both philosophically and technically opposed to PyPy’s goals. We needed a high level of compatibility and reasonable performance gains on complex, real-world workloads. I think this is a case that PyPy has not been able to crack, and in my opinion is why they are not enjoying higher levels of success. If this were just a matter of investing a bit more into their platform, then yes, it would have been great to just “help make PyPy work a bit better”. Unfortunately, I think their issues (lack of C extension support, performance reliability, memory usage) are baked into their architecture. My understanding is that a “PyPy that is modified to work for Dropbox” would not look much like PyPy in the end.
For CPython, this was more of a pragmatic decision. Our goal was always to leverage CPython as much as we could, and now in 2017 I would recklessly estimate that Pyston’s codebase is 90% CPython code. So at this point, we are clearly a CPython-based implementation.
My opinion is that it would have been very tough to start out from the CPython codebase. It is not particularly amenable to experimentation in these fundamental areas, and for the early stages of the project, our priority was to validate our strategies. I think this was a good choice because our initial strategy (using LLVM to make Python fast) did not work, and we ended up switching gears to something much more successful.
But yes, along the way we did reimplement some things. I think we did a good job of understanding that those things were not our value-add and of treating them accordingly. I still wonder if there were ways we could have avoided more of the duplicated effort, but it’s not obvious to me how we could have done so.
Issues people don’t think about
It’s an interesting phenomenon that people feel very comfortable having strong opinions about language performance without having much experience in the area. I can’t judge, because I was in this boat myself: I thought that if web browsers made JS fast, then we could do the same thing and make Python fast. So instead of trying to squelch the “hey they made Lua fast, that means Lua is better!” opinions, I’ll try to just talk about what makes Python hard to run quickly (especially as compared to less-dynamic languages like JS or Lua).
The thing I wish people understood about Python performance is that the difficulties come from Python’s extremely rich object model, not from anything about its dynamic scopes or dynamic types. The problem is that every operation in Python will typically have multiple points at which the user can override the behavior, and these features are used, often very extensively. Some examples are inspecting the locals of a frame after the frame has exited, mutating functions in-place, or even something as banal as overriding isinstance. These are all things that we had to support, they are used enough that we had to support them efficiently, and they don’t have analogs in less-dynamic languages like JS or Lua.
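As a rough illustration of those three behaviors, here is a minimal sketch in Python 3 syntax. All of the names (`compute`, `greet`, `shout`, `Anything`) are made up for the example, and none of this is Dropbox or Pyston code; it only shows that each hook is ordinary, legal Python that an implementation has to honor.

```python
# 1. Inspecting the locals of a frame after the frame has exited.
#    A traceback keeps the frame object alive, locals included.
def compute():
    secret = 42
    raise ValueError("boom")

try:
    compute()
except ValueError as exc:
    tb = exc.__traceback__

# Walk to the innermost traceback entry, which belongs to compute().
while tb.tb_next is not None:
    tb = tb.tb_next
leaked = tb.tb_frame.f_locals["secret"]  # readable although compute() is done


# 2. Mutating a function in place: every existing reference sees the change.
def greet():
    return "hello"

def shout():
    return "HELLO"

alias = greet                      # a second reference to the same function object
greet.__code__ = shout.__code__    # swap the body without rebinding the name
mutated = alias()


# 3. Overriding isinstance through a metaclass hook.
class AnythingMeta(type):
    def __instancecheck__(cls, obj):
        # Claim that every object is an instance of classes using this metaclass.
        return True

class Anything(metaclass=AnythingMeta):
    pass

overridden = isinstance(3, Anything)
```

A JIT cannot assume that a returned function’s locals are dead, that a function’s body is fixed, or that `isinstance` is a simple type comparison, because programs like this one are allowed to break each of those assumptions.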
On the flip side, the issues with Python compatibility are also quite different from what most people expect. Even the smartest technical approaches will have compatibility issues with codebases the size of Dropbox’s. We found, for example, that there are simply too many things that will break when switching from refcounting to a tracing garbage collector, or even when switching the dictionary ordering. We ended up having to re-do our implementations of both of these to match CPython’s behavior exactly.
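To give a flavor of the refcounting dependency, here is a small sketch of the kind of code that quietly relies on it. The `Resource` class is hypothetical. On CPython the finalizer runs the moment the last reference disappears; under a tracing collector it would run at some later, unspecified time, so the final check below is only expected to hold on a refcounting implementation.

```python
class Resource:
    """Stands in for a file handle, lock, or connection (hypothetical)."""

    def __init__(self, log):
        self.log = log

    def __del__(self):
        # Cleanup that the surrounding code assumes happens promptly.
        self.log.append("released")


log = []
r = Resource(log)
del r  # drops the only reference to the object

# With reference counting, __del__ has already run at this point.
# With a tracing collector, the log could still be empty here.
released_immediately = (log == ["released"])
```

Code like this is rarely written on purpose, but in a large codebase there is plenty of it, for example files that are never explicitly closed and are assumed to be flushed as soon as they go out of scope.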
Memory usage is also a very large problem for Python programs, especially in the web-app domain. This is, unintuitively, driven in part by the GIL: while a multi-process approach will be conceptually similar to a multi-threaded approach, the multi-process approach uses much more memory. This is because Python cannot easily share its memory between different processes, both for logistical reasons and for some deeper reasons stemming from reference counting. Regardless of the exact reasons, there are many parts of Dropbox that are actually memory-capacity-bound, where the key metric is “requests per second per GB of memory”. We thought a 50% speed increase would justify a 2x memory increase, but in a memory-bound service this is a net loss: 1.5x the throughput on 2x the memory is only 75% of the original requests per second per GB. Memory usage is not something that gets talked about that often in the Python space (except for MicroPython), and would be another reason that PyPy would struggle to be competitive for Dropbox’s use-case.
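The arithmetic behind that tradeoff is short enough to spell out. This is a sketch with made-up baseline numbers (100 requests per second on 1 GB); only the 1.5x and 2x ratios come from the discussion above.

```python
def requests_per_second_per_gb(requests_per_second, memory_gb):
    """The capacity metric for a memory-bound service."""
    return requests_per_second / memory_gb


# Hypothetical baseline: one worker serving 100 req/s in 1 GB.
baseline = requests_per_second_per_gb(100.0, 1.0)

# Same worker with a 50% speedup that costs twice the memory.
with_jit = requests_per_second_per_gb(100.0 * 1.5, 1.0 * 2)

# Ratio of the new metric to the old one: below 1.0 means fewer
# requests served per machine once memory is the limiting resource.
relative_capacity = with_jit / baseline
```

Here `relative_capacity` comes out to 0.75, so a fleet whose size is set by memory would need about a third more machines to serve the same traffic, despite each request being faster.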
So again, this post is me trying to explain some of the decisions we made along the way, and hopefully stay away from being too defensive about it. We certainly had our share of bad bets and schedule overruns, and if I were to do this all over again my plan would be much better the second time around. But I do think that most of our decisions were defensible, which is why I wanted to take the time to talk about them.