kmod's blog

1 Feb 2017

Personal thoughts about Pyston’s outcome

I try to not read HN/Reddit too much about Pyston, since while there are certainly some smart and reasonable people on there, there also seem to be quite a few people with axes to grind (*cough cough* Python 3).  But there are some recurring themes I noticed in the comments about our announcement about Pyston's future so I wanted to try to talk about some of them.  I'm not really aiming to change anyone's mind, but since I haven't really talked through our motivations and decisions for the project, I wanted to make sure to put them out there.

Why we built a JIT

Let's go back to 2013 when we decided to do the project: CPU usage at Dropbox was an increasingly large concern.  Despite the common wisdom that "Python is IO-bound", requests to the Dropbox website were spending around 90% of their time on the webserver CPU, and we were buying racks of webservers at a worrying pace.

At a technical level, the situation was tricky, because the CPU time was spread around in many areas, with the hottest areas accounting for a small (single-digit?) percentage of the entire request.  This meant that potential solutions would have to apply to large portions of the codebase, as opposed to something like trying to Cython-ize a small number of functions.  And unfortunately, PyPy was not, and still is not, close to the level of compatibility to run a multi-million-LOC codebase like Dropbox's, especially with our heavy use of extension modules.
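A rough way to see why scattered CPU time rules out spot fixes is Amdahl's law. The sketch below uses made-up numbers (a 5% hot function, a 90% runtime-wide change); they are illustrative assumptions, not Dropbox measurements, and `overall_speedup` is a hypothetical helper name.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-request speedup when `fraction` of the
    request time is made `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# Assumption: one hot function is 5% of request time and is rewritten
# (for example in Cython) to run 10x faster.
one_hot_function = overall_speedup(0.05, 10.0)

# Assumption: a runtime-level change makes 90% of request time 1.5x faster.
broad_improvement = overall_speedup(0.90, 1.5)

print(f"single hot spot: {one_hot_function:.3f}x")
print(f"broad change:    {broad_improvement:.3f}x")
```

Under these assumed numbers the single hot spot yields about 1.05x overall, while the modest but broad improvement yields about 1.43x.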

So, we thought (and I still believe) that Dropbox's use-case falls into a pretty wide gap in the Python-performance ecosystem, of people who want better performance but who are unable or unwilling to sacrifice the ecosystem that led them to choose Python in the first place.  Our overall strategy has been to target the gap in the market, rather than trying to compete head-to-head with existing solutions.

And yes, I was excited to have an opportunity to tackle this sort of problem.  I think I did as good a job as I could to discount that, but it's impossible to know what effect it actually had.

Why we started from scratch

Another common complaint is that we should have at least started with PyPy or CPython's codebase.

For PyPy, it would have been tricky, since Dropbox's needs are both philosophically and technically opposed to PyPy's goals.  We needed a high level of compatibility and reasonable performance gains on complex, real-world workloads.  I think this is a case that PyPy has not been able to crack, and in my opinion is why they are not enjoying higher levels of success.  If this was just a matter of investing a bit more into their platform, then yes it would have been great to just "help make PyPy work a bit better".  Unfortunately, I think their issues (lack of C extension support, performance reliability, memory usage) are baked into their architecture.  My understanding is that a "PyPy that is modified to work for Dropbox" would not look much like PyPy in the end.

For CPython, this was more of a pragmatic decision.  Our goal was always to leverage CPython as much as we could, and now in 2017 I would recklessly estimate that Pyston's codebase is 90% CPython code.  So at this point, we are clearly a CPython-based implementation.

My opinion is that it would have been very tough to start out this way.  The CPython codebase is not particularly amenable to experimentation in these fundamental areas.  And for the early stages of the project, our priority was to validate our strategies.  I think this was a good choice because our initial strategy (using LLVM to make Python fast) did not work, and we ended up switching gears to something much more successful.

But yes, along the way we did reimplement some things.  I think we did a good job of understanding that those things were not our value-add and to treat them appropriately.  I still wonder if there were ways we could have avoided more of the duplicated effort, but it's not obvious to me how we could have done so.

Issues people don't think about

It's an interesting phenomenon that people feel very comfortable having strong opinions about language performance without having much experience in the area.  I can't judge, because I was in this boat -- I thought that if web browsers made JS fast, then we could do the same thing and make Python fast.  So instead of trying to squelch the "hey they made Lua fast, that means Lua is better!" opinions, I'll try to just talk about what makes Python hard to run quickly (especially as compared to less-dynamic languages like JS or Lua).

The thing I wish people understood about Python performance is that the difficulties come from Python's extremely rich object model, not from anything about its dynamic scopes or dynamic types.  The problem is that every operation in Python will typically have multiple points at which the user can override the behavior, and these features are used, often very extensively.  Some examples are inspecting the locals of a frame after the frame has exited, mutating functions in-place, or even something as banal as overriding isinstance.  These are all things that we had to support, that are used enough that we had to support them efficiently, and that don't have analogs in less-dynamic languages like JS or Lua.
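For readers who have not run into these features, here is a minimal sketch of the three examples just mentioned. It is written in Python 3 syntax, and every class and function name in it is hypothetical, made up for illustration rather than taken from Dropbox or Pyston code.

```python
import sys

# 1. Overriding isinstance: a metaclass can define __instancecheck__.
class AnythingMeta(type):
    def __instancecheck__(cls, obj):
        return True  # claims that every object is an instance

class Anything(metaclass=AnythingMeta):
    pass

# 2. Mutating a function in place: swap its code object after definition.
def greet():
    return "hello"

def _replacement():
    return "goodbye"

greet.__code__ = _replacement.__code__

# 3. Inspecting a frame's locals after the frame has exited, via a traceback.
def fails():
    secret = 42
    raise ValueError("boom")

def locals_of_failed_call():
    try:
        fails()
    except ValueError:
        tb = sys.exc_info()[2]
        # tb is the entry for this function; tb.tb_next is the frame of fails()
        return tb.tb_next.tb_frame.f_locals

print(isinstance(3, Anything))
print(greet())
print(locals_of_failed_call()["secret"])
```

A runtime that wants to optimize `isinstance`, function calls, or local variables has to keep all three of these behaviors working, which limits the shortcuts it can take.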

On the flip side, the issues with Python compatibility are also quite different than most people understand.  Even the smartest technical approaches will have compatibility issues with codebases the size of Dropbox.  We found, for example, that there are simply too many things that will break when switching from refcounting to a tracing garbage collector, or even switching the dictionary ordering.  We ended up having to re-do our implementations of both of these to match CPython's behavior exactly.
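As an illustration of the refcounting point, here is a small sketch of code that quietly depends on objects being destroyed the moment their last reference goes away. The `Resource` class and `use_resource` function are hypothetical, and the ordering shown is CPython behavior, not something the language guarantees.

```python
events = []

class Resource:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        # Stands in for closing a file, socket, or lock.
        events.append("closed " + self.name)

def use_resource():
    r = Resource("tmpfile")
    events.append("using " + r.name)
    # No explicit close: the code relies on r being destroyed on return.

use_resource()
events.append("after call")
print(events)
```

On CPython the resource is closed before "after call" is recorded, because its reference count drops to zero when the function returns. With a tracing collector the finalizer may run later or not at all, and code written this way can start leaking or misbehaving.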

Memory usage is also a very large problem for Python programs, especially in the web-app domain.  This is, unintuitively, driven in part by the GIL: while a multi-process approach will be conceptually similar to a multi-threaded approach, the multi-process approach uses much more memory.  This is because Python cannot easily share its memory between different processes, both for logistical reasons and for some deeper reasons stemming from reference counting.  Regardless of the exact reasons, there are many parts of Dropbox that are actually memory-capacity-bound, where the key metric is "requests per second per GB of memory".  We thought a 50% speed increase would justify a 2x memory increase, but in a memory-bound service that trade actually makes the key metric worse.  Memory usage is not something that gets talked about that often in the Python space (except for MicroPython), and would be another reason that PyPy would struggle to be competitive for Dropbox's use-case.
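The arithmetic behind the speed-versus-memory trade can be sketched as follows. It assumes that "a 50% speed increase" means 1.5x the request throughput per process, and the function name is hypothetical.

```python
def requests_per_sec_per_gb(throughput_factor, memory_factor, baseline=1.0):
    """Relative value of the key metric after a change that multiplies
    throughput and memory use by the given factors."""
    return baseline * throughput_factor / memory_factor

# 1.5x the throughput at 2x the memory footprint.
relative = requests_per_sec_per_gb(1.5, 2.0)
print(relative)  # 0.75
```

Under that assumption, a memory-bound service ends up with 25% less capacity per GB than before, even though each request runs faster.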

 

So again, this post is me trying to explain some of the decisions we made along the way, and hopefully stay away from being too defensive about it.  We certainly had our share of bad bets and schedule overruns, and if I were to do this all over again my plan would be much better the second time around.  But I do think that most of our decisions were defensible, which is why I wanted to take the time to talk about them.

Filed under: Pyston
Comments (16) Trackbacks (2)
  1. I’m glad you worked on this. Thanks for posting the update.

  2. Right now you are using other languages for those hot paths, right? So the Cython argument is moot; you did recode them all after all.

    Also PyPy does work hard to support C extensions, particularly through CFFI. And you could have improved this situation a lot, instead of spending resources on reinventing a wheel you just threw away. Every time I read Pyston authors, I have the strong feeling they didn’t spend a lot of time investigating PyPy. It feels like they read a few things about it and called it a day.

    This whole work screams the typical attitude of geeks who want to play at creating something, and rationalize it after the fact. I understand that. I do it all the time. But let’s not lie.

    Also I’m sad that nowhere in the dev cycle did you consider helping with Pyjion, especially since you trashed your code base once along the way. Ultimately it is a much saner solution to this problem: you don’t touch CPython, you always stay up to date and compatible with it, and you focus your efforts on the JIT part.

    I would have been more inclined to be supportive of Pyston if it had tried something really radical, like going the compiled route by working on Nuitka, or trying to rewrite CPython in Rust to see if it can get the best of the PyPy/CPython worlds. Failure would have looked less bad: “look, we tried something very new and it didn’t work”, instead of “we tried to recreate something that existed, ignoring other solutions, and we still failed”.

    All in all, I still have to salute the effort. It’s easy to criticize and I wouldn’t have had the talent nor the strength to do half of what you did, so kudos for that. And it’s always nice to have people trying to improve open source no matter what. After all, if you had succeeded, you would have been considered heroes.

    I just wish you went a different way.

    • Hi Sam, thank you for your feedback. I tried to discuss our motivations for taking the approach we did, but yes it will always be hard to know how a different approach would have worked out.

      As I mentioned, most of our time has gone into features that PyPy does not have/want, so I’m not sure how much time it would have saved us (if any) to start with their codebase. Pyjion definitely has a strong compatibility story but my worry is that their narrow API is too limiting; Pyston’s speedups wouldn’t fit into that model. I hope they get the opportunity to validate their approach.

      And yes it would have been fun to do something radical :) I think what we built is different enough to not just be “another PyPy but less mature”, and has a much better chance of being successful at Dropbox and elsewhere. At the end of the day, though, combining PyPy’s issues with funding and uptake, along with the Dropbox decision, makes me feel like it’s not really the approach that matters here but the demand (or lack thereof). Finding the right domain to target remains an interesting problem for Python performance!

      • “[M]ost of our time has gone into features that PyPy does not have/want”

        I’d like to better understand the differentiation here. What features were these?

        “My understanding is that a “PyPy that is modified to work for Dropbox” would not look much like PyPy in the end.”

        Similarly, what sort of modifications do you think would have been rejected by the PyPy project? What is the basis of your understanding?

  3. So basically you gave up and went to Go. Very disappointed in you and the Dropbox team.

    • Hi John, I share some of your disappointment in the outcome, but at the end of the day I’m an engineer because I want to make an impact as opposed to just working on interesting problems, and I think the Pyston decision reflects that. I think there are some other areas of Python performance that are very important, perhaps in the numerical computing space, and I’m currently taking a look at that. Who knows how this will all work out!

  4. Thank you for your efforts! Greatly prefer Python lang & still-growing ecosystem, over Go. This work needs to go forward, in whatever fashion.

  5. Isn’t LLVM too slow for highly dynamic languages, since it was designed for static languages such as C, C++, Swift, and Rust? Where did Unladen Swallow go?

    Have you tried collaborating with Intel or AMD? These guys know a lot about PGO and have contributed to CPython and PyPy.

    I’m surprised that Cython could not solve your problems. There are codebases in Cython close to a million lines of code.

  6. PyPy contributor here. Looking forward to seeing your PyPy patches.

  7. Just wondering what you thought of Grumpy, a YouTube attempt at translating Python to Go in order to get parallelism. (Interesting discussion of it here: https://lwn.net/Articles/710634/ )

    • Well first of all, saying “Go is fast, let’s use it” doesn’t get us around what I think the central problem is: it’s the Python object model that’s complicated and slow. So for each slow feature, the grumpy team will have to either implement it in Go, which will be slower than CPython, or they can eliminate it and rely on the Go runtime, which will be fast but without certain features.

      From their blog post it sounds like they are planning on mostly doing the latter (not supporting features and directly using the Go runtime), but when I poked around the code it looks like they also do some of the former (reimplement CPython in Go). I’m not sure that either end of the spectrum (Go with Python syntax, or CPython rewritten in Go) is that useful, but it’ll definitely be interesting to see if they can find a good tradeoff in the middle!

  8. Ah, I just saw the news through this. I hadn’t known about the project previously, but your explanation makes sense, and is a bummer to me as a maintainer of a >100KLOC Python codebase (also sorta-memory-bound in the way you describe). The underlying challenges suggest I probably shouldn’t expect lightning to strike and suddenly deliver a many-times-faster compatible VM ‘for free’.

    Props for pushing on this problem so hard. 10% in production is some serious impact at that scale. And V8 only got where it did with utterly bonkers levels of investment, and agree re: JS having fewer hooks for dynamic stuff to happen, and Lua far fewer. And some deep things like the GIL and refcounting seem pretty much out of the control of anyone trying to build a drop-in substitute for CPython today. (Those things alone could motivate Dropbox to use different languages on some kinds of projects, execution speed aside.) Very hard problem space and impressive you did what you did.

    This also feels to me like a concrete example of how building a language ecosystem around a single implementation has effects later on. Not exactly a criticism of Python — we wouldn’t have had it at all but for the glorious messy pragmatic process that brought it about, and a world without Python is measurably less fun. But feels relevant looking at languages generally.

  9. What would have happened if you guys rewrote all your C extensions to CFFI?
    Wouldn’t that be less of an effort than writing your own Python runtime?

    I was really impressed by the attempt but was unsure of the motivation. I don’t think anyone on the Pyston team ever described in detail what’s so unique about Dropbox that it wouldn’t work with PyPy.
    All the issues you detailed in this post can be (and have been) improved. Even CPyExt isn’t as slow as it used to be.

    • I disagree with the premise of your first question — CFFI is a small subset of what the Python C API provides, so I don’t think it would even be possible to rewrite all of the extensions on CFFI (especially some of the critical ones). As a point of comparison, why hasn’t NumPy been ported to CFFI?

      As to why we didn’t use PyPy, what do you think about the issues I mentioned in this blog post? I’m happy to go into more detail if they’re not clear.

