People sometimes ask me how Pyston is going and what we're currently working on. It's a bit hard to answer, partly because we haven't had a recent release with headline-worthy features, and partly because a lot of the stuff we're working on is individually pretty small. Sometimes I try to find some way of expressing this, maybe saying something like "there are a lot of small optimizations that we have to include" or "there is a very long tail of compatibility work". It never feels that satisfying, so I thought I'd just jot down some of the random things that I've done lately and hope that it ends up being somewhat representative.
- Single-character string optimizations. I noticed that we were running the following code somewhat slowly:
query_string = url.split('?')[1]
It turned out that we actually did a pretty good job at most of this: we got into url.split quickly, and we took the result and grabbed element [1] out of it quickly. It was just that our str.split implementation was much slower than CPython's. In particular, we were using a string.find(string) routine which, even though it was fast and had special-casing for small strings, was not as fast as the corresponding string.find(char) function. So we needed to add an optimization: if the string we are splitting on is a single character, call string.find(char) instead. (CPython also has this optimization.)
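To give a rough idea of the fast path, here's a Python-level sketch of the dispatch (Pyston's actual code is C++, and the names here are made up for illustration):

def split_sketch(s, sep):
    # Sketch only: splitting on a single-character separator can use a plain
    # per-character scan instead of a general substring-search routine.
    parts, start = [], 0
    if len(sep) == 1:
        for i, c in enumerate(s):
            if c == sep:
                parts.append(s[start:i])
                start = i + 1
    else:
        # general case: repeatedly run the (slower) substring search
        i = s.find(sep, start)
        while i != -1:
            parts.append(s[start:i])
            start = i + len(sep)
            i = s.find(sep, start)
    parts.append(s[start:])
    return parts

split_sketch("url?query=1", "?")  # ['url', 'query=1']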
- Tracing-jit aggressiveness backoff. This is probably the most along the lines of what I thought I'd be working on: some JIT level features dealing with some cool dynamic-language properties. Cool.
- Running code inside execs quickly. Well, I haven't actually done this yet, but I'm going to. Currently we bail on efficient handling of execs, since they have some special name-resolution rules [or rather, they are vastly more likely to use those rules than normal Python code], so we restrict that code to the interpreter. I'm noticing that this is starting to affect us: collections.namedtuple creates your class by constructing a class definition string and exec'ing it. Even though the resulting code is small, every time we have to run through it we pay some extra cost via the not-as-fast interpreter.
- Efficient unicode attribute lookup. I didn't anticipate this at all, but there are definitely cases where it's important for us to be able to handle unicode-based attribute lookups quickly, such as getattr(obj, u"foo"). People don't often explicitly request unicode attribute names, but any code that does "from __future__ import unicode_literals" will get this behavior by default.
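For a concrete (if contrived) Python 2 example -- the Request class here is just made up -- both of these lookups have to resolve the same attribute, and the unicode form shows up implicitly in any module that enables unicode_literals:

# Python 2: the u"..." lookup appears implicitly once unicode_literals is on,
# so it needs to be just as fast as the plain str lookup.
class Request(object):
    def __init__(self):
        self.path = "/index"

r = Request()
assert getattr(r, "path") == getattr(r, u"path")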
- Initializing sets in __new__ vs __init__. This is the kind of "long tail" compatibility issue I mentioned. You wouldn't think that it would matter to the user whether the set did its initialization work in __new__ or __init__. Sure, there are ways that the user could tell if they really wanted to, but surely "real code" doesn't depend on it? Turns out the answer is yes: this causes errors in sqlalchemy. So I need to go back and make sure we do the initialization at the same point that CPython does, so that we can support sqlalchemy's use of set-subclassing.
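Here's a contrived sketch of how the difference can leak out (this is not sqlalchemy's actual code): in CPython, a plain set fills in its elements in __init__, so a subclass that overrides __init__ without calling up ends up empty.

# Contrived example -- not sqlalchemy's code.
class QuietSet(set):
    def __init__(self, *args):
        pass  # deliberately does not call set.__init__

s = QuietSet([1, 2, 3])
print(len(s))  # CPython: 0, since set.__init__ never ran; an implementation
               # that populated the set in __new__ would print 3 instead.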
So anyway, that's just some of the random stuff that I've been up to lately (or am about to do). There are definitely way more details to be worked out than I expected.
I've done a number of projects involving Xilinx FPGAs and CPLDs, and honestly I'm frustrated enough with them to be interested in trying out one of their competitors. This is pretty rant-y, so take it with a grain of salt, but some of my gripes include:
- Simply awful toolchain support. The standard approach is to reverse-engineer the Xilinx file formats and write your own tooling on top of them.
- Update -- looks like someone else posted a much lengthier blog post about this, which is a good read.
- Terrible software speed. I suppose they care much more about large design teams where the entire synthesis time will be measured in hours, but for a simple hobby project, it's pretty infuriating that even a syntax error takes a 10-second edit-compile-debug cycle. This is not due to any complexity in the language they support (as opposed to C++ templates, for example), but is just plain old software overhead on their part: it takes 5 seconds for them to determine that the input file doesn't exist. If you use their new 7-series chips, you can use their new Vivado software, which may or may not be better, but rather than learn a new product line and new software I decided to try the competitor.
- High prices. They don't seem to feel like they need to compete on price -- I'm sure they do for the large contracts, but for the "buy a single item on digikey" case they seem to charge whatever the market will bear. And I was paying it, so I guess that's their prerogative, but it makes me frustrated.
So anyway, I had gone with Xilinx, the #1 (in sales I believe) FPGA vendor, since when learning FPGAs I think that makes sense: there's a lot of third-party dev boards for them, a lot of documentation, and a certain "safety in numbers" by going with the most common vendor. But now I feel ready to branch out and try the #2 vendor, Altera.
I saw a cheap little dev kit for Altera: the BeMicro CV. This is quite a bit less-featured than the Nexys 3 that I have been using, but it's also quite a bit cheaper: it's only $50. The FPGA it has is quite a bit beefier, though: it has "25,000 LEs [logic elements]", which as far as I can tell is roughly equivalent to the Xilinx Spartan-6 LX75. The two companies keep inflating the way they measure the size of their FPGAs so it's hard to be sure, and they put two totally different quantities in the sort fields on digikey (Xilinx's being more inflated), but I pegged it as an LX75 (a $100 part) by assuming that "1 Xilinx slice = 2 Altera LEs": the Cyclone V on this board has 25k LEs, and the LX75 has 11k slices.
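For what it's worth, the back-of-the-envelope comparison was just the following (the 2x slice-to-LE ratio is my own rough guess, not a number either vendor publishes):

# rough device-size comparison; the slice-to-LE ratio is just my assumption
LES_PER_SLICE = 2
cyclone_v_les = 25000          # marketing number for the BeMicro CV's Cyclone V
spartan6_lx75_slices = 11000   # approximate slice count of the Spartan-6 LX75
print(spartan6_lx75_slices * LES_PER_SLICE)  # ~22000, in the same ballpark as 25000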
My first experience with Altera was downloading and installing the software. They seem to have put some thought into this and have broken the download into multiple parts so that you can pick and choose what you want to download based on the features you want -- a feature that sounds trivial but is nice when Xilinx just offers a monolithic 6GB download. I had an issue right off the bat, though: the installer judged the device file to be invalid, so when I started up Quartus (their software), it told me there were no devices installed. No problemo, let's try to reinstall it -- "devices already installed" it smugly informs me. Luckily the uninstaller lets you uninstall specific components, so I was able to remove the half-installed device support, but since software quality was supposed to be one of their main selling points, this was an ominous beginning.
Once I got that out of the way, I was actually pretty impressed with their software. Their "minimum synthesis time" isn't much different from Xilinx's, which I find pretty annoying, and it also takes them a while to spot syntax errors, so unfortunately that gripe isn't fully satisfied. Overall the software feels snappier, though -- it doesn't take forever to load the pin planner or any other window or view. There's still an annoying separation between the synthesis and programming flows -- the tools know exactly what file I just generated, but I have to figure out what it was so that I can tell the programmer what file to program. And then the programmer asks me every time if I would like to save my chain settings.
The documentation seems a bit lighter for Altera projects, especially with this dev board -- I guess that's one drawback of not buying from Digilent. Happily and surprisingly, the software was intuitive enough that I was able to take a project all the way through synthesis without reading any documentation! While it's not perfect, I can definitely see why people say that Altera's software is better. I had some issues with the programmer where the USB driver hadn't installed, so I ended up having to search on how to do that, but once I got that set up I got my little test program on the board without any trouble.
So at this point, I have a simple test design that connects some of the switches to some of the LEDs. Cool! I got this up way faster than I did for my first FPGA board; that's not really a comparison of the two vendors since there's probably a large experience component, but it's still cool to see. Next I'll try to find some time to do a project on this new board -- this FPGA is quite a bit bigger than my previous one, so it could possibly fit a reasonable Litecoin miner.
Overall it's hard to not feel like the FPGA EDA tools are far behind where software build tools are. I guess it's a much smaller market, but I hope that some day EDA tools catch up.
I remarked to a friend recently that technology seems to increase our expectations faster than it can meet them: "why can't my pocket-computer get more than 6 hours of battery life" would have seemed like such a surreal complaint 10 years ago. For that reason I want to recognize an experience I had lately that actually did impress me even in our jaded ways.
The background is that I wanted a dedicated laptop for my electronics work. Normally I use my primary laptop for the job, but it's annoying to connect and disconnect it (power, ethernet [the wifi in my apartment is not great], mouse, electronics projects), and worries about lead contamination led me to be diligent about cleaning it after using it for electronics. So, I decided to dust off my old college laptop and resurrect it for this new purpose.
I didn't have high hopes for this process, since now my college laptop is not just "crappy and cheap" (hey I bought it in college) but also "ancient"! But anyway I still wanted to try it, so I pulled out my old laptop, plugged it in... and was immediately shown the exact screen I had left three years ago. Apparently the last day I used it was May 1 2011, and I had put it into hibernation. Everything worked after all these years! This thing had been banged around like crazy during college, and sat around for a few years afterwards, and yet it still worked. I'm pretty happy when a piece of electronics lives through its 3 year warranty, but this thing was still going strong after 7 years -- crazy.
I was generally impressed by the laptop too -- this is comparing my 7-year-old college laptop with my 3-year-old current one. The screen was a crisp 1920x1200 (quite a bit better than my new laptop), and it didn't feel sluggish at all. I checked out the processor info and some online benchmarks, and it looks like the processor was only ~10% slower than my new one. Of course, not everything was great: the old laptop feels like it is definitely over 6 lbs, and I can't believe I lugged that around campus. But it's just going to sit on a desk now, so it doesn't matter.
Part 2: Ubuntu
This laptop was running 10.04, which I remember being a major pain to get running at the time. I decided to upgrade it to 14.04, but I was worried about this process as well. I had spent several days getting Linux to work on this laptop when I first decided to switch to it, which involved some crazy driver work from some friends to get the wifi card working. I was worried that I would run into the same problems and have to give up on this.
So, first I tried an in-place Ubuntu upgrade to 14.04, and to my surprise everything worked! I wanted a clean slate, though, so I tried a fresh install of 14.04: again, everything worked. I haven't done an extensive run through the peripherals but all the necessary bits were certainly working.
I know that it's probably just a single driver that got added to the Linux kernel, but the experience was night-and-day compared to the headache I endured the first time.
So anyway, this was crazy! I have always panned Dell and my old laptop as being "crappy", and Linux as "not user friendly", but at least in this particular case the hardware proved to be remarkably robust (let's just ignore the bezel that came loose), and the software remarkably smooth.
Part 3: Weird desktop
Freshly bolstered by this experience, and with a 14.04 CD in hand, I decided to upgrade my work desktop as well. I had for some reason decided to install 11.04 on that machine, which has been causing me no end of pain recently. This Ubuntu release is so unsupported that all the apt mirrors are gone, and the only supported upgrade path is a clean install. (Side note: because of this experience, I've decided to never use a non-LTS release again.) I've put off reinstalling it with a new version since I also had a horrible experience getting it up and running: I'm running a three-monitor setup and it took me forever (a few days of work) to figure out the right combination of drivers and configurations.
This transition didn't go quite as smoothly, but within a day I was able to get 14.04 up and running with everything pretty much back to the way it was before -- minus the random memory corruptions I used to get from a buggy graphics driver! I also no longer get warnings from every web app out there that I am running an ancient version of Chrome.
All in all, I've been extremely impressed with the reliability of the electronics hardware and the comprehensiveness of modern Linux / Ubuntu.
Part 4: Using the new setup
While this post is mostly about how easy it apparently has become to get Ubuntu running on various hardware, I'm also extremely happy with the new electronics setup of having a dedicated laptop. It is definitely nice to not have to swap my main laptop in and out, and it also means that I can do the software side of my electronics work from anywhere. I set up an SSH server on this laptop, and I can log into it remotely (even from outside my apartment) and work with any electronics projects I left attached! (I plan to point my Dropcam at the workbench so that I can see things remotely, though I haven't gotten around to that.) I made use of this ability over the Thanksgiving break to work on an FPGA design (got DDR3 ram working with it!), which I will hopefully have time to blog about shortly.
Overall, I'm definitely glad I decided to go through this process: the dedicated laptop is very helpful and getting it set up was way less painful than I expected.
I was excited to see recently that ARM announced their new Cortex-M7 microcontroller core, and that ST announced their line using that core, the STM32F7. I had briefly played around with the STM32 before, and I talked about how I was going to start using it -- I never followed up on that post, but I got some example programs working, built a custom board, didn't get that to work immediately, and then got side-tracked by other projects. With the release of the Cortex M7 and the STM32F7, I thought it'd be a good time to get back into it and work through some of the issues I had been running into.
First of all though, why do I find these chips exciting? Because they present a tremendous value opportunity, with a range of competitive chips from extremely low-priced options to extremely powerful options.
The comparison point here is the ATmega328: the microcontroller used on the Arduino, and what I've been using in most of my projects. They currently cost $3.28 [all prices are for single quantities on digikey], for which you get a nice 20MHz 8-bit microcontroller with 32KB of flash and 2KB of ram. You can go cheaper by getting the ATmega48 which costs $2.54, but you only get 4KB of program space and 512B of ram, which can start to be limiting. There aren't any higher-performance options in this line, though I believe that Atmel makes some other lines (AVR32) that could potentially satisfy that, and they also make their own line of ARM-based chips. I won't try to evaluate those other lines, though, since I'm not familiar with them and they don't have the stature of the ATmegas.
Side note -- so far I'm talking about CPU core, clock speeds, flash and ram, since for my purposes those are the major differentiators. There are other factors that can be important for other projects -- peripheral support, the number of GPIOs, power usage -- but for all of those factors, all of these chips are far far more than adequate for me so I don't typically think about them.
The STM32 line has quite a few entries in it, which challenge the ATmega328 on multiple sides. On the low side, there's the F0 series: for $1.58, you can get a 48MHz 32-bit microcontroller (Cortex M0) with 32KB of flash and 4KB of RAM. This seems like a pretty direct competitor to the ATmega328: get your ATmega power (and more) at less than half the price. It even comes in the same package, for what that's worth.
At slightly more than the cost of an ATmega, you can move up to the F3 family and get quite a bit better performance. For $4.14 you can get a 72MHz Cortex-M4 with 64KB of flash and 16KB of RAM.
One of the most exciting things to me is just how much higher we can keep going: you can get a 100MHz chip for $7.08, a 120MHz chip for $8.26, a 168MHz chip for $10.99, and -- if you really want it -- a 180MHz chip for $17.33. The STM32F7 has recently been announced and there's no pricing, but is supposed to be 200MHz (with a faster core than the M4) and is yet another step up.
When I saw this, I was pretty swayed: assuming that the chips are at least somewhat compatible (but who knows -- read on), if you learn about this line, you can get access to a huge number of chips that you can start using in many different situations.
But if these chips are so great, why doesn't everyone already use them? As I dig into trying to use it myself, I think I'm starting to learn why. I think some of it has to do with the technical features of these chips, but it's mostly due to the ecosystem around them, or lack thereof.
Working with the STM32 and the STM32F3 Discovery board I have (their eval board), I'm gaining a lot of appreciation for what Arduino has done. In the past I haven't been too impressed -- it seems like every hobbyist puts together their own clone, so it can't be too hard, right?
So yes, maybe putting together the hardware for such a board isn't too bad. But I already have working hardware for my STM32, and I *still* had to do quite a bit of work to get anything running on it. This has shown me that there is much more to making these platforms successful than just getting the hardware to work.
The Arduino takes some fairly simple technology (ATmega) and turns it into a very good product: something very versatile and easy to use. There doesn't seem to be anything corresponding for the STM32: the technology is all there, and probably better than the ATmega technology, but the products are intensely lacking.
Ok so I've been pretty vague about saying it's harder to use, so what actually causes that?
Family compatibility issues
One of the most interesting aspects of the STM32 family is its extensiveness; it's very compelling to think that you can switch up and down this line, either within a project or for different projects, with relatively little migration cost. It's exciting to think that with one ramp-up cost, you gain access to both $1.58 microcontrollers and 168MHz microcontrollers.
I've found this to actually be fairly lackluster in practice -- quite a bit changes as you move between the different major lines (ex: F3 vs F4). Within a single line, things seem to be pretty compatible -- it looks like everything in the "F30X" family is code-compatible. It also looks like they've tried hard to maintain pin-compatibility for different footprints between different lines, so (at a hardware level) you can take an existing piece of hardware and simply put a different microcontroller onto it. I've learned the hard way, though, that pin compatibility in no way implies software compatibility -- I thought pin compatibility would have been a stricter criterion than software compatibility, but they're just not related.
To be fair, even the ATmegas aren't perfect when it comes to compatibility. I've gotten bitten by the fact that even though the ATmega88 and ATmega328 are supposed to be simple variations on the same part (they share a single datasheet), there are some differences there. There's also probably much more of a difference between the ATmegaX8 and the other ATmegas, and even more of a difference with their other lines (XMEGA, AVR32).
For the ATmegas, people seem to have somewhat standardized on the ATmegaX8, which keeps things simple. For the STM32, people seem to be pretty split between the different lines, which leads to a large amount of incompatible projects out there. The family incompatibilities can hurt you even if you're just focusing on a single chip and not trying to port any code -- they mean that the STM32 "community" ends up being more fragmented than it could be, with lots of incompatible example code out there, so the community for any particular chip is essentially smaller.
What exactly is different between lines? Pretty much all the registers can be different, the interactions with the core architecture can be different (peripherals are put on different buses, etc). This means that either 1) you have different code for different families, or 2) you use a compatibility library that masks the differences. #1 seems to be the common case at least for small projects, and mostly works but it makes porting hard, and it can be hard to find example code for your particular processor. Option #2 (using a library) presents its own set of issues.
Lack of good firmware libraries
This issue of software differences seems like the kind of problem that a layer of abstraction could solve. Arduino has done a great job of doing this with their set of standardized libraries -- I think the interfaces even get copied to unrelated projects that want to provide "Arduino-compatibility".
For the STM32, there is an interesting situation: there are too many library options. None of them are great, presumably because none of them have gained enough traction to have a sustainable community. ST themselves provide some libraries, but there are a number of issues (licensing, general usability) and people don't seem to use them. I have tried libopencm3, and it seems quite good, but it has been defunct for a year or so. There are a number of other libraries such as libmaple, but none of them seem to be taking off.
Interestingly, this doesn't seem to be a problem for more complex chips, such as the Allwinner Cortex-A's I have been playing with -- despite the fact that they are far more complicated, people have standardized on a single "firmware library" called Linux, so we don't have this same fragmentation.
So what did I do about this problem of there being too many options, leading to none of them being good? Decide to create my own, of course. I don't expect my homebrew version to take off or be competitive with existing libraries (even the defunct ones), but it should be educational and hopefully rewarding. If you have any tips about other libraries, I would love to hear them.
Down the rabbit hole...
Complexity of minimal usage
I managed to get some simple examples working on my own framework, but it was surprisingly complicated (and hence that's all I've managed to do so far). I won't go into all the details -- you can check out the code in my github -- but there are quite a few things to get right, most of which are not well advertised. I ended up using some of the startup code from the STM32 example projects, but I ended up running into a bug in the linker script (yes you read that right) which was causing things to crash due to an improper setting of the initial stack pointer. I had to set up and learn to use GDB to remotely debug the STM32 -- immensely useful, but much harder than what you need to do for an Arduino. The bug in the linker script was because it had hardcoded the stack pointer as 64KB into the sram, but the chip I'm using only has 40KB of sram; this was an easy fix, so I don't know why they hardcoded that, especially since it was in the "generic" part of the linker script. I was really hoping to avoid having to mess with linker scripts to get an LED to blink.
Once I fixed that bug, I got the LEDs to blink and was happy. I was messing with the code and having it blink in different patterns, and noticed that sometimes it "didn't work" -- the LEDs wouldn't flash at all. The changes that caused it seemed entirely unrelated -- I would change the number of initial flashes, and suddenly get no flashes at all.
It seems like the issue is that I needed to add a delay between enabling the GPIO port (and the corresponding clock) and setting the mode registers that control that port. Otherwise, the mode register would get reset again, causing all the pins to get set back to inputs instead of outputs. I guess this is the kind of issue one runs into when working at this level on a chip of this complexity.
So overall, the STM32 chips are way, way more complicated to use than the ATmegas. I was able to build custom ATmega circuits and boards very easily and switch away from the Arduino libraries and IDE without too much hassle, but I'm still struggling to do that with the STM32 despite having spent more time and now having more experience on the subject. I really hope that someone will come along and clean up this situation, since I think the chips look great. ST seems like they are trying to offer more libraries and software, but I just don't get an optimistic sense from looking at it.
So, I'm back where I was a few months ago: I got some LEDs to blink on an evaluation board. Except now it's running on my own framework (or lack thereof), and I have a far better understanding of how it all works.
The next steps are to move this setup to my custom board, which uses a slightly different microcontroller (F4 instead of F3), and get those LEDs to blink. Then I want to learn how to use the USB driver, and use that to implement a USB-based virtual serial port. The whole goal of this exercise is to get the 168MHz chip working and use it as a replacement for the arduino-like microcontroller that runs my other projects, which ends up being both CPU- and bandwidth-limited.
We've been working very hard over the past few months, and I'm very proud to "release" version 0.2. I set up a shiny new dedicated Pyston blog, and you can see the announcement here: http://blog.pyston.org/2014/09/11/9/
I'm putting "release" in quotes since we're not distributing binaries due to the "early access" nature, and in fact the v0.2 tag in the repository is already out of date and there are a number of features that have landed on trunk. But still, I think it's a milestone deserving of a version number bump.
Sometimes I start a project thinking it will be about one thing: I thought my FPGA project was going to be about developing my Verilog skills and building a graphics engine, but at least at first, it was primarily about getting JTAG working. (Programming Xilinx FPGAs is actually a remarkably complicated story, typically involving people reverse engineering the Xilinx file formats and JTAG protocol.) I thought my 3D printer would be about designing 3D models and then making them in real life -- but it was really about mechanical reliability. My latest project, which I haven't blogged about since I was trying to hold off until it was done, is building a single board computer (pcb photo here) -- I thought it'd be about the integrity of high-speed signals (DDR3, 100Mbps ethernet), but it's actually turned out to be about BGA soldering.
I've done some BGA soldering in the past -- I created a little test board for Xilinx CPLDs, since those are 1) the cheapest BGA parts I could find, and 2) have a nice JTAG interface, which gives an easy way of testing the external connectivity. After a couple of rough starts with that I thought I had gotten the hang of it, so I used a BGA FPGA in my (ongoing) raytracer project. I haven't extensively tested the soldering on that board, but the basic functionality (JTAG and VGA) was brought up successfully, so for at least ~30 of the pins I had a 100% success rate. So I thought I had successfully conquered BGA soldering, and I was starting to think about whether or not I could do 0.8mm BGAs, and so on.
My own SBC
Fast forward to trying to build my own single board computer (SBC). This is something I've been thinking about doing for a while -- not because I think the world needs another Raspberry-Pi clone, but because I want to make one as small as possible and socket it into a backplane for a small cluster computer. Here's what I came up with:
Sorry for the lack of reference scale, but these boards are 56x70mm, and I should be able to fit 16 of them into a mini-ITX case. The large QFP footprint is for an Allwinner A13 processor -- not the most performant option out there, but widely used, so I figured it'd be a good starting point. The assembly went fairly smoothly: I had to do a tiny bit of trace cutting and add a discrete 10k resistor, and I forgot to solder the exposed pad of the A13 (which is not just for thermal management, but is also the only ground pin for the processor), but after that, it booted up and I got a console!
The console was able to tell me that there was some problem initializing the DDR3 DRAM, at which point the processor would freeze. I spent some time hacking around in the U-Boot firmware to figure out what was going wrong, and the problems started with the processor failing in "training", or learning of optimal timings. I spent some time investigating that, and wasn't able to get it to work.
So I bought an Olimex A13 board, and decided to try out my brand of memory on it, since it's not specified to be supported. I used my hot air tool to remove the DDR3 chip from the Olimex board and attach one of mine, and... got the same problem. I was actually pretty happy with that, since it meant that there was a problem with the soldering or the DRAM part, which is much more tractable than a problem with trace length matching or signal integrity.
I tried quite a few times to solder the DRAM onto the Olimex board, using a number of different approaches (no flux, flux, or solder paste). In the end, on the fifth attempt, I got the Olimex board to boot! So the memory was supported, but my "process yield" was abysmal. I didn't care, and I decided to try it again on my board, with no luck. So I went back to the Olimex board: another attempt, didn't work. Then I noticed that my hot air tool was now outputting only 220C air, which isn't really hot enough to do BGA reflow. (I left a 1-star review on Amazon -- my hopes weren't high for that unit, but 10-reflows-before-breaking was not good enough.)
I ordered myself a nicer hot air unit (along with some extra heating elements for the current one in case I can repair it, but it's not clear that the heating element is the issue), which should arrive in the next few days. I'm still holding out hope that I can get my process to be very reliable, and that there aren't other problems with the board. Hopefully my next blog post will be about how much nicer my new hot air tool is, and how it let me nail the process down.
I've seen a lot of references to the wearables market lately with a lot of people getting very excited about it. I can't tell though, is it actually a thing that people will really want? Lots of companies are jumping into it and trying to provide offerings, and the media seems to be taking it seriously, but even though I work at a tech company in San Francisco, I haven't seen a single person wearing one or talking about it.
I can see why companies are jumping into it: a lot of them got burned by not taking tablets seriously, and look at where that market has ended up. A potential new market, which could provide a new revenue stream, has to be the dream for any exec, and it could make a lot of sense to get a jump start on a new market even if there are doubts about it.
That said, I'm feeling like wearables might be a similar market to 3d printers: it makes a lot of sense that in the future those things will be very big, but I think there's a very long road ahead. I'm not sure there's going to be a single killer feature for either of them, so adoption could be slow -- though I think once they take off they'll get integrated into our day-to-day.
But who knows, I was a tablet naysayer when they came out, and maybe Apple will release an iWatch which will define and launch the wearables market as well. But especially when it comes to the "smart watch" wearable, I think it will be more similar to netbooks, and even though a number of companies will push hard, people will gravitate to other form factors.
http://www.wired.com/2014/08/isp-bitcoin-theft/ Looks like this is an implementation of what I described previously. This guy used BGP to route internet traffic to him -- the article is light on the technical details but my guess is that he masqueraded as a popular bitcoin pool and gave out orders that benefited him rather than the real pool.
The problem is that while the base Bitcoin protocol is secure (as far as I know), there are huge ecosystems built on top of it, most of which haven't had the same scrutiny. The worst I've seen is the "stratum mining protocol": it distributes the mining work well, but I don't think anyone has paid any attention to its security. There isn't any authentication of either endpoint: you don't really need to authenticate the client except for potentially rate limiting issues, but there's *no authentication of the server*. This means that if anyone is able to hijack your connection to the mining pool, they can ask you to start mining for them, and you can't detect it until the pool pays you less money than you expected.
I was anticipating this happening with a DNS spoofing attack, but this particular article is about BGP. Doing a MITM of an unencrypted and unauthenticated stream requires only a fairly basic level of attack capability, and there are a number of different vectors for it. The Wired article blames BGP, which I think is the wrong conclusion. It's up to the pool operators and mining-client-writers to come up with some sort of authentication scheme, and then get everyone to switch to it. Until then, it seems well within the NSA's means to hijack all of the largest pools and take over the bitcoin blockchain if they wanted to.
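To be concrete about the kind of scheme I mean: even having the pool sign each job with a per-account shared secret would let miners reject hijacked work. Here's a rough sketch -- emphatically not part of the real stratum protocol, just an illustration of the missing piece:

import hmac, hashlib, json

# Illustration only: the pool and miner share a secret (established out of
# band, e.g. when the miner registers its account), and the pool signs every
# job it hands out.
SHARED_SECRET = b"per-account secret agreed upon out of band"

def sign_job(job_params):
    payload = json.dumps(job_params, sort_keys=True).encode("utf-8")
    return hmac.new(SHARED_SECRET, payload, hashlib.sha256).hexdigest()

def verify_job(job_params, signature):
    # The miner refuses to mine on any job whose signature doesn't check out,
    # so a MITM (DNS spoofing, BGP hijacking, ...) can't silently redirect work.
    return hmac.compare_digest(sign_job(job_params), signature)

job = {"job_id": "abc123", "prevhash": "00" * 32}
assert verify_job(job, sign_job(job))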
I've seen the Mill CPU come up a number of times -- maybe because I subscribed to their updates and so I get emails about their talks. They're getting a bunch of buzz, but every time I look at their docs or watch their videos, I can't tell -- are they "for real"? They certainly claim a large number of benefits (retire 30 instructions a cycle! expose massive ILP!), but it's hard to tell if it's just some guy claiming things or if there's any chance this could happen.
They make a big deal out of their founder's history: "Ivan Godard has designed, implemented or led the teams for 11 compilers for a variety of languages and targets, an operating system, an object-oriented database, and four instruction set architectures." At first I thought this was impressive, but I decided to look into it and I can't find any details about what he's done, which isn't a good sign. If we're counting toy projects here, I've defined 5 languages, an ISA, and an OS -- which is why we don't usually count toy projects.
They also revealed in one of their talks that they don't have anything more than a proof-of-concept compiler for their system... but they have "50-odd" patents pending? They said it's "fairly straightforward to see" the results you'd get "if you're familiar with compilers", and when harder questions were asked, Ivan started talking about his credentials. I feel less convinced...
This sounds like a lot of stuff that's been attempted before (ex: Itanium) -- unsuccessfully. They have some interesting ideas, but no compiler, and (if I remember correctly) no prototype processor. It also bugs me when people over-promise: Ivan talks about what they "do" rather than what they "plan to do", "want to do", or "have talked about doing", which feels disingenuous if it's just a paper design right now.
The more I look into the Mill the more I don't think it's real; I think it'll fizzle out soon, as more people push for actual results rather than claims. It's a shame, since I think it's always cool to see new processors with new designs, but I don't think this will end up being one of them.
There's a cool-looking competition being held right now, called The Hackaday Prize. I originally tried to do this super-ambitious custom-SBC project -- there's no writeup yet but you can see some photos of the pcbs here -- but it's looking like that's difficult enough that it's not going to happen in time. So instead I've decided to finally get around to building something I've wanted to for a while: an FPGA raytracer.
I've been excited for a while about the possibility of using an FPGA as a low-level graphics card, suitable for interfacing with embedded projects: I often have projects where I want more output than an LCD display, but I don't like the idea of having to sluff the data back to the PC to display (defeats the purpose of it being embedded). I thought for a while about doing either a 2D renderer or even a 3D renderer (of the typical rasterizing variety), but those would both be a fair amount of work for something that people already have. Why not spend that time and do something a little bit different? And so the idea was born to make it a raytracer instead.
I'm not sure how well this is going to work out; even a modest resolution of 640x480@10fps is 3M pixels per second. This isn't too high in itself, but with a straightforward implementation of raytracing, even rendering 1000 triangles with no lighting at this resolution would require doing three *billion* ray-triangle intersections per second. Even if we cut the pixel rate by a factor of 8 (320x240@5fps), that's still 380M ray-triangle intersections per second. We would need 8 intersection cores running at 50MHz, or maybe 16 intersection cores at 25MHz. That seems like a fairly aggressive goal: it's probably doable, but it's only 320x240@5fps, which isn't too impressive. But who knows, maybe I'll be way off and it'll be possible to fit 64 intersection cores in there at 50MHz! The problem is also very parallelizable, so in theory the rendering performance could be improved pretty simply by moving to a larger FPGA. I'm thinking of trying out the new Artix-series of FPGAs: they have a better price-per-logic-element than the Spartans and are supposed to be faster. Plus there are some software licensing issues with trying to use larger Spartans that don't exist for the Artix's. I'm currently using a Spartan-6 LX16, and maybe eventually I'll try using an Artix 7 100T, which has 6 times the potential rendering capacity.
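For anyone who wants to check my arithmetic, the back-of-the-envelope numbers work out like this:

# brute-force raytracing throughput estimates (every ray tested against every triangle)
triangles = 1000
full = 640 * 480 * 10      # 640x480 @ 10fps -> ~3.1M primary rays/sec
reduced = 320 * 240 * 5    # 320x240 @ 5fps  -> ~0.38M primary rays/sec
print(full * triangles)            # ~3.1 billion ray-triangle tests/sec
print(reduced * triangles)         # ~384 million ray-triangle tests/sec
print(reduced * triangles / 50e6)  # ~7.7  -> about 8 cores at 50MHz
print(reduced * triangles / 25e6)  # ~15.4 -> about 16 cores at 25MHz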
These calculations assume that we need to do intersections with all the triangles, which I doubt anyone serious about raytracing does: I could try to implement octrees in the FPGA to reduce the number of collision tests required. But then you get a lot more code complexity, as well as the problem of harder data parallelism (different rays will need to be intersected with different triangles). There's the potential for a massive decrease in the number of ray-triangle intersections required (a few orders of magnitude), so it's probably worth it if I can get it to work.
Part of the Hackaday Prize is that they're promoting their new website, hackaday.io. I'm not quite sure how to describe it -- maybe as a "project-display website", where project-doers can talk and post about their projects, and get comments and "skulls" (similar to Likes) from people looking at them. It seems like an interesting idea, but I'm not quite sure what to make of it, or how to split posts between this blog and the hackaday.io project page. I'm thinking that it could be an interesting place to post project-level updates (ex: "got the dram working", "achieved this framerate", etc.) which don't feel quite right for this, my personal blog.
Anyway, you can see the first "project log" here, which just talks about some of the technical details of the project and has a picture of the test pattern it produces to validate the VGA output. Hopefully soon I'll have more exciting posts about the actual raytracer implementation. And I'm still holding out hope for the SBC project I was working on, so hopefully you'll see more about that too :P