kmod's blog


Bitcoin vulnerability exploited

Looks like this is an implementation of what I described previously.  This guy used BGP to route internet traffic to himself -- the article is light on technical details, but my guess is that he masqueraded as a popular bitcoin pool and handed out mining work that benefited him rather than the real pool.

The problem is that while the base Bitcoin protocol is secure (as far as I know), there are huge ecosystems built on top of it, most of which haven't had the same scrutiny.  The worst I've seen is the "stratum mining protocol": it distributes the mining work well, but I don't think anyone has paid any attention to its security.  There isn't any authentication of either endpoint: you don't really need to authenticate the client except for potentially rate limiting issues, but there's *no authentication of the server*.  This means that if anyone is able to hijack your connection to the mining pool, they can ask you to start mining for them, and you can't detect it until the pool pays you less money than you expected.

I was anticipating this happening with a DNS spoofing attack, but this particular article is about BGP.  Doing a MITM of an unencrypted and unauthenticated stream requires only very basic attack capabilities, and there are a number of different vectors for it.  The Wired article blames BGP, which I think is the wrong conclusion.  It's up to the pool operators and mining-client writers to come up with some sort of authentication scheme, and then get everyone to switch to it.  Until then, it seems well within the NSA's means to hijack all of the largest pools and take over the bitcoin blockchain if they wanted to.

Filed under: bitcoin

Bitcoin vulnerability

Today seems like a pretty good day to rag on bitcoin, so I thought I'd post about something I've been thinking about ever since I experimented with writing my own miner.  Since the goal of that project was educational, I went through it in a way that most other people probably don't: I built a complete system (everything from the miner to the network interfaces) from scratch (no bitcoin-related libraries), which I think gave me some visibility into parts of Bitcoin that not many people see.

And what I discovered is how trivially hackable the system is right now.  People talk about the security of the underlying crypto, but any security person worth their salt knows that the security of an entire system is way more than the sum of the individual components.  In this case, the vulnerability I saw has to do with how people use Bitcoin and how insecure the surrounding ecosystem is.

To give a sense of the level of exploitability, I think the attack could be pulled off by a motivated individual, provided they are willing to let it be detectable+traceable.  Defeating the traceability is most likely within the means of any reasonable government or cybercriminal organization, and making the attack undetectable is beyond my understanding but seems certainly within the means of a large government.  I'm not familiar enough with Bitcoin to know what the most valuable targets are, but the attack I have in mind can give a varying amount of control over the blockchain, which I assume is the holy grail.

The ease of the attack boils down to the fact that most of the Bitcoin ecosystem players don't use basic internet best practices, potentially because they assume these are unnecessary since Bitcoin is "inherently secure".  Implementing something to be part of the ecosystem made it very clear that certain aspects were vulnerable, which makes me think that people haven't paid much attention to how everything fits together.  Regardless, even if all ecosystem players adopted best practices, the security of the Bitcoin system would still rely on internet-level security measures, and thus is probably open to tampering by the NSA.

The weakness stems from the fact that even though the underlying Bitcoin protocol is decentralized, the ecosystem services around it very much are not.  I don't think this is a solvable problem within the Bitcoin ecosystem, since these services are naturally more valuable the more centralized they are, and thus everyone's incentive is to contribute to the centralization.  It's my belief that for a cryptocurrency to be truly secure, it has to be designed with the entire ecosystem in mind; I don't think bolting on the necessary services is likely to be successful.


Edit [8/7/14]: someone did exactly this.  The idea behind my post is that mining pool schemes have pretty laughable network security.  Most of the traffic goes over HTTP -- what you really need is certificate pinning so you can make sure you are interacting with who you think you are.  There's no sensitive data being transmitted, so the encryption aspect of HTTPS isn't relevant here, but if you don't verify who you're talking to, it's pretty easy to masquerade as the mining pool host and have people mine for you instead.  I was anticipating a DNS hijack, but this guy used BGP.
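As a rough illustration of the certificate pinning mentioned above, here is a minimal sketch using only Python's standard `ssl`, `socket`, `hashlib` and `hmac` modules. It assumes a pool that offers a TLS endpoint, which the protocols discussed here generally don't. The host, port and pinned fingerprint are placeholders you would obtain out-of-band from the pool operator. With `ssl.create_default_context()` the certificate must also pass normal CA validation. A pool using a self-signed certificate would need that certificate loaded as the trust anchor instead.

```python
import hashlib
import hmac
import socket
import ssl


def cert_fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def pin_matches(der_cert: bytes, pinned_hex: str) -> bool:
    """Compare a certificate's fingerprint to the pinned value in constant time."""
    return hmac.compare_digest(cert_fingerprint(der_cert), pinned_hex.lower())


def connect_pinned(host: str, port: int, pinned_hex: str) -> ssl.SSLSocket:
    """Open a TLS connection, refusing to continue unless the server's
    certificate matches the pinned fingerprint (host/port/pin are placeholders)."""
    ctx = ssl.create_default_context()
    sock = socket.create_connection((host, port), timeout=10)
    try:
        tls = ctx.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise
    der = tls.getpeercert(binary_form=True)
    if der is None or not pin_matches(der, pinned_hex):
        tls.close()
        raise ssl.SSLError("server certificate does not match pinned fingerprint")
    return tls
```

The pin check is what defeats a BGP or DNS hijack. An attacker who reroutes the traffic can present some certificate, but without the pool's private key, not one that matches the pin.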

Filed under: bitcoin

Aside: ASIC conversion

The current state of the Bitcoin mining world seems to revolve around new ASIC-based miners that are coming out, such as from Butterfly Labs.  These devices seem to be very profitable investments if you can get your hands on one -- this calculator says that the $2,499 50GH/s machine should pay itself off in 35 days.  This made me start thinking that with such high margins for the end user, the manufacturing costs must be low enough that, even at a multiple of them, it should be possible to do this yourself and come close enough to profitability that the educational value justifies the cost.

So, out of curiosity, I decided to look into how feasible it would be to produce actual ASICs.  From some Google searching, people seem to say that it starts at several hundred thousand dollars, though others say that it can be cheaper using "multi-wafer fabrication".

Multi-wafer fabrication

Multi-wafer fabrication is where an intermediary company collects orders from smaller customers, and batches them into a single order for the foundry.  My friend pointed me to, which offers MWF and has an automated quote system, so I asked for a quote for their cheapest process, GlobalFoundries 0.35um CMOS.  The results were pretty surprising:

  • You order in lots of 50 chips
  • Each lot costs $320 per mm^2, with a minimum size of 0.06mm^2 ($20 total!) and maximum of 9mm^2.
    • For the other processes that I checked, additional lots are significantly cheaper than the first one
  • Your packaging options are either $3000 for all lots for a plastic QFN/QFP package, or $30-$70 per chip for other types

So the absolute minimum cost seems to be $50, if you want a single 250um-by-250um chip in the cheapest package (a ceramic DIP28).  You probably want a few copies, so let's make that about $100 -- this is cheap enough that I would do it even if it serves no practical purpose.
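The quote reduces to simple arithmetic. Here is a small sketch of it. It assumes that the DIP28 is the $30 end of the $30-$70 per-chip packaging range. The flat $3000 plastic-package option is not modeled.

```python
PRICE_PER_MM2 = 320.0   # $ per mm^2 for one lot of 50 chips (from the quote)
MIN_AREA_MM2 = 0.06
MAX_AREA_MM2 = 9.0
CHIPS_PER_LOT = 50


def mwf_cost(area_mm2: float, chips_packaged: int, package_cost: float) -> float:
    """Cost of a single lot plus per-chip packaging for the chips you package."""
    if not MIN_AREA_MM2 <= area_mm2 <= MAX_AREA_MM2:
        raise ValueError("die area outside the quoted 0.06-9 mm^2 range")
    if not 0 <= chips_packaged <= CHIPS_PER_LOT:
        raise ValueError("can only package chips from the single lot")
    return area_mm2 * PRICE_PER_MM2 + chips_packaged * package_cost
```

With these inputs, `mwf_cost(0.0625, 1, 30.0)` (a 250um-by-250um die) gives the $50 single-chip minimum. `mwf_cost(0.0625, 3, 30.0)` gives $110 for three packaged copies.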

Die size estimation

The huge question, of course, is what can you actually get with a 0.06mm^2 chip?  I tried to do a back-of-the-envelope calculation:

  • Xilinx claims that each Logic Cell is worth 15 "ASIC Gates".  They only say this for their 7-series FPGAs, whose cells may differ from my Spartan 6's, and since it's marketing material it can only be an overestimate; both of these factors should make my estimate more conservative, so I'll accept their figure of 15
  • The Spartan 6 LX16 has 14,579 logic cells (again, I'm not sure why they decided to call it the "16"); let's assume that I'm fully utilizing all of them as 15 ASIC gates, giving 218,685 gates I need to fit on the ASIC.
  • This page has some info on how to estimate the size of an asic based on the process and design:
    • For a 3 metal-layer, 0.35um process, the "Standard-cell density" is approximately 16k gates per mm^2
    • The "Gate-array utilization" is 80-90%, ie the amount of the underlying standard cells that you use
    • The "Routing factor" (ie 1 + routing_overhead) is between 1.0 and 2.0
    • This gives an effective gate density (standard-cell density × utilization ÷ routing factor) of between 6k (16k × 0.8 ÷ 2.0) and 14k (16k × 0.9 ÷ 1.0) gates per mm^2... much less than I thought.
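Here is the same back-of-the-envelope estimate as a short Python sketch. The constants are the figures quoted in the list above. Nothing here models a real process beyond them.

```python
LOGIC_CELLS = 14_579        # Spartan 6 LX16
GATES_PER_CELL = 15         # Xilinx's "ASIC gates" per logic cell
STD_CELL_DENSITY = 16_000   # gates/mm^2 for a 3-metal-layer 0.35um process


def effective_density(utilization: float, routing_factor: float) -> float:
    """Usable gates per mm^2 after utilization and routing overhead."""
    return STD_CELL_DENSITY * utilization / routing_factor


def die_area_mm2(gates: int, utilization: float, routing_factor: float) -> float:
    return gates / effective_density(utilization, routing_factor)


gates = LOGIC_CELLS * GATES_PER_CELL                        # 218,685
best = die_area_mm2(gates, utilization=0.9, routing_factor=1.0)
worst = die_area_mm2(gates, utilization=0.8, routing_factor=2.0)
print(f"{gates} gates: {best:.1f} to {worst:.1f} mm^2")  # 218685 gates: 15.2 to 34.2 mm^2
```

Even at the optimistic density, the full gate count needs around 15mm^2, which is more than the 9mm^2 maximum die. So the design has to come in well under 218k gates to fit.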

So if we're optimistic and say that we'll get 14k gates/mm^2, the 9mm^2 maximum die holds roughly 126k gates; if my design actually requires substantially fewer than 218k gates, it's possible that my current 5MH/s circuit could fit in this process.  There are many other processes available that I'm sure get much higher gate densities -- for example, this thread says that a TSMC 0.18um, 7LM (7 layer metal) process gets ~109k gates/mm^2, and the InCyte Chip Estimator Starter Edition says that a 200k-gate design will take roughly 4mm^2 on an "Industry Average" 8LM 0.13um process.


So if I wanted to translate my current design, I'm looking at a minimum initial cost of about $3,000 (a maximum-size 9mm^2 die at $320/mm^2 comes to $2,880 before packaging); I'm sure this is tiny compared to a commercial ASIC, but for a "let's just see if I can do it" project, it's pretty steep.

On the other end of the spectrum, what if I'm just interested in profitability as a bitcoin miner?  Let's say that I get the DIP28 packages and I can somehow use all 50; this brings the price up to about $4,500.  To determine how much hashing power I'd need to recoup that cost, I turned to the bitcoin calculator again; I gave it a "profitability decline per year" of 0.01, meaning that in one year the machine will produce only 1% as much money, which I hope is sufficiently conservative.  Ignoring power costs, the calculator says I'll eventually earn one dollar for every 9MH/s or so of computational power: assuming I'm able to optimize my design up to 10MH/s per chip, getting 500MH/s from 50 chips is only worth $50 or so.  I'm starting to think something is very wrong here: either I can get a vastly more powerful ASIC to fit in this size, or, more likely, these small prototyping batches will never be cost-competitive with volume-produced ASICs.
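The profitability check can be sketched with the calculator's rough rate of $1 of lifetime revenue per 9MH/s. That rate ignores power and uses the 0.01-per-year decline setting. The 10MH/s-per-chip figure is the hoped-for optimized design, not a measured number.

```python
MHS_PER_DOLLAR = 9.0   # calculator estimate: ~$1 lifetime revenue per 9 MH/s, power ignored


def lifetime_revenue(total_mhs: float) -> float:
    return total_mhs / MHS_PER_DOLLAR


def breakeven_mhs(cost_dollars: float) -> float:
    return cost_dollars * MHS_PER_DOLLAR


revenue = lifetime_revenue(50 * 10.0)   # 50 chips at a hoped-for 10 MH/s each
needed = breakeven_mhs(4_500)           # total MH/s to repay the $4,500 order
print(f"${revenue:.2f} earned; {needed / 1000:.1f} GH/s needed to break even")
# $55.56 earned; 40.5 GH/s needed to break even
```

The second number is the point. The $4,500 order would need about 40.5GH/s of total hashing to break even, which is roughly 810MH/s per chip.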

So, just for fun, let's look at the high end: let's create a 5mm x 5mm (max size) chip using TSMC's 65nm process, and order 2 lots of 100.  Chip Estimator says that we could get maybe 7.2M gates on this thing, so getting 200 of these chips provides about 150x more power than 50 200k-gate chips (36x the gates per chip, times 4x as many chips).  The quote, however, is for $200k, so to break even I'd need to get 2TH/s from these chips, or 10GH/s per chip; with space for about 150 of my current hashing cores per chip, I'd need to get 65MH/s per core, which is far beyond where I think I can push it.

To try to get a sense of how much of the discrepancy is because I can get more power per gate vs how much is because of prototyping costs, let's just look at the cost for the second lot of that order: $12k.  This means each chip costs $150 once you include packaging, so I would have to get 1.5GH/s out of it, or 10MH/s per core, which is only twice as much as I'm currently getting.  The 10x price difference between the first and second lots makes it definitely seem like the key factor is how much volume you can get.
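The same rate can be applied to the 65nm quote as a per-core requirement. `CORES_PER_CHIP = 150` is the rounded figure from the text: about 7.2M gates split into 200k-gate, four-core designs. The $30 packaging cost is an assumption carried over from the DIP28 quote.

```python
MHS_PER_DOLLAR = 9.0   # same calculator estimate, power ignored
CORES_PER_CHIP = 150   # rounded: (7.2M gates / 200k gates per design) * 4 cores


def per_core_mhs(cost_per_chip: float) -> float:
    """MH/s each core must sustain for a chip to repay its own cost."""
    return cost_per_chip * MHS_PER_DOLLAR / CORES_PER_CHIP


full_order = per_core_mhs(200_000 / 200)      # whole $200k quote over 200 chips
second_lot = per_core_mhs(12_000 / 100 + 30)  # $12k lot over 100 chips, plus $30 packaging
print(full_order, second_lot)  # 60.0 9.0
```

Without rounding the break-even hashrate up to 2TH/s, this gives 60MH/s per core for the full order versus 9MH/s per core for the second lot alone. The text rounds these to 65MH/s and 10MH/s.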

That said, if I wanted to create a single hashing-core chip for fun, it looks like I could get a couple of those for under $1,000.

Other costs

One big cost that's unknown to me is the cost of the design software you need to design an ASIC in the first place.  I assume that this is in the $10,000+ range, which again is out of my price range, though the silver lining is that you "only" have to pay this cost once.  Another cost that I haven't mentioned is the cost of the board to actually get this running; if I'm optimizing for cost, though, I think a board for a simple, low-pin-count package (like the DIP28) shouldn't be too costly to build.


My overall take from this is that the minimum cost for a custom ASIC is extremely low ($100), but making anything of a reasonable size is still going to start you off over $10,000.

Filed under: bitcoin, fpga

FPGA Bitcoin Miner: Improvements

In my last post, I talked about how I did a basic conversion of my bitcoin mining script into verilog for an fpga.  The next thing to do, of course, was to increase the mining speed.  But first, a little more detail about how the miner works:

Overview of a Bitcoin miner

The whole Bitcoin mining system is a proof-of-work protocol, where miners have to find a sufficiently-hard-to-discover result in order to produce a new block in the blockchain, and the quality of the result is easy to verify.  In other words, in order to earn bitcoins as a miner, you have to produce a "good enough" result to a hard problem, and show it to the rest of the world; the benefits of the bitcoin design are that 1) the result automatically encodes who produced it, and 2) despite being hard to generate, successful results are easy for other bitcoin members to verify.  For bitcoin specifically, the problem to solve is finding some data that hashes to a "very low" hash value, where the hash is a double-SHA256 hash over some common data (blockchain info, transaction info, etc) with some choosable data (timestamp, nonce).  So mining works like this: you iterate over your changeable data, calculate the resulting block hash, and if it's low enough, submit it to the network.  Things get a bit more complicated when you mine for a profit-sharing mining pool, as all but the largest participants have to, but the fundamental algorithm and amount of computation stay the same.
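The mining loop just described can be written as a minimal Python sketch using `hashlib`. It assumes the 76 non-nonce header bytes are already assembled. It follows Bitcoin's convention of reading the double-SHA256 digest as a little-endian integer and comparing it against the target. A toy target stands in for the real one, since a real target is far out of reach for pure Python.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def hash_value(header: bytes) -> int:
    """Bitcoin treats the double-SHA256 digest as a little-endian 256-bit integer."""
    return int.from_bytes(double_sha256(header), "little")


def mine(header_prefix: bytes, target: int, max_nonce: int = 2**32):
    """Iterate over nonces; return the first (nonce, hash_value) at or below target."""
    if len(header_prefix) != 76:
        raise ValueError("expected the 80-byte header minus its 4-byte nonce")
    for nonce in range(max_nonce):
        value = hash_value(header_prefix + struct.pack("<I", nonce))
        if value <= target:
            return nonce, value
    return None  # nonce space exhausted; other fields (e.g. timestamp) must change
```

`mine(bytes(76), 2**248)` uses a deliberately easy target (top 8 bits zero), so it returns after a few hundred tries on average.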

SHA256 is a chunk-based algorithm: the chunk step takes 256 bits of initial state and 512 bits of input data, and "compresses" this to 256 bits of output data.  SHA256 uses this to construct a hash function over arbitrary-length data by splitting the input into 512-bit blocks, feeding the output of one chunk in as the initial state for the next chunk, and taking the final output as the top-level hash.  For Bitcoin mining, the input data to be hashed is 80 bytes, or 640 bits, which means that two chunk iterations are required; the output of this first sha256 calculation is 256 bits, so hashing it again requires only a single chunk step.  An early optimization you can make is that the nonce falls in the second chunk of the first hash, which means that when iterating over all nonces, the input to the first of the three chunk iterations is constant.  So the way my miner works is: the PC communicates with my mining pool (I'm using BTC Guild), parses the results into the raw bits to be hashed, calculates the 256-bit output of the first chunk, and passes off the results to the FPGA, which iterates over all nonces and computes the remaining two chunk iterations.  When the FPGA finds a successful nonce, ie one that produces a hash whose top 32 bits are zero, it sends the nonce plus the computed hash back to the PC, which submits it to BTC Guild.
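Python's `hashlib` doesn't expose the raw SHA-256 midstate. Copying a hash object after feeding it the first 64-byte chunk is the closest software analogue of the precomputation described above. Here is a sketch, assuming a 76-byte header prefix with the nonce appended last.

```python
import hashlib
import struct


def make_midstate_hasher(header_prefix: bytes):
    """Absorb the constant first 64-byte chunk once; return a function that
    computes the double-SHA256 of the full 80-byte header for a given nonce."""
    if len(header_prefix) != 76:
        raise ValueError("expected the 80-byte header minus its 4-byte nonce")
    first_chunk, tail = header_prefix[:64], header_prefix[64:]
    mid = hashlib.sha256()
    mid.update(first_chunk)  # state after chunk 1 is kept in `mid`

    def hash_nonce(nonce: int) -> bytes:
        h = mid.copy()  # start from the saved state instead of re-hashing chunk 1
        h.update(tail + struct.pack("<I", nonce))
        return hashlib.sha256(h.digest()).digest()  # second SHA256: one more chunk

    return hash_nonce
```

Each call should then only need the second chunk of the first hash plus the single chunk of the second hash. That matches the two chunk iterations the FPGA performs per nonce.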

The fundamental module in the FPGA is the sha256 "chunker", which implements a single chunk iteration.  The chunk algorithm has a basic notion of 64 "rounds" of shuffling internal state based on the input data, and my chunker module calculates one round per clock cycle, meaning that it can calculate one chunk hash per 64 cycles.  I stick two of these chunkers together into a "core", which takes the input from the PC and a nonce from a control unit, and outputs the hash.  I could have built each core from a single chunker, with each double-hash taking two 64-cycle chunk passes, but instead I put two chunkers per core with registers between them so that they can both work at the same time, giving the core a throughput of one double-hash per 64 cycles.  Since the core is pipelined, to keep the control unit simple the core re-outputs the input that corresponds to the hash it is outputting.

As I mentioned in the previous post, I was able to clock my FPGA up to 80MHz; at one hash per 64 cycles, this gives a hashrate of 1.25 megahashes per second (MH/s).  The whole Bitcoin mining algorithm is embarrassingly parallel, so a simple speedup is to move to a multi-core design.  I used three hash cores, staggered to start at different cycles, and had the control unit increment the nonce any time any of them started work on it (in contrast to having the cores choose their own nonces).  Synthesizing and mapping this design took quite a while -- there were warnings about me using 180% of the FPGA, but the tools were apparently able to optimize the design after emitting that -- and when done I had a 3.75MH/s system.
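The throughput arithmetic and the staggered nonce hand-out can be modeled in a few lines of Python. The evenly spaced start offsets are an assumption for illustration. The post doesn't say which cycles the cores actually start on.

```python
def hashrate_mhs(clock_mhz: float, cycles_per_hash: int, cores: int) -> float:
    """Aggregate throughput when each pipelined core finishes one
    double-hash every `cycles_per_hash` cycles."""
    return clock_mhz / cycles_per_hash * cores


def staggered_schedule(cores: int, cycles_per_hash: int, total_cycles: int):
    """Model a control unit that hands the next nonce to whichever core
    starts new work on a given cycle. Evenly spread offsets are assumed."""
    offsets = [i * cycles_per_hash // cores for i in range(cores)]
    next_nonce = 0
    handed_out = []  # (cycle, core, nonce)
    for cycle in range(total_cycles):
        for core, offset in enumerate(offsets):
            if cycle >= offset and (cycle - offset) % cycles_per_hash == 0:
                handed_out.append((cycle, core, next_nonce))
                next_nonce += 1
    return handed_out


print(hashrate_mhs(80, 64, 1))  # 1.25
print(hashrate_mhs(80, 64, 3))  # 3.75
```

Because a single counter feeds every core, no two cores ever receive the same nonce, and the cores need no coordination among themselves.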

Further optimizations

I tried putting a fourth hash core on the design, but that pushed FPGA utilization to 98% and the tools gave up, so I had to start looking for new forms of optimization.

The first thing I did was optimize the protocol: as I mentioned, the FPGA sends the computed hash back to the PC along with a successful nonce.  This helped with debugging, when the FPGA wasn't computing the hash correctly or was returning the wrong nonce, but at this point I'm fairly confident in the hash cores and don't need this extra level of redundancy.  Once the control unit (ie the thing that controls the hashing cores) stopped sending the hash back to the computer, the optimizer determined that the hash cores didn't need to send the computed hashes back to the control unit, or even compute anything but the top 32 bits, which resulted in a very significant area reduction: this was enough to let me add a fourth core.
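Only 32 bits survive because Bitcoin reads the double-SHA256 digest as a little-endian integer. Its top 32 bits are therefore the last four bytes of the digest, and those are exactly SHA-256's final state word H7. Here is a sketch of the check the cores are left performing. It is a coarse filter: an exact difficulty-1 test still has to compare the full hash against the target.

```python
import hashlib
import struct


def top_32_bits_zero(header: bytes) -> bool:
    """Coarse share filter: are the top 32 bits of Bitcoin's hash value zero?"""
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    # Little-endian value => top 32 bits are digest bytes 28..31, i.e. the H7 word.
    (h7,) = struct.unpack(">I", digest[28:32])
    return h7 == 0
```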

The next thing I did, at a friend's recommendation, was floorplanning.  Floorplanning is the process of giving guidance to the tools about where you think certain parts of the design should go.  To do this, you first have to set XST to "keep hierarchy": normally XST does cross-module optimizations, which means the resulting design doesn't have any module boundaries left.  I was worried about turning this setting on, since it necessarily reduces the amount of optimization the tools can do, but my friend suggested it could be worth it, so I tried it.  I was pretty shocked to see the placement the tools produced: all four cores were spread essentially evenly over the entire FPGA, despite having no relation to each other.  The Xilinx Floorplanning Guide suggested setting up "pblocks", or rectangular regions of the FPGA, to constrain the modules to.  Since the miner is dominated by the four independent hash cores, I decided to put each core in its own quadrant of the device.  I reran the toolchain, and the area shrank again!

The next thing I'm planning to do is stop sending the nonces back from the cores to the control unit: since the control unit keeps track of the next nonce to hand out, it can work out which nonce it handed to each core that corresponds to what the core is currently outputting.  This depends on the inner details of the core, but at this point I'm accepting that the control unit and core will be fairly tightly coupled.  Another possible idea: since the control unit submits the nonces to the PC, I can update my PC-based mining script to try all nonces in a small region around the submitted one, freeing the control unit from having to determine the original nonce in the first place.
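The PC-side fallback could look something like this. It re-hashes a few nonces around the one the FPGA reported and keeps the one that actually passes. The window size is a hypothetical parameter. It would have to cover however far the reported nonce can lag the real one through the pipeline.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def recover_nonce(header_prefix: bytes, reported: int, window: int = 8):
    """Re-check nonces within +/- `window` of the reported one, nearest first,
    and return the first whose hash has its top 32 bits zero (else None)."""
    for delta in sorted(range(-window, window + 1), key=abs):
        nonce = (reported + delta) % 2**32  # wrap like a 32-bit counter
        digest = double_sha256(header_prefix + struct.pack("<I", nonce))
        if digest[28:] == bytes(4):  # top 32 bits of the little-endian value
            return nonce
    return None
```

The cost is at most `2 * window + 1` double hashes per submitted share, which is negligible on the PC compared to the FPGA's nonce rate.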

Another area for optimization is the actual chunker design; right now it is mostly a straight translation from the Wikipedia pseudocode.  The Post-Place & Route Static Timing report tells me that the critical path comes from the final round of the chunk algorithm, where the chunker computes both the inner state update plus the output for that chunk.


But before I get too hardcore about optimizing the design, I also want to try branching out to other parts of the ecosystem, such as producing my own FPGA board, or building a simple microcontroller-based system that can control the mining board, rather than having to use my power-hungry PC for it.

Filed under: bitcoin, fpga

FPGA Bitcoin Miner: First Implementation

So, now that I have a working UART module and a simple bitcoin miner, it's time to implement SHA256 functionality.  Specifically, I'm going to implement the 512-bit per-chunk part of the algorithm, since that seems like a good level of abstraction.  There's some other stuff the algorithm does, such as setting initial values and padding, but in the context of Bitcoin that functionality is all fixed.  Another benefit of implementing at the chunking level, rather than the full SHA256 level, is that typically a Bitcoin hash requires three iterations of the chunk algorithm (two for the first hash iteration, one for the second), but the first chunk stays the same as you iterate over the nonces, so we'll precompute that.

Since sha256 is all computation and no IO, it was fairly easy to write a decent simulation testbench for it, which was nice since it reduced edit-to-run latency and made it much easier to look at the 32-bit up to 512-bit signals that sha256 involves.  There were two main tricky things I had to deal with in the implementation.  The first was structure: the Wikipedia algorithm is very straightforward to implement in a normal procedural language, but I had to put some thought into how to structure it for a sequential circuit.  For instance, I wanted to calculate the 'w' variables at the same time as doing the rounds themselves, which lent itself to a natural 16-word shift register approach, where on round i you calculate w[i+16].
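Here is a software model of one chunk step with the rolling 16-word `w` window, as a Python sketch checked against `hashlib`. The round constants and initial hash values are derived from prime roots, as the SHA-256 specification defines them, rather than typed in. This is a plain model of the algorithm, not a translation of the Verilog.

```python
import math
import struct

MASK = 0xFFFFFFFF


def _first_primes(n):
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def _icbrt(n):
    """Integer cube root: the largest r with r**3 <= n."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = _first_primes(64)
# First 32 fractional bits of cube roots (K) and square roots (H0) of primes.
K = [_icbrt(p << 96) & MASK for p in PRIMES]
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, chunk):
    """One SHA-256 chunk step: 8-word state + 64-byte chunk -> new 8-word state."""
    w = list(struct.unpack(">16I", chunk))  # holds w[i]..w[i+15], like a shift register
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        wi = w[0]
        # Compute w[i+16] = w[i] + s0(w[i+1]) + w[i+9] + s1(w[i+14]) and shift it in.
        s0 = rotr(w[1], 7) ^ rotr(w[1], 18) ^ (w[1] >> 3)
        s1 = rotr(w[14], 17) ^ rotr(w[14], 19) ^ (w[14] >> 10)
        w = w[1:] + [(w[0] + s0 + w[9] + s1) & MASK]
        # One round of state shuffling.
        S1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + S1 + ch + K[i] + wi) & MASK
        S0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (S0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]
```

For a message that pads to a single chunk, `struct.pack(">8I", *compress(H0, chunk))` reproduces `hashlib.sha256(msg).digest()`. Like the hardware, the window keeps computing `w` values in the last 16 rounds that are never used.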

The other tricky part was byte- and word-ordering; while there's nothing theoretically challenging about this, I got myself in trouble by not being clear about the endianness of the different variables, and the endianness that the submodules expected.  It didn't help that both the bitcoin protocol and the sha256 algorithm involve what I would consider implicit byte-flipping, and don't mention it in their descriptions.
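The byte-flipping is easiest to see on a real header. This sketch serializes the genesis block header using its well-known public field values. The integer fields are little-endian, and the two embedded hashes are stored byte-reversed relative to how they're usually displayed. The final digest is also reversed for display. SHA-256 itself, meanwhile, reads its input as big-endian 32-bit words.

```python
import hashlib
import struct


def serialize_header(version, prev_hash_hex, merkle_root_hex, timestamp, bits, nonce):
    """Pack an 80-byte block header: little-endian integers, and hashes
    stored in the reverse of their usual display byte order."""
    return (struct.pack("<I", version)
            + bytes.fromhex(prev_hash_hex)[::-1]
            + bytes.fromhex(merkle_root_hex)[::-1]
            + struct.pack("<III", timestamp, bits, nonce))


def display_hash(header: bytes) -> str:
    """Double-SHA256, then byte-reverse for the conventional hex display."""
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()


genesis = serialize_header(
    1, "00" * 32,
    "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    1231006505, 0x1D00FFFF, 2083236893)
print(display_hash(genesis))
# 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
```

The displayed hash starts with zeros only because of the final reversal. In the raw digest those zero bytes sit at the end.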


The main work for this project was the integration of all the parts that I already have.  I didn't really implement anything new, but due to the nature of this project, correctness is all-or-nothing, and it can be very hard to debug what's going wrong since the symptom is that your 256-bit string is different than the 256-bit string you expected.

For this part of the project, I focused on functionality and not performance.  I tried to build everything in a way that will support higher performance, but didn't spend too much time on it right now except to turn the clock speed up.  The result is that I have an 80MHz circuit that can calculate one hash every 64 cycles, which works out to a theoretical hashrate of 1.25MH/s.  My "Number of occupied Slices" is at 31% right now, so assuming I can fit two more copies of the hasher, this should be able to scale to 3.75MH/s before optimization.  My target is 10MH/s, since these guys have reported getting 100MH/s with a Spartan 6 LX150, which is 10x larger than my LX16 (I'm not sure why they didn't call it an LX15).

I set up a new github repo for this project, which you can find here (GPL licensed).

Filed under: bitcoin, fpga


I'm not really sure how it started, but all of a sudden (in the past 24 hours) I've become very interested in bitcoin.  Not so much in an "I think it's the future" kind of way, but more in a "there's a lot of interesting stuff going on and it's a lot of fun" way.

I think that sums up how I feel about Bitcoin: it's fun.  It has all the features of online games: it's easy to get into, it's easy to measure your progress, there are infrequent but meaningful bursts of rewards, and you have to acquire increasingly rare and powerful equipment in order to get ahead.

A lot of people are dismissive of Bitcoin, but personally I compare it more to something like World of Warcraft gold than to a national currency: there's a value to WoW gold, even though you can't buy anything tangible with it and the gold has no fundamental economic value.  Chinese gold farmers might not be the highest-regarded internet citizens, but I haven't heard the argument that they don't understand fundamental economic principles when they chase after that virtual currency.  This is even more prevalent in EVE Online, where battle losses are regularly reported in dollar figures in addition to ISK (the in-game currency), since the game officially supports dollar-to-ISK conversion (though you can only use the dollars you get through this to pay for your subscription).  In the game you can see how motivated people are to earn ISK; you might not agree with their choice of how to spend their time and money, but no one is arguing that they're misguided about trying to earn virtual currency, since the players don't claim that they do it for real-world economic reasons.

I think a similar analogy can be drawn with gambling: like many of my friends, I can understand the allure of playing EV-negative games of chance at a casino (my favorite: Casino War).  Even though I'm not at the casino to make money, I'll still try to use a money-maximizing strategy, despite this sort of profit-maximization seeming at odds with the decision to gamble in the first place.


I think something similar can be said about Bitcoin: it's incredibly fun to be involved in, and I don't regret the time + computing cycles I devote to it even though I know it doesn't make economic sense.  It's been a lot of fun to learn the bitcoin protocol (technically the Stratum bitcoin-mining-pooling protocol) and see how much performance I can get out of a single Python process.  Who knows, I may even take a look at one of the Bitcoin exchanges, or try to build a bitcoin miner out of my FPGA.  And my goal from all of this will be to get as many Bitcoins as I can, even though I'm definitely not convinced of Bitcoin's future.

To be sure, there are definitely some people who are making money off of Bitcoin -- it seems to be the people profiting off of the Bitcoin enthusiasts, though, rather than the Bitcoin enthusiasts themselves.  From Wikipedia, this seems remarkably similar to the California gold rush: "Recent scholarship confirms that merchants made far more money than miners during the Gold Rush" [1].  A "gold rush" actually seems like a remarkably apt term given the nature of Bitcoin, how people are reacting to it, and the fact that "[m]ost, however, especially those arriving later, made little or wound up losing money" [ibid].

So overall I'd say that Bitcoin is a really neat side project if you have spare time, and on the other end if you're well-capitalized and have a lot of time on your hands, I think there are ways you can make money.  I don't think I'll understand the people in the middle, though, who are small-time players but still decide to invest real money into bitcoin with the only goal of getting more money back.


So what I'm trying to say is that the next couple posts will be about building a sha256 hasher and not about building a processor :)

Filed under: bitcoin