kmod's blog



Well, I hate to say it, but Avatar didn't live up to my expectations. Probably because they were impossibly high. Personally, I found the storyline to be cliché and predictable. The graphics, as expected, were amazing, but overall the movie was missing that "oh s***" feeling that I've gotten from all of my all-time favorite movies.  Totally worth seeing, though; it just didn't make it onto my list of favorite movies, like I was led to believe it would.

I read a great review, which I totally agree with, that compared Avatar's effect on the film industry to the iPhone's effect on the smartphone industry: revolutionary. I also think they both have another important property: they're expectation-setting. They create a new class for themselves, miles above the other products. But within that class, you figure that someone could do a better job (for instance, by putting a worthwhile storyline into the movie). Fortunately, I think we're going to see a lot of directors try to copy Cameron, just like a lot of phone manufacturers copied Apple. And I look forward to seeing more movies that use 3D this well (as long as they're shown on circularly-polarized projectors, not linearly-polarized ones!).

Filed under: Uncategorized 1 Comment

Topcoder SRM 456

Today I took part in another Topcoder SRM -- only a week after the previous one.  I hadn't competed in any since March, and now there were two in the space of a week.

Anyway, the competition this time was fairly easy, so it seems like they were (over-)compensating for the hard tournament last week.  The first problem was literally 6 lines of code.  I'm interested to see the intended solution to the second -- most of the solutions I saw were binary searches that would narrow their way down to an answer, which seemed very unsatisfying to me.  (And by the way, one trick I learned from this competition, which in retrospect should have been more obvious: when you do a binary search with a fixed starting range and a fixed precision target, it's better to just hardcode the number of iterations it will take rather than testing that the interval is smaller than the target -- this avoids any possible floating-point issues.)
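A minimal sketch of that trick (the names here are mine, not from any particular submitted solution): instead of looping `while hi - lo > eps`, run a fixed number of halvings. 100 iterations shrink the starting interval by a factor of 2^100, far below any double-precision target, and the loop can never spin forever on floating-point rounding.

```python
def bisect_fixed(pred, lo, hi, iters=100):
    # Invariant: pred(hi) is True and pred(lo) is False.
    # A fixed iteration count replaces the usual epsilon test,
    # sidestepping floating-point termination issues entirely.
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if pred(mid):
            hi = mid
        else:
            lo = mid
    return hi

# Example: find the root of x*x == 2 on [0, 2]
root = bisect_fixed(lambda x: x * x >= 2, 0.0, 2.0)
```

After 100 halvings the interval is far narrower than a double can resolve, so `root` is as close to sqrt(2) as the hardware allows.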

Filed under: Uncategorized No Comments

Topcoder rankings

I recently took part in a TopCoder Single Round Match (SRM), which ended up being very difficult.  You can see the overall statistics here -- out of the 557 people that opened the easiest problem (ie took a look at the problem description), only 223 (40.0%) got it correct.  In my (limited) experience, that's a very low number for the first problem of an SRM.  It was also a 300-point problem; usually the easiest problem is worth 250 points, which means it was judged hard enough to warrant a point-value adjustment.  Compare this to the last SRM, where 487 out of 656 people solved the first problem (74.2%).

Not only was the first problem much harder than usual, but the second problem was extremely hard.  To be precise, extremely few people solved it (or the third problem), which could be due both to it being difficult and to people running out of time after spending it all on the first problem.  But still, the stats speak for themselves: only 23 people solved the second problem and 9 people solved the third, so at most 5.75% of the people who attempted the first problem were able to solve one of the other two (less, actually, since some people got all three).  Comparing again to SRM 454: there, 314 people solved the second problem, so almost half of the people that attempted the first problem solved the second.  That fraction is 10 times higher than in the SRM I took part in.  Suffice it to say, the scores were very low.

In fact, getting a 0 in this SRM actually had a good effect for some people.  Take a look at the rankings for my room.  Three people who got zero gained between 15 and 18 rating points, so it seems that if your rating is low enough, you can improve it just by entering difficult enough SRMs.  Then again, the guy who had a rating of 1939 and got a zero lost 160 points, so it really only works if your rating is low to begin with.  But whatever you do, don't get a negative score: the people that did all saw their ratings go down by >100 points.  It seems that missing that one challenge will put you below everyone that gets a zero, so your placement ends up being very bad (ie near the bottom) and you lose big.  I, for one, am not very good at making successful challenges, so I will keep this in mind when deciding whether or not to go for a risky challenge.

Filed under: Uncategorized No Comments

Google: the benevolent dictator of information

Google comes under a lot of fire these days for "controlling" too much of the information of the internet.  A lot of this criticism comes from the dying old-media types that'd rather blame someone else than innovate, and a lot of this criticism is (rightly) deflected.  Google doesn't own any data; even with Google Books, Google doesn't own the rights to any (most?) of the books that they scan: they release them into the public domain.  Anyone else is free to take these public domain books and build an alternate system around them.  Anyone else is free to make other partnerships with content sources to get the non-public-domain books as well.

But I still think that in a very real sense, Google controls the access to this data.  Take Google News: there, Google certainly doesn't own any of the data.  But through Google News, Google sets the policies for which articles will get good search rankings.  And I'm sure that articles with good search rankings do dramatically better, which makes Google News ranking a measure of how "good" an article is -- one that forms a set of de facto standards for content sources to follow.  Similarly with the main web search: Google rank is a highly valuable resource, around which an entire subindustry has spawned.  By controlling the ranking algorithm, Google controls what pages show up in search results, and thus what pages people go to when they search.  So in a very real way, Google decides which pages are good for people to go to.

And that's scary, because it means that Google is going to send us to whatever information sources it wants us to go to, which are the sources we actually want only if the two happen to coincide.  Luckily, so far they have coincided, because of what Google's incentives are.  Since Google gets the vast majority of its money from advertising (somewhere in the high 90%s), Google's incentive is to get us to see more of their ads.  And since they already have so much of the ad market, they can't rely on taking more of it to get more ads out there; they have to grow the market itself.  So Google's incentive is to get people to spend more time on the web, and their method of doing that is to give people ways of getting what they want on the web.

So this is why I see Google as a benevolent dictator.  It's a company that wields a vast amount of power when it comes to information access, but for now, all it wants to do is give us what we want.  We shouldn't forget, though, that Google is just a company that's going to do what it wants, so if Google ever wants to do something that doesn't align with our interests, we might find that it becomes more of a microsoft.  But until then, Google is a great company.

Filed under: Uncategorized No Comments

Why mozy is terrible

Disclaimer: I am now an employee of Dropbox, and though I see Mozy and Dropbox as providing distinct services, some may see them as competitors.

I've started to think about backup again lately, since I have more and more files that would really suck to lose.  Until now I've used a combination of Dropbox and Jungledisk to store my files -- Dropbox for the small things that I always need with me (and nothing more, since I really dislike how you have to download everything onto every computer that you link to your account), and Jungledisk for redundancy (yes that's paranoid) and for backing up some other, bigger things.

But the problem is that both of these services charge per GB.  And lately I've become a huge fan of VMs -- for every big project that requires a lot of hard-to-redo setup (for instance, lots of custom programs for a custom development environment), I simply create a VM with it all set up.  Then when I need to work on my laptop, I just transfer it over and I'm ready to go!  Or when my desktop breaks down and I need to reinstall the OS, I just reinstall VMware, load up the VM, and I haven't lost anything!

But the problem is that these VM images are huge.  I suppose if I had a lot of time there are things that I could do to bring that down, but the point is that they're on the order of 10gb each.  And I don't need them so badly that I want to pay $2/month for each of them.  So I decided to look into unlimited backup options.

I looked at Mozy and Carbonite, since they both advertise themselves as unlimited.  I did some googling, and Carbonite has a bad reputation for closing people's accounts when they use too much space.  WTF.  How can they even claim to offer unlimited backup?  Well, those reviews were from a while ago, and some support people have responded to them saying it's not the case anymore.  But their TOS still says they can close down your account if you use too much space.

So I sent Carbonite an email asking how much "too much space" is -- at what point they'd shut down my account.  Because the worst part is not knowing.  I got back an email from a clueless support guy; it was obviously copied and pasted and had nothing to do with my question.  So I decided not to go with Carbonite.  (Though someone else did get back to me saying that they don't really terminate your account.  They just throttle you after 200GB.  Great.)

So instead I went ahead and set up Mozy.  I've used their free version in the past and been fairly happy with it (though I noticed some performance problems), so I decided to sign up for their unlimited backup plan.  They didn't have any bad reviews in the past, so I signed up for one month and set up an initial backup of 350GB.  And then the problems started.

First, it seemed like they were throttling my account.  My MIT internet has a great 1MBps upload speed, so I thought I'd be able to upload 3gb an hour.  Instead, what would happen is that the client would upload exactly 100mb, then stop and say "Communicating with servers..." for the rest of the hour.  I figured that they were throttling uploads to 100mb an hour, since that's probably what most people can manage anyway, so I emailed them saying that they hadn't mentioned there'd be throttling and I was unhappy.  They asked me for log files and such (which I checked, and they were empty), but then the next day the problem went away.  So I figure they just unthrottle anyone who actually complains.  Oh well, I'm now backing up at 7mbit/s, so I'm happy.

But then the backup crashes.  And again.  And again and again and again, indefinitely.  Why?  Because Mozy encrypts the entire file before sending it over the link.  This doesn't sound like a terrible idea for small files, which most files are, but for 10gb VM images, it sucks.  Especially since Mozy puts the encrypted files in your Windows temp folder.  I know how to move the temp folder, but I keep it on my SSD so that it's fast, and that also means I don't have 10gb of free temp space.  So Mozy would try to encrypt the 10gb file, fail because there's not enough space, and cancel the entire backup.  Great.  If I want to use Mozy, I have to move my temp folder to one of my larger drives.  Well, it's not the end of the world, so OK, I'll go ahead and do that.

And now Mozy has gone completely berserk and killed my computer.  It was doing 40k write IOPS on my drive.  OK, I guess it can achieve that because they're small sequential writes, but still, wtf -- it means that I can't do anything else with the drive.  And then another stupid piece of their code came into play: they do ALL OF THE ENCRYPTION at once when doing the backup.  Usually they're smart and only encrypt a few hundred MB ahead, but somehow the program decided to encrypt the remaining 160GB of my backup.  And there's only 140gb of space on the machine.  And then it used it all.  But it didn't die this time -- I guess it knew that it didn't actually need all of that space.  So instead it just spun.  I have no idea what it was doing.  But it hogged the entire drive until I was able to move 20gb of stuff from that drive to another.  And do you know how long that takes when there's a program trying to do 40k IOPS on the drive?  A long frickin' time, that's how long.  They even taunt you with this nice little "throttling" slider that you can supposedly use to throttle the backup process.  I moved it, and guess what: it did nothing.  Still 40k IOPS.  I have no idea what happened, but eventually I was able to free up enough space and the encryption could continue.

So the result is that Mozy is one of the most demanding programs that I've ever seen.  And it's been 2 weeks and I've only uploaded 200gb, when I should be able to upload 75gb a day.  So yeah it's kind of unclear if I'm going to keep on paying them for the service.  Eventually I'm going to have a personal dedicated server somewhere that I can use for stuff like this.  Until then, I'm either going to have to deal with the crap that is Mozy, or suck it up and pay the $.15/gb/month for Jungledisk.  Or not back the stuff up and hope nothing goes wrong.
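(For what it's worth, the 75gb/day figure checks out against the 7mbit/s rate from earlier -- a rough back-of-the-envelope, assuming decimal units:)

```python
# Rough check: how much data can a 7 mbit/s uplink move in a day?
link_bits_per_s = 7 * 10**6          # 7 mbit/s
bytes_per_day = link_bits_per_s / 8 * 86400   # 86400 seconds in a day
gb_per_day = bytes_per_day / 10**9   # decimal GB
# gb_per_day comes out to 75.6, matching the ~75gb/day estimate
```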

Update: I've finally managed to upload all of my data to Mozy, and now it seems pretty good -- it does nice small delta syncs even when I touch large VM images.  And the throttling for the most part works.  It's just very frustrating to do a large initial sync -- which I suppose Mozy has no incentive to change.

Filed under: Uncategorized 3 Comments

The Great Climate-gate Debate

I went to a forum today on the recent "Climate-gate" event. In case you haven't heard about it, someone hacked into the University of East Anglia's email system and publicly posted a large number of emails sent by climate scientists working in their Climatic Research Unit. And some of those emails are pretty bad -- they talk about withholding data, keeping papers from being published, and other scientifically unethical things.

The forum, held at MIT, covered the incident along with a discussion of climate science in general.  As far as I could tell from some simple Google searching (I missed the introductions, unfortunately), the panelists were very well-respected, and I was very lucky to be able to hear them all talk.  Here's a quick summary of what people said (descriptions taken from the MIT event site):

  • Kerry Emanuel, Breene M. Kerr Professor of Atmospheric Science, Department of Earth, Atmospheric and Planetary Sciences -- said: Yes, these emails look bad, but it's normal human beings talking to each other.  And besides, there is so much evidence already that even if we throw out the contributions of these people, it doesn't matter.  Oh, and climate skepticism is a big conspiracy.
  • Richard Lindzen, Alfred P. Sloan Professor of Meteorology, Department of Earth, Atmospheric and Planetary Sciences -- said: These emails look bad, and show that we've been right when we've said that climate alarmists have been withholding data and preventing papers from being published.  Oh, and there is a vast global warming conspiracy.
  • Judith Layzer, Edward and Joyce Linde Career Development Associate Professor of Environmental Policy, Department of Urban Studies and Planning -- said Well I don't know much about climate science, but ultimately the public is not going to be swayed by the weight of the evidence, but rather some sort of political persuasions.  So these emails may not affect the evidence in any meaningful way, but they will affect public perception very heavily.
  • Stephen Ansolabehere, Professor, Department of Political Science, MIT and Professor of Government, Harvard University -- said something about how we should use this as a teachable moment about ethics in science, and how scientists need to really self-police themselves to earn back the trust of the public.
  • Ronald Prinn, TEPCO Professor of Atmospheric Science, Department of Earth, Atmospheric and Planetary Sciences, Director, Center for Global Change Science -- said that he changed his mind about global warming between his two congressional hearings in 1997 and 2007.  And we can do much much better [in discussing global warming? not sure]

Ultimately, I think Prof. Layzer had the best take-home message (which was contributed to by everyone else, so I don't want to attribute it just to her).  There are lots of people on both sides of the debate, there's lots of money on both sides, and there's a lot of politics on each side.  In the end, simply improving the climate science is not going to settle the matter, just like improving evolution science is not going to convince the 40% of the US population that doesn't believe in it.

So where does that leave us?  I'm not really sure what to believe.  I think now I'm a global warming agnostic (before I was a global warming atheist) -- I think both sides of the science are so heavily politicized that you can't really trust either side any more.  I think the only way we will know for sure is by letting the world play out and see what happens; in the meantime, we're going to have to decide something somehow, and we can only hope that it's right.

Filed under: Uncategorized No Comments

Why generator functions suck

I recently had to debug a piece of code similar to the following:

def f(x):
  return globals.get(x)

def g(x):
  # Very long function that calls f() at certain points
  ...

# In a different module:
def h(x):
  return g(x)

The problem was that something was breaking inside the function f. I started inserting print statements to try to debug, and things were going pretty well. I found a minimal test I could run to show that things were messed up, and it turned out things were bad as soon as f was entered. So then I started looking at g. I found all the places it called f. Things were indeed messed up before calling f, and in fact they were messed up at the beginning of g as well!

So then I started looking into h. But things were fine right before g was entered, and were fine after it exited as well. How could this happen? The fact that they're in different modules seemed a likely culprit, so I did some tests on that. No luck -- it wasn't because they were in separate modules. It wasn't because they were being run on separate threads.  None of these functions were decorated in any way.  It was not obvious at all what was going on.

And then I looked closer at g. And buried in the middle of it is a

      yield foo

which means that g is actually a generator function. There was no way for me to know that: the function was called like a normal function, and it looked like a normal function. Yet its semantics are completely different! The problem was that the actual body of g wasn't being run until much later, after h had returned and the conditions g expects had disappeared. If g were somehow marked as a generator function, finding the problem would have been fairly straightforward. As it is, it takes reading through the entire body of g to discover that g does not behave at all the way it looks like it does at the call site. And that is why generator functions (and similar syntactic sugar in other languages) suck.
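A stripped-down illustration of the trap (not the actual code I was debugging): calling a generator function runs none of its body; it just builds a generator object, and the body only executes once something iterates it.

```python
calls = []

def g(x):
    calls.append("entered")   # records when the body actually starts running
    yield x * 2

gen = g(21)                   # looks like a plain function call...
assert calls == []            # ...but the body has not run at all yet
value = next(gen)             # iteration is what finally executes the body
assert calls == ["entered"]
assert value == 42
```

So any state that g's body depends on (a global request object, say) has to still be alive at iteration time, not at call time.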

In case you were wondering, the specific example concerns Pylons. You can return a generator from your web handler, and Pylons will iterate through it until the end. This is nice for things like serving large files off disk. You can create a generator that reads the file a bit at a time when it is needed, instead of loading the entire file and sending it all at once.
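A sketch of that pattern in plain Python (this is not Pylons' actual API; the function name and chunk size are my own):

```python
def file_chunks(path, chunk_size=64 * 1024):
    # Lazily yield the file one chunk at a time; the framework iterates
    # this and streams each chunk to the client, so the whole file is
    # never held in memory at once.
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk
```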

The problem, though, is that the global 'request' object disappears by the time Pylons actually calls your generator. So anything inside your generator function, or anything that your generator function calls, will die a violent death if it wants to access anything about the request that isn't saved. This took forever to figure out.

Filed under: Uncategorized No Comments

Linode vs EC2 vs Shared Hosting vs Dedicated

As I mentioned in the first post, I set this blog up on a Linode virtual server.  I debated for a while about how I wanted to host it.  My options were:

  • EC2 instance
  • Linode VPS
  • prgmr VPS
  • On my desktop machine
  • Shared hosting
  • Get a dedicated server

First I looked into shared hosting.  I wasn't that impressed -- you can only host the types of things that they let you, and overall they want to manage your experience.  I'd have to go to one of their higher service levels to get Python capabilities, and I still wouldn't get ssh+root access to the machine.  I also worry about the security of these shared hosting options -- not that I worry about losing all of my precious blog posts, but I know that these setups aren't very secure (it only takes one person putting a dumb piece of PHP code on their site to give an attacker root access to the box), and I'd rather not have to think about it.  And most of all, I just wouldn't have any control over the server, so I couldn't do whatever I wanted.  I dismissed shared hosting as an option pretty quickly.

Then I thought about getting a virtual private server (VPS), which would give me the control I wanted.  I found an interesting comparison of different VPS services, also measured against EC2, and it seemed that Linode was the clear choice when it came to performance.  prgmr can be a fair amount cheaper ($6/month vs $20/month for the cheapest options at prgmr and Linode, respectively), and its "we assume you're not stupid" attitude was appealing, but I decided that since I'd have to switch away from it eventually, it wasn't worth it.  They also support many features only inasmuch as you can send an email to the guy who runs it and get him to change some settings for you.

EC2 has a lot more features, primarily aimed at cloud computing / production web services.  For instance, you can shuffle IPs around machines, load balance between multiple instances, and spawn new VMs if your demand is high (these all cost extra, though).  You can also shut down your servers and not have to pay for them.  I figured that I wouldn't need any of this stuff for my personal server, so it wasn't worth the cost ($40/month+bandwidth).

I also thought about hosting it directly off of my local machine.  That's what I've done in the past, and it works okay, but there are some real benefits to having the server be unlinked from my machine.  Uptime is the main benefit, but also I don't have to worry about taking down my box at all.

So the next step up from that is getting a dedicated server for myself.  There seem to be some pretty good dedicated hosting options out there, but they all start at $100/month, which is way more than I want to spend right now.  Those options, though, are highly competitive with the $100 VPS options (ie they are often better in many regards).  There are some features that you get with a VPS, such as the ability to resize servers and move them around, but that advantage would be negated if I could easily transfer the settings and data from the server to somewhere else.  And I'm going to have to do that if I ever want to change VPS providers, so I'm going to keep it in mind when I configure my server; most likely, if I ever hit the $100/month performance tier (not likely given the current traffic on my blog -- thank you, spam bots, for the love), I'll switch to dedicated.

So in the end, I settled on a Linode VPS.  Luckily I got put on a pretty idle machine, so I get as many CPU cycles as I want.  I don't know how long 360MB of RAM is going to last me, though.  The other services all give upwards of 1GB at this price range, but so far 360MB has been okay.

It's only been a week with the server, but so far I've been happy.

Update 5/12/2010:

It's been a few months now, and I've been very happy with the quality of service from Linode.  It's not perfect -- since I switched this blog to use SSL for the admin pages, there's been a noticeable lag when opening pages.  This worries me, especially since I've never seen any load on the node I'm on.  Also, $20/month is somewhat high given the specs of the VM.  But I've been very impressed with the professionalism and management options that they provide -- I'm paying extra for it to "just work".  Up to you whether that's worth it.

I'm thinking of getting a second VPS, and for that I definitely don't want to pay that much extra.  I just want a cheap VPS that I can throw random personal stuff on, just to have a server at a well-known address that I can count on to be up 99% of the time.  Paying $10 extra a month for the comfort of Linode doesn't make much sense.  My requirements: at least 256MB of RAM, and at most $10/month (after discounts).  Here are some potential providers I've found:

One thing they have in common (besides being budget VPS providers) is that they all have terrible reviews online (except for prgmr).  I'll give prgmr some more time to get more hardware, and then look more into these other options.  I guess the upside is that if I sign up with one of them, my maximum loss is $10 (less if I pick one with a money-back policy, though that's extremely rare with these sites because they know they're terrible).

Filed under: Uncategorized 11 Comments

DARPA Network Challenge Over

I posted yesterday about the DARPA Network Challenge, and the MIT team for it.  The balloons were released this morning, and the MIT team won!  I looked a little more into the MIT team, and from their site:

We are a group of researchers at M.I.T. interested in understanding the role of social networks in information spreading.

Looks like they're from the Media Lab.  Cool stuff.  DARPA doesn't have any analysis of what strategies people used yet, but supposedly they'll have that soon.  I tried to find out what other teams did, but it seems like we'll have to wait for the report to see the other good strategies.

Filed under: Uncategorized No Comments

Putnam 2009

My fourth and final Putnam (though according to the MIT math department, my fifth??) was today.  I haven't done any real math in a while, so one of the questions in the first half was pretty jarring:

Does there exist an abelian group where the product of the orders of the members is 2^2009?

I vaguely remembered that an abelian group is one that is commutative (which is correct), but also that I didn't remember anything else, so I would never be able to do the problem (which is also correct).  I wrote "No" and moved on -- I ended up being right, and it didn't actually say to prove it...  The next problem I worked on was a nasty calculus problem.  I spent most of my time in the first half antidifferentiating things like 1/(x^2+1) and tan(x).
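(For the record, those two are standard antiderivatives:

      ∫ dx/(x^2+1) = arctan(x) + C
      ∫ tan(x) dx = -ln|cos(x)| + C

the second following from tan(x) = sin(x)/cos(x) and a u = cos(x) substitution.)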

There were two other approachable problems in the first half, but I was annoyed by the amount of formal math that you needed to do the others.  I suppose, though, that given the kind of competition that the Putnam is, and the fact that I haven't taken that much math in college, I shouldn't be surprised that I was unequipped to do some of the problems.

The second half was a lot better.  The first 5 problems were approachable; the 4th talked about the dimension of a certain vector space, but that was fairly intuitive.  B5, though, was a problem about the limiting behavior of a function defined through a differential equation.  Intuitively it was easy to see that the function was going to grow without bound, but since I don't know any formal analysis, I wasn't able to relate local properties of the function to global ones, even though the link is "obvious".  Again, I was lacking the machinery to make the proof fully rigorous.

I think this says more about me and my math training (or lack thereof) than about how the Putnam should or should not be.

Anyway, now begins the long wait until results come out in the spring.  I'm probably going to forget everything about this Putnam by then, but hopefully I've reversed the declining streak I've been on the past few years.

Filed under: Uncategorized No Comments