Brian wrote:
> Ahh ok. Anyone who wants to do processing on the full history (and there are
> a lot of these people who exist!) by definition *has* to be willing to throw
> some money at it. It simply doesn't fit on commercial drives.

I've personally never found much of a compelling reason to actually uncompress the dump, rather than working on the stream as it's being decompressed. 7zip decompression is pretty fast, and can use multiple cores on multi-core machines, so it never seems to be a bottleneck, for me at least; I get somewhere around 30-40 MB/s typically.

From what I can tell, the top-end EC2 instances do perform rather better than that, topping out at around 200 MB/s for sequential reads. But I don't personally run anything that can't run 5x slower in return for being free, and I suspect lots of analysis is of that "just let it run for a week, who cares" variety.

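To make "working on the stream as it's being decompressed" concrete, here is a minimal sketch in Python rather than a description of any particular setup. It assumes a `7z` binary is on the PATH and that the dump is XML with `<page>` elements. The names `open_7z_stream` and `count_elements` are made up for illustration, and the archive name in the comment is hypothetical.

```python
import io
import subprocess
import xml.etree.ElementTree as ET


def open_7z_stream(path):
    """Return a binary stream of the archive's decompressed contents.

    Assumes a `7z` executable is installed; `e -so` extracts to stdout,
    so nothing is ever written to disk uncompressed.
    """
    proc = subprocess.Popen(["7z", "e", "-so", path], stdout=subprocess.PIPE)
    return proc.stdout


def count_elements(stream, name):
    """Count elements with the given local tag name, one pass, bounded memory."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    count = 0
    for event, elem in context:
        # Drop any XML namespace prefix: "{uri}page" -> "page".
        if event == "end" and elem.tag.rsplit("}", 1)[-1] == name:
            count += 1
            root.clear()  # discard already-processed children of the root
    return count


if __name__ == "__main__":
    # Tiny in-memory stand-in for a dump; a real run would instead use
    # something like open_7z_stream("pages-meta-history.xml.7z").
    sample = (
        b"<mediawiki><page><title>A</title></page>"
        b"<page><title>B</title></page></mediawiki>"
    )
    print(count_elements(io.BytesIO(sample), "page"))
```

The parser only ever sees whatever the decompressor has produced so far, so disk usage stays at the size of the compressed archive, and throughput is bounded by whichever of decompression or parsing is slower.
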
I'm not going to argue that nobody could benefit from using EC2 to do their analysis instead, but it's hardly the case that it's impossible to do full-history analysis on commodity hardware.

-Mark

_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l