> What about virtual machine images?
>
> > the tradeoff for this compression is a large amount of memory,
> > fragmentation, and cpu usage. that is to say, storage latency.
>
> I have 24GB RAM. My primary laptops have 8GB RAM. I have all this RAM
> not because of dedup but because I do memory intensive tasks, like
> running virtual machines. I believe this is true for many users.

russ posted some notes on how much memory and disk bandwidth are
required to write at a constant b/w of Xmb/s to venti. venti requires
enormous resources to provide this capability.

also, 24gb isn't really much storage. that's 1000 vm images/disk,
assuming that you store the regions with all zeros.

one thing to note is that we're silently comparing block (ish)
storage (venti) to file systems. this isn't really a useful
comparison. i don't know of many folks who store big disk images on
file systems. we have some customers who do do this, and they use
the vsx to clone a base vm image. there's no de-dup, but only the
changed extents get stored.

> I'm of a completely different opinion regarding fragmentation. On
> SSDs, it's a non issue.

that's not correct. a very good ssd will do only about 10,000 r/w
random iops. (certainly they show better numbers for the easy case
of compressible 100% write workloads.) that's less than 40mb/s. on
the other hand, a good ssd will do about 10x that when reading
sequentially.

> My CPU can SHA-1 hash orders of magnitude faster than it can read from
> disk, and that's using only generic instructions, plus, it's sitting
> idle anyway.

it's not clear to me that the sha-1 hash in venti has any real bearing
on venti's end performance. do you have any data or references for
this?

- erik
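
for reference, the iops-to-bandwidth arithmetic behind the "less than 40mb/s" figure above can be sketched as follows. the message doesn't state a transfer size, so the 4KiB per i/o used here is an assumption, and the function name is purely illustrative.

```python
# rough iops -> bandwidth conversion.
# ASSUMPTION: 4 KiB per random i/o; the message does not give a block size.
BLOCK_BYTES = 4096

def iops_to_mib_per_s(iops, block_bytes=BLOCK_BYTES):
    """Bandwidth in MiB/s sustained at a given i/o rate and transfer size."""
    return iops * block_bytes / 2**20

random_mib = iops_to_mib_per_s(10_000)   # the ~10,000 random iops quoted above
sequential_mib = 10 * random_mib         # the "about 10x" sequential case

print(f"random:     {random_mib:.1f} MiB/s")
print(f"sequential: {sequential_mib:.1f} MiB/s")
```

with a larger assumed transfer size the random figure scales up proportionally, so the gap to sequential reads depends heavily on how fragmented the blocks actually are.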