> What about virtual machine images?
> 
> > the tradeoff for this compression is a large amount of memory,
> > fragmentation, and cpu usage.  that is to say, storage latency.
> 
> I have 24GB RAM. My primary laptops have 8GB RAM. I have all this RAM
> not because of dedup but because I do memory intensive tasks, like
> running virtual machines. I believe this is true for many users.

russ posted some notes on how much memory and disk bandwidth are
required to write at a constant b/w of X mb/s to venti.  venti requires
enormous resources to sustain that.

also, 24gb isn't really much storage.  that's 1000 vm images/disk, assuming
that you store the regions with all zeros.

one thing to note is that we're silently comparing block (ish) storage (venti)
to file systems.  this isn't really a useful comparison.  i don't know of many
folks who store big disk images on file systems.

we have some customers who do do this, and they use the vsx to clone
a base vm image.  there's no de-dup, but only the change extents get
stored.

> I'm of a completely different opinion regarding fragmentation. On
> SSDs, it's a non issue. 

that's not correct.  a very good ssd will do only about 10,000 r/w random
iops.  (certainly they show better numbers for the easy case of compressible
100% write work loads.)  that's less than 40mb/s.  on the other hand, a good
ssd will do about 10x that, if reading sequentially.
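
the arithmetic behind those figures, as a rough sketch.  the 4 KiB i/o
size is my assumption (it is what makes 10,000 iops come out just under
40mb/s); the names are made up for illustration, and none of this is
measured data.

```python
# back-of-the-envelope only: the 4 KiB i/o size is an assumption,
# not a number taken from any datasheet or measurement.
IOPS = 10_000          # random r/w iops quoted above
IO_SIZE = 4 * 1024     # assumed bytes moved per random i/o
MIB = 1024 * 1024

def throughput_mib_s(iops, io_size):
    """bytes moved per second, expressed in MiB/s."""
    return iops * io_size / MIB

random_mib_s = throughput_mib_s(IOPS, IO_SIZE)
sequential_mib_s = 10 * random_mib_s   # the "about 10x" sequential case

print(f"random:     {random_mib_s:.1f} MiB/s")
print(f"sequential: {sequential_mib_s:.1f} MiB/s")
```

a larger assumed i/o size raises the random figure proportionally, so
the gap depends on how small the fragments actually are.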

> My CPU can SHA-1 hash orders of magnitude faster than it can read from
> disk, and that's using only generic instructions, plus, it's sitting
> idle anyway.

it's not clear to me that the sha-1 hash in venti has any real bearing on
venti's end performance.  do you have any data or references for this?

- erik
