On Tue, Jun 15, 2010 at 7:28 PM, David Magda <dma...@ee.ryerson.ca> wrote:
> On Jun 15, 2010, at 14:20, Fco Javier Garcia wrote:
>
>> I think dedup may have its greatest appeal in VDI environments (think
>> about a environment with 85% if the data that the virtual machine needs is
>> into ARC or L2ARC... is like a dream...almost instantaneous response... and
>> you can boot a new machine in a few seconds)...
>
> This may also be accomplished by using snapshots and clones of data sets. At
> least for OS images: user profiles and documents could be something else
> entirely.

It all depends on the nature of the VDI environment. If the VMs are regenerated on each login, the snapshot + clone mechanism is sufficient and deduplication is not needed. However, if VMs are long-lived and receive periodic patches and other software updates, deduplication will be required if you want storage utilization to stay roughly constant.

It probably makes a lot of sense to be sure that swap or page files are on a non-dedup dataset. Executables and shared libraries shouldn't be getting paged out to it, and the likelihood that multiple VMs page the same thing to swap or a page file is very small.

> Another situation that comes to mind is perhaps as the back-end to a mail
> store: if you send out a message(s) with an attachment(s) to a lot of
> people, the attachment blocks could be deduped (and perhaps compressed as
> well, since base-64 adds 1/3 overhead).

It all depends on how this is stored. If the attachments are stored like they were in 1990, as part of an mbox file, you are very unlikely to get the proper block alignment. Even storing the message body (including headers) in the same file as the attachment may not align the attachments, because the mail headers may differ (e.g. different recipients' messages took different paths, some were forwarded, etc.). If the attachments are stored in separate files, or a database format is used that stores attachments separately from the message (with matching database + zfs block size), things may work out favorably.

However, a system that detaches attachments and stores them separately may just as well store each one in a file named after its SHA256 hash, assuming that file doesn't already exist. If it does exist, the system can just increment a reference count. In other words, an intelligent mail system should already dedup. Or at least that is how I would have written it for the last decade or so...
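
To make the hash-named file idea concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not how any particular mail server does it: the class name AttachmentStore, its methods, and the in-memory reference counts are all hypothetical, and a real system would have to persist the counts and handle concurrent deliveries.

```python
import hashlib
import os
import tempfile


class AttachmentStore:
    """Hypothetical content-addressed attachment store (illustration only)."""

    def __init__(self, root):
        self.root = root
        # digest -> number of messages referencing that attachment.
        # Assumption: kept in memory here; a real store would persist this.
        self.refcounts = {}
        os.makedirs(root, exist_ok=True)

    def _path(self, digest):
        return os.path.join(self.root, digest)

    def put(self, data):
        """Store an attachment and return its SHA-256 hex digest."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.refcounts:
            # Already stored once: just bump the reference count.
            self.refcounts[digest] += 1
        else:
            with open(self._path(digest), "wb") as f:
                f.write(data)
            self.refcounts[digest] = 1
        return digest

    def get(self, digest):
        """Return the attachment bytes for a digest."""
        with open(self._path(digest), "rb") as f:
            return f.read()

    def release(self, digest):
        """Drop one reference; delete the file when nothing refers to it."""
        self.refcounts[digest] -= 1
        if self.refcounts[digest] == 0:
            del self.refcounts[digest]
            os.remove(self._path(digest))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store = AttachmentStore(root)
        attachment = b"same attachment bytes" * 1000
        # The same attachment delivered to three mailboxes.
        digests = [store.put(attachment) for _ in range(3)]
        print(len(set(digests)), "unique file(s),",
              store.refcounts[digests[0]], "references")
        print(os.listdir(root))
```

With something like this in the mail system, each distinct attachment lands on disk once no matter how many recipients get it, so there is little left for filesystem-level dedup to find in the attachment data.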

--
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss