On Mon, 7 Jul 2008, Mike Gerdts wrote:

> As I have considered deduplication for application data I see several
> things happen in various areas.
You have provided an excellent description of gross inefficiencies in the way systems and software are deployed today, resulting in massive duplication. Massive duplication is used to ease service deployment and management; most of it is not technically necessary.

> There tend to be organizational walls between those that manage
> storage and those that consume it. As storage is distributed across
> a network (NFS, iSCSI, FC) things like delegated datasets and RBAC
> are of limited practical use. Due to these factors and likely

It seems that deduplication on the server does not provide much benefit to the client, since the client always sees a duplicate: it does not know that it need not cache or copy a block twice because it is a duplicate. Only the server benefits from the deduplication, except that server-side caching may improve and give the client a bit more performance.

While deduplication can obviously save server storage space, it does not seem to help much for backups, and it does not really help the user manage all of that data. It does help the user in terms of less raw storage space, but there is surely a substantial run-time cost associated with the deduplication mechanism.

None of the existing applications (based on POSIX standards) has any understanding of deduplication, so they won't benefit from it. If you use tar, cpio, or 'cp -r' to copy the contents of a directory tree, they will transmit just as much data as before, and if the destination does real-time deduplication, the copy will be slower. If the copy is to another server, the copy time will be huge, just like before. Unless the backup system fully understands and has access to the filesystem's deduplication mechanism, it will be grossly inefficient, just like before. Recovery from a backup stored in a sequential (e.g. tape) format which does understand deduplication would be quite interesting indeed.

Raw storage space is cheap.
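To make the asymmetry concrete, here is a minimal sketch of content-hash block deduplication (all names hypothetical, not any real filesystem's implementation): the store keeps each unique block once, yet a client writing or reading a duplicate file still moves every byte, which is exactly why tar/cpio/cp see no savings.

```python
import hashlib

class DedupStore:
    """Toy content-addressed block store: identical blocks are stored once."""
    BLOCK_SIZE = 4096

    def __init__(self):
        self.blocks = {}   # sha256 digest -> block bytes (unique blocks only)
        self.files = {}    # filename -> ordered list of digests

    def write(self, name, data):
        # The client (or server) must still hash every block of every write;
        # this is the run-time cost of real-time deduplication.
        digests = []
        for i in range(0, len(data), self.BLOCK_SIZE):
            block = data[i:i + self.BLOCK_SIZE]
            d = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(d, block)  # duplicate block: no new space
            digests.append(d)
        self.files[name] = digests

    def read(self, name):
        # A reader always receives the full, reconstituted data stream.
        return b"".join(self.blocks[d] for d in self.files[name])

    def raw_bytes_stored(self):
        return sum(len(b) for b in self.blocks.values())

store = DedupStore()
payload = b"x" * 8192
store.write("a", payload)
store.write("b", payload)          # a byte-identical second copy
assert store.read("b") == payload  # the client still sees two full files
```

Note that writing "b" transferred and hashed all 8192 bytes again; only `raw_bytes_stored()` shrank. A backup tool walking this namespace with read() would likewise pull every duplicate in full unless it were given access to the digest tables themselves.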
Managing the data is what is expensive. Perhaps deduplication is a response to an issue which should be solved elsewhere?

Bob

======================================
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss