Mattias Pantzare wrote:
> For this application (deduplicating data) the likelihood of matching
> hashes is very high. In fact it has to be, otherwise there would not
> be any data to deduplicate.
>
> In the cp example, all writes would have matching hashes and all
> would need a verify.
Would the read needed to verify a matching hash really take much longer than writing the duplicate data would? I would expect the significant overhead to be in managing the hashes and searching for matches, not in verifying the matches.

That said, verification could be optimized for the cp example by keeping a cache of the hashes of recently read data, or even caching the data itself so that verification requires no duplicate read (a rough sketch of that idea follows below). A disk-wide search for independent duplicate data would be a different process, though.

Cheers,
11011011
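P.S. Here is a minimal sketch in C of the "cache recently read blocks" idea, just to make it concrete. Everything here is my own invention for illustration, not anything in ZFS: the names (verify_cache, vc_insert, vc_verify), the direct-mapped layout, and the toy FNV-1a hash are all assumptions. A real dedup table would key on a strong hash like SHA-256, and in ZFS a recently read block would typically still be sitting in the ARC anyway, which is really the same optimization.

/* verify_cache.c -- toy sketch, compile with: cc verify_cache.c */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE  4096
#define CACHE_SLOTS 1024            /* power of two, for cheap masking */

typedef struct {
    uint64_t hash;                  /* hash of cached block; 0 = empty (toy simplification) */
    uint8_t  data[BLOCK_SIZE];      /* copy of the block contents */
} vc_slot_t;

/* Direct-mapped cache: each hash maps to exactly one slot. */
static vc_slot_t verify_cache[CACHE_SLOTS];

/* Toy 64-bit FNV-1a; a real dedup table would use SHA-256 or similar. */
static uint64_t block_hash(const uint8_t *buf)
{
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        h ^= buf[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Remember a block we just read (or wrote), keyed by its hash. */
static void vc_insert(uint64_t hash, const uint8_t *buf)
{
    vc_slot_t *s = &verify_cache[hash & (CACHE_SLOTS - 1)];
    s->hash = hash;
    memcpy(s->data, buf, BLOCK_SIZE);
}

/*
 * Called when a write's hash matches an existing block.  Returns 1 if
 * the cached copy proves the blocks are identical, 0 if they differ
 * (a hash collision), and -1 on a cache miss, in which case the
 * caller falls back to reading the on-disk block to verify.
 */
static int vc_verify(uint64_t hash, const uint8_t *buf)
{
    vc_slot_t *s = &verify_cache[hash & (CACHE_SLOTS - 1)];
    if (s->hash != hash)
        return -1;                  /* miss: must read from disk */
    return memcmp(s->data, buf, BLOCK_SIZE) == 0;
}

int main(void)
{
    uint8_t block[BLOCK_SIZE];
    memset(block, 0xAB, sizeof(block));

    uint64_t h = block_hash(block);
    vc_insert(h, block);            /* e.g. during cp's read pass */

    /* Write side sees a matching hash; verify with no extra I/O. */
    printf("verified: %d\n", vc_verify(h, block));   /* prints 1 */
    return 0;
}

The nice property is that a miss in vc_verify just degrades to the behaviour being discussed (read the on-disk block and compare), so the cache can stay small and the worst case gets no worse.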