> From: Matthew Ahrens [mailto:mahr...@delphix.com]
> Sent: Wednesday, May 25, 2011 6:50 PM
>
> The DDT is a ZAP object, so it is an on-disk hashtable, free of O(log(n))
> rebalancing operations. It is written asynchronously, from syncing
> context. That said, for each block written (unique or not), the DDT must
> be updated, which means reading and then writing the block that contains
> that dedup table entry, and the indirect blocks to get to it. With a
> reasonably large DDT, I would expect about 1 write to the DDT for every
> block written to the pool (or "written" but actually dedup'd).
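To get a feel for the scale involved, here is a back-of-envelope sketch (mine, not from the thread) of how much ARC it would take to cache the entire DDT. The ~320 bytes per in-core DDT entry is a commonly quoted approximation, and the 128K average block size is an assumption; both will vary by pool.

```python
# Rough DDT sizing sketch. The ~320 bytes per in-core dedup-table entry
# is a commonly quoted approximation, not a figure from this thread.
DDT_ENTRY_BYTES = 320

def ddt_ram_bytes(pool_bytes, avg_block_bytes=128 * 1024):
    """Approximate ARC needed to hold the whole dedup table in memory."""
    unique_blocks = pool_bytes // avg_block_bytes
    return unique_blocks * DDT_ENTRY_BYTES

# A 10 TiB pool of unique 128K blocks works out to about 25 GiB of ARC:
print(ddt_ram_bytes(10 * 2**40) / 2**30)  # 25.0
```

If the DDT outgrows ARC, every dedup lookup that misses turns into the random metadata reads described above, which is what the iostat numbers below suggest.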
So... if the DDT were already cached completely in ARC and I write a new unique block to a file, ideally I would hope (after write buffering, since all of this is async) that a single write would be completed to disk: the aggregate of the new data block plus the new DDT entry, which write aggregation should collapse into literally one seek+latency penalty. In reality, additional writes will most likely be necessary to update parent block pointers, parent DDT branches, and so forth, but hopefully all of that is managed well and kept to a minimum. So maybe a single new write ultimately costs a dozen disk access times.

I'm still homing in on this, but so far what I'm seeing is: zpool iostat indicates 1000 reads taking place for every 20 writes, i.e. roughly 50 reads per write. This is on a literally 100% idle pool, where the only activity on the system is me performing this write benchmark. The only logical explanation I see for this behavior is that the DDT must not be cached in ARC, so every write triggers a flurry of ~50 random reads.

Anyway, like I said, still exploring this. No conclusions drawn yet.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss