There were a lot of useful details in the thread "Summary: Dedup and L2ARC memory requirements"; please refer to that thread as necessary.
After much discussion leading up to that thread, I thought I had enough understanding to make dedup useful, but in practice it didn't work out. Now I've done a lot more work on it, reduced it all to practice, and I finally feel I can draw conclusions that are actually useful.

I am testing on a Sun/Oracle X4270 server: one 4-core 2.4GHz Xeon, 24GB RAM, and 12 disks, each a 2TB 7200rpm SAS drive. The OS is Solaris 11 Express (snv_151a), installed on a single disk. The remaining 11 disks are all striped into a single 20TB pool with no redundancy. Obviously this is not how you would configure for production; the point is to get maximum usable size and maximum performance for testing, so I can actually find the limits in this lifetime.

With and without dedup, the read and write performance characteristics on duplicate and unique data are completely different. That's a lot of variables, so here's how I'm going to break it down: performance gain of dedup versus performance loss of dedup.

--- Performance gain:

Unfortunately there was only one area where I found any performance gain. When you read back duplicate data that was previously written with dedup, you get a lot more cache hits, and as a result the reads go faster. Unfortunately these gains are diminished by something I haven't been able to pin down: you only get about 2x to 4x performance gain reading previously dedup'd data, as compared to reading the same data which was never dedup'd. Even when repeatedly reading the same file that is 100% duplicate data (created by dd from /dev/zero), so all the data should be 100% in cache, I still see only 2x to 4x performance gain with dedup.

--- Performance loss:

The first conclusion to draw is: for extremely small pools (say, a few GB or so), writing with or without dedup performs exactly the same. As you grow the number of unique blocks in the pool, write performance with dedup starts to deviate from write performance without dedup. It quickly reaches 4x, 6x, 10x slower with dedup, but the degradation appears to be logarithmic: I reached 8x write performance degradation around 2.4M blocks (290G used), but I never exceeded 11x even when I got up to 14T used (123M blocks).

The second conclusion is: out of the box, dedup is basically useless. Thanks to arc_meta_limit being pathetically small by default (3840M on my 24G system), I ran into a write performance brick wall around 30M unique blocks in the system, roughly 3.6T of unique data in the pool. By "write performance brick wall" I mean that up until that point, dedup writing was about 8x slower than writing without dedup, and after that point the difference grew exponentially. (Maybe it's not mathematically exponential, but the numbers look exponential to my naked eye.) I left it running for about 19 hours and never got beyond 5.8T written in the system.

Fortunately, it's really easy to tweak arc_meta_limit, so that's what I did for the second test: I set arc_meta_limit so high it would never be reached. In this configuration, the previously described logarithmic write performance degradation continued much higher; in other words, dedup write performance was still pretty bad, but there was no "brick wall" as previously described. Basically I kept writing at this rate until the pool was 13.5T full (113M blocks), and the whole time, dedup write performance was approximately 10x slower than writing without dedup.
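For anyone who wants to repeat the arc_meta_limit change, here is a minimal sketch of what I mean, assuming the standard arcstats kstats and the usual arc_meta_limit kernel variable on snv_151a (the 20GB value is only an example, not a recommendation):

  # current metadata limit and usage, in bytes
  kstat -p zfs:0:arcstats:arc_meta_limit
  kstat -p zfs:0:arcstats:arc_meta_used

  # raise the live limit to ~20GB via mdb (takes effect immediately, not persistent)
  echo 'arc_meta_limit/Z 0x500000000' | mdb -kw

To make it persist across reboots, the corresponding /etc/system tunable is zfs:zfs_arc_meta_limit (verify the symbol exists on your build before relying on it):

  set zfs:zfs_arc_meta_limit = 0x500000000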
At that point (13.5T written), my arc_meta_used had reached 15,500M and would not grow any more, so I hit a "softer" brick wall. I can only conclude that the data being cached in the ARC was pushing the metadata out of the ARC, but that's only a guess.

So the third test was to leave arc_meta_limit at its maximum value and set primarycache to metadata only (the exact settings are listed at the end of this message). Naturally, if you use this configuration in production you're likely to have poor read performance, because you're guaranteed never to get a data cache hit. But it could still be a useful configuration, for example if you are using dedup on a write-only backup server.

In this configuration, write performance was much better than in the other configurations. I reached 3x slower dedup write performance almost immediately, 4x around 47M blocks (5.6T used), and 5x around 88M blocks (10.6T used). It held at about 6x until around 142M blocks (17T used) and 15,732M of arc_meta_used. At that point I hit 90% full and the whole system basically fell apart, so I disregard all the results that came after. Based on the numbers I'm seeing, I have every reason to believe the system could have continued writing with dedup merely 6x slower than writing without dedup, if I had not run out of disk space. At least theoretically this should continue until the metadata no longer fits in RAM, at which point I'd hit a brick wall; but the only way to measure that is to remove RAM from the system and repeat the test, and I don't think I'll bother.

It has been mentioned before that you suffer a big performance penalty when you delete things. This is very, very true. Destroy a snapshot, or even rm a file, and the time to completion is on the same order as the time it took to create all that data in the first place. This is a really big weak point: it might take several hours to destroy the oldest daily snapshot, and that is something you would likely be doing every midnight.
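For completeness, here are the knobs involved in the third configuration, plus zdb's dedup-table report, which is a convenient way to watch block counts during a test like this (I'm not claiming this is exactly how the numbers above were gathered). The pool name "tank" is a placeholder:

  # dedup is a per-dataset property; set at the top-level dataset it covers the pool
  zfs set dedup=on tank

  # cache only metadata in the ARC; file data will never get a cache hit
  zfs set primarycache=metadata tank

  # DDT statistics: unique/duplicate block counts, plus in-core and on-disk entry sizes
  zdb -DD tank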