> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>  
> Both the necessity to read & write the primary storage pool...  That's very
> hurtful.

Actually, I'm seeing two different modes of degradation:
(1) As previously described: when I run into arc_meta_limit, in a pool of
approx 1.0M to 1.5M unique blocks, I suffer ~50 reads for every new unique
write.  The countermeasure was easy: increase arc_meta_limit.
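(For anyone repeating this: on OpenSolaris-era builds the limit is an
/etc/system tunable.  The 4 GB value below is only an example, not a
recommendation.)

```
* /etc/system fragment: raise the ARC metadata ceiling (example value:
* 4 GB).  Takes effect at the next reboot.
set zfs:zfs_arc_meta_limit=0x100000000
```

Whether you are anywhere near the limit shows up by comparing
`kstat -p zfs:0:arcstats:arc_meta_used` against
`kstat -p zfs:0:arcstats:arc_meta_limit`.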

(2) Now, in a pool with 2.4M unique blocks and dedup enabled (no verify), a
test file requires 10m38s to write and 2m54s to delete, but with dedup
disabled exactly the same file requires only 0m40s to write and 0m13s to
delete.  So ... roughly a 13x-16x performance degradation.
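Spelling the ratios out (shell arithmetic on the times above; integer
division, so the write-path figure truncates from ~15.95):

```shell
# Slowdown ratios computed from the measured times above.
write_dedup=$((10*60+38))    # 10m38s = 638s with dedup
write_plain=40               # 0m40s without
delete_dedup=$((2*60+54))    # 2m54s = 174s with dedup
delete_plain=13              # 0m13s without
echo "write slowdown:  $((write_dedup / write_plain))x"     # prints 15x
echo "delete slowdown: $((delete_dedup / delete_plain))x"   # prints 13x
```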

zpool iostat shows the disks fully utilized doing writes, and no reads.
During this time it is clear the only bottleneck is write iops.  There are
still oodles of free mem.  I am not near arc_meta_limit, nor c_max.  The cpu
is 99% idle.  It is write-iops limited.  Period.

Assuming DDT maintenance is the only disk write overhead that dedup adds, I
can only conclude that with dedup enabled, and a couple million unique
blocks in the pool, the DDT must require substantial maintenance.  In my
case, something like 12 DDT writes for every 1 actual intended new unique
file block write.
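That estimate can be sanity-checked from the timings: since the run is
write-iops-bound, elapsed time is roughly proportional to writes issued.  A
back-of-envelope, assuming nothing else differs between the two runs:

```shell
# The dedup write took ~16x as long as the non-dedup write of the same
# file, so each intended data-block write is accompanied by roughly 14-15
# additional (presumably DDT) writes -- the same ballpark as "about 12".
extra=$(( (10*60+38) / 40 - 1 ))
echo "extra writes per new unique block: ~$extra"
```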

For the heck of it, since this machine has no other purpose at the present
time, I plan to do two more tests.  And I'm open to suggestions if anyone
can think of anything else useful to measure: 

(1) I'm currently using a recordsize of 512b, because the intended purpose
of this test has been to rapidly generate a high number of new unique
blocks.  Now just to eliminate the possibility that I'm shooting myself in
the foot by systematically generating a worst case scenario, I'll try to
systematically generate a best-case scenario.  I'll push the recordsize back
up to 128k, and then repeat this test with a write size slightly smaller than
128k.  Say, 120k.  That way there should be plenty of room available for any
write aggregation the system may be trying to perform.
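A sketch of test (1).  The dataset name is a placeholder, a dry run can
target any directory (though only a dedup-enabled dataset exercises the
interesting path), and on the real pool the count would be large enough to
generate millions of unique blocks:

```shell
# On the real (hypothetical) dataset:
#   zfs set recordsize=128k tank/test
# Then write with a 120 KiB application write size, just under the 128 KiB
# recordsize, leaving room for any write aggregation the system attempts.
DIR=${DIR:-/tmp}
dd if=/dev/urandom of="$DIR/recordsize-test" bs=120k count=64 2>/dev/null
wc -c < "$DIR/recordsize-test"    # 64 * 122880 = 7864320 bytes
```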

(2) For the heck of it, why not: disable the ZIL and confirm that nothing
changes.  (My understanding so far is that all these writes are async, and
therefore the ZIL should not be a factor.  Nice to confirm this belief.)
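For reference, on builds of this vintage the ZIL is disabled with a tunable
rather than a dataset property (the `sync=disabled` property arrived later).
An /etc/system fragment, test boxes only:

```
* /etc/system: globally disable the ZIL.  Only for scratch/test systems --
* a power failure can lose the last few seconds of synchronous writes.
set zfs:zil_disable=1
```

It can also be flipped on a live system with `echo zil_disable/W0t1 | mdb -kw`;
either way it only takes effect for datasets mounted after the change.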

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
