> From: Erik Trimble [mailto:erik.trim...@oracle.com]
> 
> (1) I'm assuming you run your script repeatedly in the same pool,
> without deleting the pool. If that is the case, that means that a run of
> X+1 should dedup completely with the run of X.  E.g. a run with 120000
> blocks will dedup the first 110000 blocks with the prior run of 110000.

I rm the file in between each run.  So if I'm not mistaken, consecutive
runs never dedup against data from previous runs.


> (2) can you NOT enable "verify" ?  Verify *requires* a disk read before
> writing for any potential dedup-able block. 

Every block is unique.  There is never anything to verify because there is
never a checksum match.

Why would I test dedup on non-dedupable data?  Because it's deliberately a
test.  In any pool where you want to enable dedup, you're going to have some
dedupable blocks and some non-dedupable blocks.  The memory requirement is
based on the number of allocated blocks in the pool, so I want to establish
upper and lower bounds for dedup performance.  I am running some tests on
entirely duplicate data to see how fast that goes, and also running the
described test on entirely non-duplicate data, with enough RAM and without
enough RAM, as verification that we know how to predict the lower bound.

So far, I'm failing to predict the lower bound, which is why I've come here
to talk about it.
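For the memory side of that bound, the usual back-of-the-envelope is blocks
times the in-core size of one DDT entry.  The sketch below uses the
commonly quoted ballpark of ~320 bytes per entry and a 128K average block
size; both numbers are assumptions for illustration, not exact figures from
any particular ZFS release.

```python
# Rough DDT (dedup table) RAM estimate for worst-case (all-unique) data.
# 320 bytes/entry is a commonly quoted in-core ballpark, assumed here.
BYTES_PER_DDT_ENTRY = 320

def ddt_ram_bytes(pool_bytes, avg_block_bytes=128 * 1024):
    """Estimate in-core DDT size when every block in the pool is unique."""
    n_blocks = pool_bytes // avg_block_bytes
    return n_blocks * BYTES_PER_DDT_ENTRY

# e.g. 1 TiB of unique 128K blocks:
print(ddt_ram_bytes(1 << 40) / (1 << 20), "MiB")  # → 2560.0 MiB
```

Smaller average block sizes inflate this quickly, which is why all-unique
small-block workloads are the painful lower-bound case.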

I've done a bunch of tests with dedup=verify and with dedup=sha256, and the
results were the same.  But I didn't do that for this particular test.  I'll
run with just sha256 if you would still like me to after what I just said.


> (3) fflush is NOT the same as fsync.  If you're running the script in a
> loop, it's entirely possible that ZFS hasn't completely committed things
> to disk yet, 

Oh.  Well, I'll change that, but I actually sat here and watched the HDD
light, so even though I did that wrong, I can say the hard drive finished
and went idle in between each run.  (I stuck sleep statements in between
each run specifically so I could watch the HDD light.)
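The distinction Erik is drawing: fflush() only moves data from the user-space
buffer into the kernel page cache, while fsync() blocks until the kernel has
pushed the file's dirty data to the device.  A minimal sketch (file name is
arbitrary):

```python
import os

# flush() empties the user-space buffer into the kernel page cache;
# fsync() does not return until the device has the data.
with open("testfile.bin", "wb") as f:
    f.write(b"\0" * 4096)
    f.flush()             # data in kernel cache, NOT necessarily on disk
    os.fsync(f.fileno())  # blocks until the data reaches stable storage
os.remove("testfile.bin")
```

So a benchmark that only calls fflush() can report "done" while ZFS still
has the whole write pending in memory.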


>          i=0
>          while [ "$i" -lt 80 ]
>          do
>              j=$((100000 + i * 10000))
>              ./run_your_script "$j"
>              sync
>              sleep 10
>              i=$((i + 1))
>          done

Oh, yeah.  That's what I did, minus the sync command.  I'll make sure to
include that next time.  And I used "time ~/datagenerator"

Incidentally, do fsync() and sync return instantly, or do they wait?  Because
"time sync" might produce 0 sec every time even if there were something
waiting to be flushed to disk.
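For what it's worth: fsync() is required to wait until the file's data is on
stable storage, but POSIX only requires sync() to *schedule* the writeback,
so it may return before the disk is done (behavior varies by platform).
That means a near-zero "time sync" doesn't prove the cache was clean.  A
quick way to time the call that provably waits (file name and size are
arbitrary):

```python
import os
import time

# fsync() must block until this file's dirty data is on stable storage,
# so timing it gives a real flush measurement; sync() may not wait.
with open("flushme.bin", "wb") as f:
    f.write(b"\0" * (8 << 20))   # 8 MiB of dirty data
    t0 = time.time()
    os.fsync(f.fileno())         # provably waits for the device
    print("fsync took %.3fs" % (time.time() - t0))
os.remove("flushme.bin")
```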

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
