> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Neil Perrin
> 
> No, that's not true. The DDT is just like any other ZFS metadata and can be
> split over the ARC, cache device (L2ARC) and the main pool devices. An
> infrequently referenced DDT block will get evicted from the ARC to the
> L2ARC, then evicted from the L2ARC.

When somebody has their "baseline" system and is thinking about adding dedup
and/or a cache device, I'd like to understand the effect of not having enough
ram.  Obviously the impact will be on performance, but precisely what happens?

At bootup, I presume the arc & l2arc are both empty, so all the DDT entries
reside in the pool.  As the system reads things (anything, files, etc.) from
the pool, it will populate the arc, and follow the fill-rate policies to
populate the l2arc over time.  Every entry in the l2arc requires about 200
bytes of arc, regardless of what type of entry it is.  (A DDT entry in the
l2arc consumes just as much arc memory as any other type of l2arc entry.
Ummm...  What's the point of that?  Aren't DDT entries 270 bytes and ARC
references 200 bytes?  It seems like a very questionable benefit to allow
individual DDT entries to get evicted into L2ARC; presumably they're packed
many per metadata block, so one l2arc header covers a whole block of them.)
So the ram consumption caused by the presence of the l2arc will initially be
zero after bootup, and it will grow over time as the l2arc populates, up to a
maximum that scales linearly: 200 bytes * the number of entries that fit in
the l2arc.  Of course that number varies with the size of each entry and the
size of the l2arc, but at least you can estimate upper and lower bounds.
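
To put rough bounds on that, here's a back-of-envelope sketch in Python.  The
200-bytes-per-header figure is the one discussed in this thread; the 100 GiB
cache size and the 4 KiB / 128 KiB entry sizes are just example numbers I
picked, not anything from the thread.

```python
# Back-of-envelope estimate of ARC memory consumed by L2ARC headers.
# Assumes ~200 bytes of ARC per L2ARC entry; the real header size
# varies by ZFS version, so treat these as rough bounds only.

def l2arc_arc_overhead(l2arc_bytes, entry_bytes, header_bytes=200):
    """Upper-bound ARC consumption for a fully populated L2ARC."""
    entries = l2arc_bytes // entry_bytes
    return entries * header_bytes

GiB = 1024 ** 3
MiB = 1024 ** 2

# Bounds for a hypothetical 100 GiB cache device:
worst = l2arc_arc_overhead(100 * GiB, 4 * 1024)    # all small (4 KiB) entries
best = l2arc_arc_overhead(100 * GiB, 128 * 1024)   # all large (128 KiB) entries
print(worst // MiB, "MiB of arc, worst case")      # 5000 MiB
print(best // MiB, "MiB of arc, best case")        # 156 MiB
```

The spread between the two bounds is a factor of 32 (the ratio of the entry
sizes), which is why you can only estimate, not compute, the real overhead.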

So that's how the l2arc consumes system memory in the arc.  The penalty of
insufficient ram, in conjunction with an enabled L2ARC, is insufficient arc
availability for other purposes - maybe the whole arc gets consumed by l2arc
headers, leaving no room for other stuff like commonly used files.  Worse yet,
your arc consumption could be so large that PROCESSES no longer fit in ram.
In that case, your processes get pushed out to swap space, which is really
bad.

Correct me if I'm wrong, but when dedup is enabled the sha256 checksum is used
as the block's integrity checksum in place of the default fletcher checksum
(not in addition to it).  Either way, after bootup, while the system is
reading a bunch of data from the pool, those reads are not populating the
arc/l2arc with DDT entries; the DDT is only consulted on writes (and frees).
Reads just populate the arc and l2arc with data and ordinary metadata.

DDT entries don't get into the arc/l2arc until something tries to do a write.
When performing a write, dedup calculates the checksum of the block to be
written, and then it needs to figure out whether that's a duplicate of another
block already on disk somewhere.  So (I'm guessing this part) there's some
tree-like structure on disk; I'll use a subdirectories-and-files analogy even
though I'm certain that's not technically correct (the DDT is actually stored
as a ZAP object, an on-disk hash table).  You need to find the DDT entry, if
it exists, for the block whose checksum is 1234ABCD.  So you start by looking
under the "1" directory, and from there look for the "2" subdirectory, and
then the "3" subdirectory, [...etc...]  If you encounter "not found" at any
step, then the DDT entry doesn't already exist and you decide to create a new
one.  But if you get all the way down to the "C" subdirectory and it contains
a file named "D", then you have found a possible dedup hit - the checksum
matches another block that's already on disk.  Now the DDT entry is cached in
the ARC just like anything else you read from disk.
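
To make the lookup-cost argument concrete, here is a toy Python model of the
directory-walk analogy above.  It is purely illustrative - the real DDT is a
ZAP object and real lookups don't do one disk read per hex digit, and all the
names here are made up - but it shows the point: a cold lookup turns into
multiple small reads before you even know whether the entry exists.

```python
# Toy model of the "subdirectories and files" DDT-lookup analogy.
# Each tree level stands in for one metadata block that, if not cached,
# costs a disk read.  Not how ZFS actually stores the DDT.

class FakeDiskTree:
    def __init__(self):
        self.reads = 0          # count of simulated disk reads
        self.root = {}

    def insert(self, checksum_hex):
        node = self.root
        for digit in checksum_hex:
            node = node.setdefault(digit, {})
        node["entry"] = True    # leaf marks an existing DDT entry

    def lookup(self, checksum_hex):
        node = self.root
        for digit in checksum_hex:
            self.reads += 1     # each level = one (uncached) small read
            if digit not in node:
                return False    # "not found" partway down: no dup on disk
            node = node[digit]
        return "entry" in node

ddt = FakeDiskTree()
ddt.insert("1234ABCD")
print(ddt.lookup("1234ABCD"))   # True - a possible dedup hit
print(ddt.reads)                # 8 simulated reads to find one entry
```

Note that a miss partway down the tree is cheaper than a full hit, but it
still costs several reads before the write can proceed.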

So the point is - whenever you do a write, and the DDT entry for the
calculated checksum is not already in ARC/L2ARC, the system will actually
perform several small reads looking for the entry before it even knows
whether the entry exists.  So the penalty of performing a write, with dedup
enabled and the relevant DDT entry not already in ARC/L2ARC, is very large.
What originated as a single write quickly becomes several small reads plus a
write, because the necessary DDT entry was not already available.
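
As a toy model of that penalty (the latency numbers below are invented for
illustration, not measured, and the 3-reads-per-miss count is just an
assumption):

```python
# Rough model of effective write latency with dedup, comparing a DDT
# cache hit against a cold miss.  All numbers are made-up assumptions.

def write_cost_ms(ddt_reads, read_ms=8.0, write_ms=8.0):
    """Extra small reads to find the DDT entry, plus the write itself."""
    return ddt_reads * read_ms + write_ms

hit = write_cost_ms(0)    # DDT entry already in ARC: just the write
miss = write_cost_ms(3)   # assume ~3 metadata reads on a cold DDT miss
print(hit, miss)          # 8.0 32.0 - a 4x penalty in this model
```

With spinning disks, every one of those extra reads is a seek, so the more of
the DDT that falls out of ARC and L2ARC, the closer every write gets to the
"miss" cost.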

The penalty of insufficient ram, in conjunction with dedup, is terrible
write performance.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss