> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Neil Perrin
>
> No, that's not true. The DDT is just like any other ZFS metadata and can be split over the ARC,
> cache device (L2ARC) and the main pool devices. An infrequently referenced DDT block will get
> evicted from the ARC to the L2ARC, then evicted from the L2ARC.
When somebody has their "baseline" system and is thinking about adding dedup and/or a cache device, I'd like to understand the effect of not having enough RAM. Obviously the impact will be on performance, but precisely how?

At bootup, I presume the ARC and L2ARC are empty, so all the DDT entries reside in the pool. As the system reads things (anything: files, etc.) from the pool, it populates the ARC, and follows the fill-rate policies to populate the L2ARC over time. Every entry in the L2ARC requires about 200 bytes of ARC header space, regardless of what type of entry it is. (A DDT entry in the L2ARC consumes just as much ARC memory as any other type of L2ARC entry.)

(Ummm... what's the point of that? Aren't DDT entries about 270 bytes, and the ARC references about 200 bytes? Saving so little per entry seems like a very questionable benefit for allowing DDT entries to be evicted into the L2ARC.)

So the RAM consumption caused by the presence of an L2ARC will initially be zero after bootup, and it will grow over time as the L2ARC populates, up to a maximum determined linearly: about 200 bytes * the number of entries that can fit in the L2ARC. Of course that number varies with the size of each entry and the size of the L2ARC, but at least you can estimate and establish upper and lower bounds.

So that's how the L2ARC consumes system memory in the ARC. The penalty of insufficient RAM, in conjunction with an enabled L2ARC, is insufficient ARC availability for other purposes. Maybe the whole ARC gets consumed by L2ARC headers, leaving no room for other stuff like commonly used files. Worse yet, your ARC consumption could be so large that PROCESSES don't fit in RAM anymore. In that case your processes get pushed out to swap space, which is really bad.

Correct me if I'm wrong, but I believe enabling dedup switches the block checksum to sha256, replacing the default fletcher integrity checksum rather than running in addition to it. So after bootup, while the system is reading a bunch of data from the pool, none of those reads populate the ARC/L2ARC with DDT entries.
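To put rough numbers on the upper and lower bounds mentioned above, here is a back-of-envelope sketch. It uses the ~200-bytes-per-L2ARC-entry figure from this thread; the cache size and record sizes are illustrative assumptions, not measured values.

```python
# Rough bounds on ARC memory consumed by L2ARC headers, using the
# ~200-bytes-per-entry figure from the discussion above.

L2ARC_HEADER_BYTES = 200          # ARC overhead per L2ARC entry (from the thread)

def l2arc_header_ram(l2arc_bytes, record_bytes):
    """ARC bytes consumed by headers if the L2ARC fills with records of one size."""
    entries = l2arc_bytes // record_bytes
    return entries * L2ARC_HEADER_BYTES

cache_size = 100 * 2**30          # a hypothetical 100 GiB cache device

# Worst case: the cache fills with tiny 512-byte records;
# best case: it fills with large 128 KiB file records.
worst = l2arc_header_ram(cache_size, 512)
best = l2arc_header_ram(cache_size, 128 * 2**10)

print(f"upper bound: {worst / 2**30:.1f} GiB of ARC")   # about 39 GiB
print(f"lower bound: {best / 2**20:.1f} MiB of ARC")    # about 156 MiB
```

The spread between the two bounds is enormous, which is exactly why "how much RAM does my L2ARC cost me?" has no single answer: it depends on what kind of records end up cached.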
Reads are just populating the ARC and L2ARC with other stuff. DDT entries don't get into the ARC/L2ARC until something tries to do a write.

When performing a write, dedup calculates the checksum of the block to be written, and then it needs to figure out whether that's a duplicate of another block already on disk somewhere. So (I'm guessing this part) there's probably a tree structure on disk. I'll use a subdirectories-and-files analogy, even though I'm certain that's not technically correct. You need to find the DDT entry, if it exists, for the block whose checksum is 1234ABCD. So you start by looking under the "1" directory, and from there look for the "2" subdirectory, and then the "3" subdirectory, [...etc...] If you encounter "not found" at any step, then the DDT entry doesn't already exist and you create a new one. But if you get all the way down to the "C" subdirectory and it contains a file named "D", then you have found a possible dedup hit: the checksum matched another block that's already on disk. Now the DDT entry is stored in the ARC just like anything else you read from disk.

So the point is: whenever you do a write, and the DDT entry for the calculated checksum is not already in the ARC/L2ARC, the system will actually perform several small reads looking for the DDT entry before it finally knows whether that entry exists. The penalty of performing a write, with dedup enabled and the relevant DDT entry not already in the ARC/L2ARC, is very large: what originated as a single write becomes several small reads plus a write, because the necessary DDT entry was not already available. The penalty of insufficient RAM, in conjunction with dedup, is terrible write performance.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
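The write-path logic described above can be sketched as a toy model. This is NOT actual ZFS code: a dict stands in for the in-memory DDT cache (ARC), the on-disk table is just another dict, and the "3 extra reads" cost of a cache miss is an illustrative figure, not a real measurement.

```python
# Toy sketch of the dedup write path described above -- not real ZFS.
import hashlib

class ToyDedupPool:
    def __init__(self, on_disk_ddt):
        self.on_disk_ddt = on_disk_ddt   # checksum -> refcount, "on disk"
        self.cached_ddt = {}             # DDT entries already pulled into "ARC"
        self.disk_reads = 0              # extra small reads caused by cache misses

    def write_block(self, data):
        key = hashlib.sha256(data).hexdigest()   # dedup checksum of the block
        if key not in self.cached_ddt:
            # Cache miss: several small reads to walk the on-disk table
            # before we even know whether the entry exists.
            self.disk_reads += 3                 # illustrative cost, not a real figure
            if key in self.on_disk_ddt:
                self.cached_ddt[key] = self.on_disk_ddt[key]
            else:
                self.cached_ddt[key] = 0         # "not found": this block is unique
        self.cached_ddt[key] += 1                # bump refcount; dedup hit if > 1
        return key

pool = ToyDedupPool(on_disk_ddt={})
pool.write_block(b"hello")   # cache miss: costs the extra reads
pool.write_block(b"hello")   # entry now cached: no extra reads, refcount becomes 2
print(pool.disk_reads)       # -> 3: the one miss turned a write into reads + a write
```

The second write of the same data is cheap only because the first write left the entry in the cache; with a DDT much larger than RAM, most writes look like the first one, which is the "terrible write performance" scenario above.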