Re: [zfs-discuss] Digging in the bowels of ZFS

Jim Klimov Wed, 05 Dec 2012 17:09:08 -0800

On 2012-12-05 05:52, Jim Klimov wrote:

For undersized allocations, i.e. of compressed data, it is possible
to see P-sizes not divisible by 4 (data disks) in 4KB sectors; however,
some sectors do apparently get wasted because the A-size in the DVA
is divisible by 6*4KB. With columnar allocation of disks, it is
easier to see why full stripes have to be used:


p1  p2  d1  d2  d3  d4
.   ,   1   5   9   13
.   ,   2   6   10  14
.   ,   3   7   11  x
.   ,   4   8   12  x

In this illustration a 14-sector-long block is saved, with "x" marking
the empty leftover sectors, which we can't really reclaim (as we could
with the other allocation, which is likely less efficient for
CPU and IOs).
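
The columnar picture above can be reproduced with a short sketch. It
assumes the hypothetical column-by-column fill drawn in the
illustration (4 data disks, 2 parity disks, a 14-sector block); the
names columnar_layout and render are made up for this example and say
nothing about how the raidz code really places sectors.

```python
import math

def columnar_layout(data_sectors, data_disks=4):
    """Fill data sectors column by column, as drawn in the illustration.

    Returns a list of rows; each row holds one cell per data disk,
    either a 1-based sector number or None for an unused cell.
    """
    rows = math.ceil(data_sectors / data_disks)
    grid = [[None] * data_disks for _ in range(rows)]
    for n in range(data_sectors):
        col, row = divmod(n, rows)  # walk down a column, then move right
        grid[row][col] = n + 1
    return grid

def render(grid, parity_disks=2):
    """Format the grid as text: '.' is a parity sector, 'x' an unused cell."""
    lines = []
    for row in grid:
        cells = ["."] * parity_disks
        cells += ["x" if c is None else str(c) for c in row]
        lines.append("  ".join(f"{c:>2}" for c in cells))
    return "\n".join(lines)

layout = columnar_layout(14)
print(render(layout))
```

With 14 data sectors the last column holds sectors 13 and 14 plus two
unused cells, which are the leftovers marked in the illustration.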


Getting more and more puzzled with this... I have seen DVA values
matching both theories now...

Interestingly, all the allocations I looked over involved a number
of sectors divisible by 3, i.e. rounded to half the width of my
6-disk RAID set - is it merely a coincidence, or some means of
balancing IOs?

Anyhow, with 4KB sectors involved, I saw many 128KB logical blocks
compressed into just half a dozen sectors of userdata payload, so
wasting one or two sectors here is quite a large percentage of my
storage overhead.

Exposition of found evidence follows:


Say, this one from my original post:
DVA[0]=<0:594928b8000:9000> ... size=20000L/4800P

It has 5 data sectors (@4KB) over 4 data disks in my raidz2 set,
so it spills over to a second row and requires additional parity
sectors - overall 5d+4p = 9 sectors, which we see in DVA A-size.
This is normal, as expected.
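
The arithmetic for this block can be checked with a small sketch. It
assumes 4KB sectors and a 6-disk raidz2 (4 data + 2 parity columns),
as described in this thread; the helper names are hypothetical, and
the DVA is parsed only in the <vdev:offset:asize> hex form shown above.

```python
import math

SECTOR = 4096        # 4KB sectors, as on this pool
DATA_DISKS = 4       # 6-disk raidz2: 4 data columns
PARITY_DISKS = 2     # ... plus 2 parity columns

def parse_dva(dva):
    """Split '<vdev:offset:asize>' (hex fields) into three integers."""
    vdev, offset, asize = dva.strip("<>").split(":")
    return int(vdev, 16), int(offset, 16), int(asize, 16)

def sectors_needed(psize):
    """Return (data, parity) sector counts for a block of psize bytes."""
    data = math.ceil(psize / SECTOR)
    rows = math.ceil(data / DATA_DISKS)   # each row needs its own parity
    return data, rows * PARITY_DISKS

_, _, asize = parse_dva("<0:594928b8000:9000>")
data, parity = sectors_needed(0x4800)
print(data, parity, asize // SECTOR)   # 5 4 9
```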

These ones however differ:

DVA[0]=<0:acef500e000:c000> ... size=20000L/6a00P
DVA[0]=<0:acef501a000:c000> ... size=20000L/7200P
DVA[0]=<0:acef5026000:c000> ... size=20000L/5c00P

These neighbors, with 7, 8 and 6 sectors' worth of data, each need
4 parity sectors (two rows), i.e. 11, 12 and 10 sectors in total,
yet all three occupy 12 sectors on disk.



DVA[0]=<0:59492a92000:6000> ... size=20000L/2800P

With 3*4KB sectors' worth of data and 2 parity sectors, this block
is allocated over 6, not 5, sectors.


DVA[0]=<0:5996bf7c000:12000> ... size=20000L/a800P

Likewise, with 11 sectors of data and likely 6 sectors of parity,
this one is given 18, not 17 sectors of storage allocation.



DVA[0]=<0:5996be32000:1e000> ... size=20000L/12c00P

Here, 19 sectors of data and 10 of parity occupy 30 sectors on disk.
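
All of the samples above can be tabulated the same way. The sketch
below assumes 4KB sectors and 4 data + 2 parity columns, and tests one
guess that happens to fit these seven blocks: data plus parity sectors
rounded up to a multiple of 3. That rounding rule is inferred from the
samples only; it is not taken from the raidz code.

```python
import math

SECTOR = 4096                    # 4KB sectors
DATA_DISKS, PARITY_DISKS = 4, 2  # 6-disk raidz2

# (A-size from the DVA, P-size) in bytes, copied from the samples above
SAMPLES = [
    (0x9000, 0x4800),
    (0xc000, 0x6a00),
    (0xc000, 0x7200),
    (0xc000, 0x5c00),
    (0x6000, 0x2800),
    (0x12000, 0xa800),
    (0x1e000, 0x12c00),
]

def analyze(asize, psize, multiple=3):
    """Compare the on-disk allocation with data + parity sector counts."""
    data = math.ceil(psize / SECTOR)
    parity = math.ceil(data / DATA_DISKS) * PARITY_DISKS
    needed = data + parity
    guessed = math.ceil(needed / multiple) * multiple  # hypothesis only
    allocated = asize // SECTOR
    return data, parity, needed, guessed, allocated

results = [analyze(a, p) for a, p in SAMPLES]
for data, parity, needed, guessed, allocated in results:
    pad = allocated - needed
    print(f"{data:2}d + {parity:2}p = {needed:2} needed, "
          f"{allocated:2} allocated, {pad} padding "
          f"({100 * pad / allocated:.0f}% of the allocation)")
```

For these samples the guess matches every A-size, and the padding
ranges from zero to two sectors per block (two of twelve for the
6-data-sector block).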


I have not yet researched where exactly the "unused" sectors are
allocated - "vertically" on the last strip, as in yesterday's
depiction quoted above, or "horizontally" across several disks -
but now that I know about this, it really bothers me as wasted
space with no apparent gain. I mean, the raidz code does tricks
to ensure that parities are located on different disks, and in
normal conditions the userdata sector reads land on all disks
in a uniform manner. Why forfeit the natural "rotation" that comes
from P-sizes which are not a multiple of the number of data disks?
Writes are streamed and coalesced anyway, so by not allocating
these unused sectors we'd only reduce the needed write IOPS by
some portion - and save disk space...


In short: can someone explain the rationale - why are allocations
the way they are now, and should this be treated as a bug or
rationalized as a feature?

Thanks,
//Jim