Hello Mark,

Tuesday, April 15, 2008, 8:32:32 PM, you wrote:
MM> The new write throttle code put back into build 87 attempts to
MM> smooth out the process. We now measure the amount of time it takes
MM> to sync each transaction group, and the amount of data in that group.
MM> We dynamically resize our write throttle to try to keep the sync
MM> time constant (at 5secs) under write load. We also introduce
MM> "fairness" delays on writers when we near pipeline capacity: each
MM> write is delayed 1/100sec when we are about to "fill up". This
MM> prevents a single heavy writer from "starving out" occasional
MM> writers. So instead of coming to an abrupt halt when the pipeline
MM> fills, we slow down our write pace. The result should be a constant
MM> even IO load.

snv_91, 48x 500GB SATA drives in one large stripe:

# zpool create -f test c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 \
    c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0 \
    c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0 c3t7d0 \
    c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0 c4t7d0 \
    c5t0d0 c5t1d0 c5t2d0 c5t3d0 c5t4d0 c5t5d0 c5t6d0 c5t7d0 \
    c6t0d0 c6t1d0 c6t2d0 c6t3d0 c6t4d0 c6t5d0 c6t6d0 c6t7d0
# zfs set atime=off test
# dd if=/dev/zero of=/test/q1 bs=1024k
^C34374+0 records in
34374+0 records out

# zpool iostat 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
[...]
test        58.9M  21.7T      0  1.19K      0  80.8M
test         862M  21.7T      0  6.67K      0   776M
test        1.52G  21.7T      0  5.50K      0   689M
test        1.52G  21.7T      0  9.28K      0  1.16G
test        2.88G  21.7T      0  1.14K      0   135M
test        2.88G  21.7T      0  1.61K      0   206M
test        2.88G  21.7T      0  18.0K      0  2.24G
test        5.60G  21.7T      0     79      0   264K
test        5.60G  21.7T      0      0      0      0
test        5.60G  21.7T      0  10.9K      0  1.36G
test        9.59G  21.7T      0  7.09K      0   897M
test        9.59G  21.7T      0      0      0      0
test        9.59G  21.7T      0  6.33K      0   807M
test        9.59G  21.7T      0  17.9K      0  2.24G
test        13.6G  21.7T      0  1.96K      0   239M
test        13.6G  21.7T      0      0      0      0
test        13.6G  21.7T      0  11.9K      0  1.49G
test        17.6G  21.7T      0  9.91K      0  1.23G
test        17.6G  21.7T      0      0      0      0
test        17.6G  21.7T      0  5.48K      0   700M
test        17.6G  21.7T      0  20.0K      0  2.50G
test        21.6G  21.7T      0  2.03K      0   244M
test        21.6G  21.7T      0      0      0      0
test        21.6G  21.7T      0      0      0      0
test        21.6G  21.7T      0  4.03K      0   513M
test        21.6G  21.7T      0  23.7K      0  2.97G
test        25.6G  21.7T      0  1.83K      0   225M
test        25.6G  21.7T      0      0      0      0
test        25.6G  21.7T      0  13.9K      0  1.74G
test        29.6G  21.7T      1  1.40K   127K   167M
test        29.6G  21.7T      0      0      0      0
test        29.6G  21.7T      0  7.14K      0   912M
test        29.6G  21.7T      0  19.2K      0  2.40G
test        33.6G  21.7T      1    378   127K  34.8M
test        33.6G  21.7T      0      0      0      0
^C

Well, it doesn't actually look good. Checking with iostat, I don't see any
problems such as long service times. Reducing zfs_txg_synctime to 1 helps a
little, but the write stream is still not even. If I start 3 dd streams at
the same time it is slightly better (with zfs_txg_synctime set back to 5),
but still very jumpy.
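For anyone following along, here is a minimal sketch of what I mean by
"reducing zfs_txg_synctime" and "3 dd streams" (the tunable-setting syntax
is from memory, so please verify it on your own build before relying on it):

  # lower the txg sync target from 5s to 1s on the live system
  # (decimal value written to the kernel variable via mdb):
  echo 'zfs_txg_synctime/W 0t1' | mdb -kw

  # or persistently, via /etc/system:
  #   set zfs:zfs_txg_synctime = 1

  # three concurrent sequential writers instead of one:
  for i in 1 2 3; do
      dd if=/dev/zero of=/test/q$i bs=1024k &
  done
  wait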
Reading with one dd produces steady throughput, but I'm disappointed with
the actual performance:

test         161G  21.6T  9.94K      0  1.24G      0
test         161G  21.6T  10.0K      0  1.25G      0
test         161G  21.6T  10.3K      0  1.29G      0
test         161G  21.6T  10.1K      0  1.27G      0
test         161G  21.6T  10.4K      0  1.31G      0
test         161G  21.6T  10.1K      0  1.27G      0
test         161G  21.6T  10.4K      0  1.30G      0
test         161G  21.6T  10.2K      0  1.27G      0
test         161G  21.6T  10.3K      0  1.29G      0
test         161G  21.6T  10.0K      0  1.25G      0
test         161G  21.6T  9.96K      0  1.24G      0
test         161G  21.6T  10.6K      0  1.33G      0
test         161G  21.6T  10.1K      0  1.26G      0
test         161G  21.6T  10.2K      0  1.27G      0
test         161G  21.6T  10.4K      0  1.30G      0
test         161G  21.6T  9.62K      0  1.20G      0
test         161G  21.6T  8.22K      0  1.03G      0
test         161G  21.6T  9.61K      0  1.20G      0
test         161G  21.6T  10.2K      0  1.28G      0
test         161G  21.6T  9.12K      0  1.14G      0
test         161G  21.6T  9.96K      0  1.25G      0
test         161G  21.6T  9.72K      0  1.22G      0
test         161G  21.6T  10.6K      0  1.32G      0
test         161G  21.6T  9.93K      0  1.24G      0
test         161G  21.6T  9.94K      0  1.24G      0

zpool scrub produces:

test         161G  21.6T     25     69  2.70M   392K
test         161G  21.6T  10.9K      0  1.35G      0
test         161G  21.6T  13.4K      0  1.66G      0
test         161G  21.6T  13.2K      0  1.63G      0
test         161G  21.6T  11.8K      0  1.46G      0
test         161G  21.6T  13.8K      0  1.72G      0
test         161G  21.6T  12.4K      0  1.53G      0
test         161G  21.6T  12.9K      0  1.59G      0
test         161G  21.6T  12.9K      0  1.59G      0
test         161G  21.6T  13.4K      0  1.67G      0
test         161G  21.6T  12.2K      0  1.51G      0
test         161G  21.6T  12.9K      0  1.59G      0
test         161G  21.6T  12.5K      0  1.55G      0
test         161G  21.6T  13.3K      0  1.64G      0

So sequential reading gives steady throughput, but the numbers are a little
lower than I expected. Sequential writing is still jumpy with single or
multiple dd streams on a pool with many disk drives. Let's destroy the pool
and create a new, smaller one.

# zpool create -f test c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0
# zfs set atime=off test
# dd if=/dev/zero of=/test/q1 bs=1024k
^C15905+0 records in
15905+0 records out

# zpool iostat 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
[...]
test         688M  2.72T      0  3.29K      0   401M
test        1.01G  2.72T      0  3.69K      0   462M
test        1.35G  2.72T      0  3.59K      0   450M
test        1.35G  2.72T      0  2.95K      0   372M
test        2.03G  2.72T      0  3.37K      0   428M
test        2.03G  2.72T      0  1.94K      0   248M
test        2.71G  2.72T      0  2.44K      0   301M
test        2.71G  2.72T      0  3.88K      0   497M
test        2.71G  2.72T      0  3.86K      0   494M
test        4.07G  2.71T      0  3.42K      0   425M
test        4.07G  2.71T      0  3.89K      0   498M
test        4.07G  2.71T      0  3.88K      0   497M
test        5.43G  2.71T      0  3.44K      0   429M
test        5.43G  2.71T      0  3.94K      0   504M
test        5.43G  2.71T      0  3.88K      0   497M
test        5.43G  2.71T      0  3.88K      0   497M
test        7.62G  2.71T      0  2.34K      0   286M
test        7.62G  2.71T      0  4.23K      0   539M
test        7.62G  2.71T      0  3.89K      0   498M
test        7.62G  2.71T      0  3.87K      0   495M
test        7.62G  2.71T      0  3.88K      0   497M
test        9.81G  2.71T      0  3.33K      0   418M
test        9.81G  2.71T      0  4.12K      0   526M
test        9.81G  2.71T      0  3.88K      0   497M

Much more steady - interesting.
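Some back-of-the-envelope math on those numbers (assuming the load is spread
evenly across the spindles), plus the commands I'd use to check per-device
behaviour during a run:

  # 6-disk stripe, writing:  ~497 MB/s / 6  ~= 83 MB/s per drive,
  #   which is close to what a single 500GB SATA drive can stream.
  # 48-disk stripe, reading: ~1.25 GB/s / 48 ~= 27 MB/s per drive,
  #   nowhere near platter speed, so the disks themselves are probably
  #   not the limit there.
  iostat -xnz 1           # per-device service times and utilization
  zpool iostat -v test 1  # per-vdev distribution of the load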
Let's do it again with a yet bigger pool, and let's keep distributing the
disks in "rows" across controllers.

# zpool create -f test c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0 \
    c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0 c6t1d0
# zfs set atime=off test

test        1.35G  5.44T      0  5.42K      0   671M
test        2.03G  5.44T      0  7.01K      0   883M
test        2.71G  5.43T      0  6.22K      0   786M
test        2.71G  5.43T      0  8.09K      0  1.01G
test        4.07G  5.43T      0  7.14K      0   902M
test        5.43G  5.43T      0  4.02K      0   507M
test        5.43G  5.43T      0  5.52K      0   700M
test        5.43G  5.43T      0  8.04K      0  1.00G
test        5.43G  5.43T      0  7.70K      0   986M
test        8.15G  5.43T      0  6.13K      0   769M
test        8.15G  5.43T      0  7.77K      0   995M
test        8.15G  5.43T      0  7.67K      0   981M
test        10.9G  5.43T      0  4.15K      0   517M
test        10.9G  5.43T      0  7.74K      0   986M
test        10.9G  5.43T      0  7.76K      0   994M
test        10.9G  5.43T      0  7.75K      0   993M
test        14.9G  5.42T      0  6.79K      0   860M
test        14.9G  5.42T      0  7.50K      0   958M
test        14.9G  5.42T      0  8.25K      0  1.03G
test        14.9G  5.42T      0  7.77K      0   995M
test        18.9G  5.42T      0  4.86K      0   614M

It is starting to get more jumpy, but still not as bad as in the first case.
So let's create a pool out of all the disks again, but this time let's keep
adding the disks in "rows" across controllers.

# zpool create -f test c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0 \
    c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0 c6t1d0 \
    c1t2d0 c2t2d0 c3t2d0 c4t2d0 c5t2d0 c6t2d0 \
    c1t3d0 c2t3d0 c3t3d0 c4t3d0 c5t3d0 c6t3d0 \
    c1t4d0 c2t4d0 c3t4d0 c4t4d0 c5t4d0 c6t4d0 \
    c1t5d0 c2t5d0 c3t5d0 c4t5d0 c5t5d0 c6t5d0 \
    c1t6d0 c2t6d0 c3t6d0 c4t6d0 c5t6d0 c6t6d0 \
    c1t7d0 c2t7d0 c3t7d0 c4t7d0 c5t7d0 c6t7d0
# zfs set atime=off test

test         862M  21.7T      0  5.81K      0   689M
test        1.52G  21.7T      0  5.50K      0   689M
test        2.88G  21.7T      0  10.9K      0  1.35G
test        2.88G  21.7T      0      0      0      0
test        2.88G  21.7T      0  9.49K      0  1.18G
test        5.60G  21.7T      0  11.1K      0  1.38G
test        5.60G  21.7T      0      0      0      0
test        5.60G  21.7T      0      0      0      0
test        5.60G  21.7T      0  15.3K      0  1.90G
test        9.59G  21.7T      0  15.4K      0  1.91G
test        9.59G  21.7T      0      0      0      0
test        9.59G  21.7T      0      0      0      0
test        9.59G  21.7T      0  16.8K      0  2.09G
test        13.6G  21.7T      0  8.60K      0  1.06G
test        13.6G  21.7T      0      0      0      0
test        13.6G  21.7T      0  4.01K      0   512M
test        13.6G  21.7T      0  20.2K      0  2.52G
test        17.6G  21.7T      0  2.86K      0   353M
test        17.6G  21.7T      0      0      0      0
test        17.6G  21.7T      0  11.6K      0  1.45G
test        21.6G  21.7T      0  14.1K      0  1.75G
test        21.6G  21.7T      0      0      0      0
test        21.6G  21.7T      0      0      0      0
test        21.6G  21.7T      0  4.74K      0   602M
test        21.6G  21.7T      0  17.6K      0  2.20G
test        25.6G  21.7T      0  8.00K      0  1008M
test        25.6G  21.7T      0      0      0      0
test        25.6G  21.7T      0      0      0      0
test        25.6G  21.7T      0  16.8K      0  2.09G
test        25.6G  21.7T      0  15.0K      0  1.86G
test        29.6G  21.7T      0     11      0  11.9K

Any idea?

-- 
Best regards,
 Robert Milkowski                          mailto:[EMAIL PROTECTED]
                                           http://milek.blogspot.com
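P.S. In case the "rows" ordering above is not obvious: the list just takes
target 0 on every controller, then target 1, and so on. I typed it out by
hand, but a little loop like this (purely illustrative) would produce the
same device list:

  for t in 0 1 2 3 4 5 6 7; do
      for c in 1 2 3 4 5 6; do
          printf 'c%st%sd0 ' "$c" "$t"
      done
  done
  echo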