Running Solaris 10 Update 3 on an X4500, I have found that it is possible to reproducibly block all writes to a ZFS pool by running "chgrp -R" on any large filesystem in that pool. As can be seen in the zpool iostat output below, about 10 seconds after starting the chgrp command all writes to the pool stop, and the pool switches exclusively to a slow background stream of 1 kB reads.
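
For reference, the reproduction is essentially just the following, run in two shells; the group name and filesystem path below are placeholders rather than the exact ones used here:

(shell 1: watch pool I/O)
# zpool iostat test 2

(shell 2: recursive group change; "somegroup" and /test/bigfs stand in for
 an arbitrary group and any large filesystem in the pool)
# chgrp -R somegroup /test/bigfs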
At this point the chgrp -R command is not killable by root with kill -9, and in fact even the command "halt -d" does nothing. In at least one instance I have seen the chgrp command eventually respond to the kill after ~30 minutes, after which the pool was writable again. However, while waiting for this to happen the kernel was returning "No more processes." whenever simple commands, e.g., uname or uptime, were run from pre-existing shells.

# zpool iostat test 2
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
...
test        1.12T  19.2T      1  1.72K  11.2K   220M
test        1.12T  19.2T      0  3.10K      0   380M
test        1.12T  19.2T      0    335      0  41.9M
test        1.12T  19.2T      0  4.49K      0   559M
test        1.12T  19.2T      0      0      0      0
test        1.12T  19.2T      0  1.51K      0   193M
test        1.12T  19.2T      0  3.31K      0   408M
test        1.12T  19.2T      0      0      0      0
test        1.12T  19.2T      0  3.54K      0   453M
test        1.13T  19.2T    428  1.17K  1.82M   129M
*** Started chgrp -R ***
test        1.13T  19.2T  1.74K  2.21K  7.19M   282M
test        1.13T  19.2T    531  2.49K  2.34M   300M
test        1.13T  19.2T    549  1.67K  2.96M   213M
test        1.13T  19.2T    395  3.00K  2.38M   368M
test        1.13T  19.2T    343      0  1.66M      0
test        1.13T  19.2T    113      0   113K      0
test        1.13T  19.2T    132      0   132K      0
test        1.13T  19.2T    136      0   137K      0
test        1.13T  19.2T    132      0   132K      0
test        1.13T  19.2T    148      0   149K      0
test        1.13T  19.2T    137      0   138K      0
test        1.13T  19.2T    163      0   163K      0
test        1.13T  19.2T    152      0   153K      0
...
*** All writes to this pool are hung for some long period of time. ***

Here is the pool configuration:

# zpool status
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c8t3d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c8t5d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c8t4d0  ONLINE       0     0     0
            c8t0d0  ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0
            c5t7d0  ONLINE       0     0     0
            c6t7d0  ONLINE       0     0     0
            c7t7d0  ONLINE       0     0     0
            c8t7d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
            c6t6d0  ONLINE       0     0     0
            c7t6d0  ONLINE       0     0     0
            c8t6d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
        spares
          c8t1d0    AVAIL

errors: No known data errors

There is nothing in the output of dmesg, svcs -xv, or fmdump associated with this event.

Is this a known issue, or should I open a new case with Sun?

Thanks.

--
Stuart Anderson
[EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson