Running Solaris 10 Update 3 on an X4500, I have found that it is possible to reproducibly block all writes to a ZFS pool by running "chgrp -R" on any large filesystem in that pool. As can be seen in the zpool iostat output below, about 10 seconds after starting the chgrp command all writes to the pool stop, and the pool switches exclusively to a slow background stream of 1 kB reads.
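
For reference, the reproduction is essentially just the following, run in two shells; the group name and filesystem path below are placeholders rather than the exact ones used here:

(shell 1: watch pool I/O)
# zpool iostat test 2

(shell 2: recursive group change; "somegroup" and /test/bigfs stand in for
 an arbitrary group and any large filesystem in the pool)
# chgrp -R somegroup /test/bigfs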
At this point the chgrp -R command is not killable by root with kill -9, and in fact even the command "halt -d" does nothing. In at least one instance I have seen the chgrp command eventually respond to the kill after ~30 minutes, after which the pool was writable again. However, while waiting for this to happen the kernel was returning "No more processes." whenever simple commands, e.g., uname or uptime, were run from pre-existing shells.

# zpool iostat test 2
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
...
test        1.12T  19.2T      1  1.72K  11.2K   220M
test        1.12T  19.2T      0  3.10K      0   380M
test        1.12T  19.2T      0    335      0  41.9M
test        1.12T  19.2T      0  4.49K      0   559M
test        1.12T  19.2T      0      0      0      0
test        1.12T  19.2T      0  1.51K      0   193M
test        1.12T  19.2T      0  3.31K      0   408M
test        1.12T  19.2T      0      0      0      0
test        1.12T  19.2T      0  3.54K      0   453M
test        1.13T  19.2T    428  1.17K  1.82M   129M
*** Started chgrp -R ***
test        1.13T  19.2T  1.74K  2.21K  7.19M   282M
test        1.13T  19.2T    531  2.49K  2.34M   300M
test        1.13T  19.2T    549  1.67K  2.96M   213M
test        1.13T  19.2T    395  3.00K  2.38M   368M
test        1.13T  19.2T    343      0  1.66M      0
test        1.13T  19.2T    113      0   113K      0
test        1.13T  19.2T    132      0   132K      0
test        1.13T  19.2T    136      0   137K      0
test        1.13T  19.2T    132      0   132K      0
test        1.13T  19.2T    148      0   149K      0
test        1.13T  19.2T    137      0   138K      0
test        1.13T  19.2T    163      0   163K      0
test        1.13T  19.2T    152      0   153K      0
...
*** All writes to this pool are hung for some long period of time. ***

Here is the pool configuration:

# zpool status
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c8t3d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c8t5d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c8t4d0  ONLINE       0     0     0
            c8t0d0  ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0
            c5t7d0  ONLINE       0     0     0
            c6t7d0  ONLINE       0     0     0
            c7t7d0  ONLINE       0     0     0
            c8t7d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
            c6t6d0  ONLINE       0     0     0
            c7t6d0  ONLINE       0     0     0
            c8t6d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
        spares
          c8t1d0    AVAIL

errors: No known data errors

There is nothing in the output of dmesg, svcs -xv, or fmdump associated with this event.

Is this a known issue, or should I open a new case with Sun?

Thanks.

--
Stuart Anderson
[EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson