
I have a Solaris 10u3/x86 box with a single mirrored zpool, patched with 
10_Recommended as of mid-May. It had been running with no obvious problems 
from then until today.

Today processes accessing certain zfs files started hanging (sleeping in an 
unkillable state). This seems to have begun when someone tried a simple 
"cat file | gunzip > otherfile" pipeline where "file" happens to be quite 
large (23G).  There were no obvious errors in the system logs, dmesg, 
iostat -e or zpool status, and a precautionary "zpool scrub" *also* appears 
to have hung without triggering the actual scrub. I also managed to create a 
further hung cat process with "truss cat file > /dev/null"; it hung after the 
file was mmap64()'d and before the first read() returned.

The wchan and kernel stack were the same for all the "cat" processes, so it 
would appear they were sleeping in the loop at 

(or its 10u3 equivalent)

# ps -o pid,wchan,args -p 9335,9561,9427,9511,9532,9564
  9335 d3dfe0b2 ls -lart
  9427 dce69114 cat administration-2007-01-10.tgz
  9511 dce69114 cat administration-2007-01-10.tgz
  9532 dce69114 cat administration-2007-01-10.tgz
  9561 d8bad584 ls scratch/
  9564 fec63e46 zpool scrub data
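
As a quick way to see which of the hung processes share a wait channel, the 
ps output above can be grouped by its wchan column. This is only a minimal 
sketch, assuming the three-column pid/wchan/args layout shown above; 
group_by_wchan is a hypothetical helper name, not a Solaris tool.

```python
from collections import defaultdict

# Sample copied from the ps output above (header line omitted).
PS_OUTPUT = """\
  9335 d3dfe0b2 ls -lart
  9427 dce69114 cat administration-2007-01-10.tgz
  9511 dce69114 cat administration-2007-01-10.tgz
  9532 dce69114 cat administration-2007-01-10.tgz
  9561 d8bad584 ls scratch/
  9564 fec63e46 zpool scrub data
"""


def group_by_wchan(ps_text):
    """Map each wchan address to the list of (pid, args) sleeping on it."""
    groups = defaultdict(list)
    for line in ps_text.splitlines():
        # Split into at most three fields so args keeps its spaces.
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip blank lines and any header line
        pid, wchan, args = fields
        groups[wchan].append((int(pid), args))
    return dict(groups)


if __name__ == "__main__":
    # Report only wait channels shared by more than one process.
    for wchan, procs in group_by_wchan(PS_OUTPUT).items():
        if len(procs) > 1:
            print(wchan, [pid for pid, _ in procs])
```

With the sample above, the only shared wait channel is dce69114, held by the 
three cat processes (9427, 9511, 9532).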

# mdb -k
 > ::pgrep cat | ::walk thread | ::findstack
stack pointer for thread d7b43e00: d50c3d6c
   d50c3d84 swtch+0x13e()
   d50c3d90 cv_wait+0x4b()
   d50c3dd8 dmu_buf_hold_array_by_dnode+0x236()
   d50c3e04 dmu_buf_hold_array_by_bonus+0x27()
   d50c3e74 zfs_read+0x182()
   d50c3eac fop_read+0x2a()
   d50c3f84 read+0x1f9()
   d50c3fac sys_sysenter+0x100()

Rebooting the system appears to have solved the problem (i.e. the truss, cat 
and gunzip commands above work just fine).  I do have a crash dump if anyone 
is interested, but this particular server is not under support.

zfs-discuss mailing list
