Hi.

snv_39, SPARC. I have several pools (no protection on ZFS) with several filesystems inside each pool. Data are served by nfsd (over 3000 active threads). The last thing I changed was /etc/system:

set rpcmod:cotsmaxdupreqs=8192
set rpcmod:maxdupreqs=8192

And now I observe that every few hours nfsd stops issuing any IOs to most pools, and all of its threads are in use (over 3000 right now, which is its set limit). Locally I can issue IOs to the zfs filesystems without any problem. After 10-25 minutes the problem goes away by itself, and then it comes back again later. Most nfsd threads are hanging in ZFS.

zpool iostat 1

nfs-s5-p0   2.45T  2.08T      0      0      0      0
nfs-s5-p1   1.60T  2.93T      0      0      0      0
nfs-s5-p2   1.39T  3.14T      0      0      0      0
nfs-s5-p3   41.2G  4.49T      0      0      0      0
nfs-s5-s8   4.40T   137G      0      7      0   255K
----------  -----  -----  -----  -----  -----  -----

mdb -kw
> ::ps!grep nfsd
R    320      1    320    320      1 0x42300902 0000030035d9a040 nfsd
> 0000030035d9a040::walk thread|::findstack -v
stack pointer for thread 30001305020: 2a100687021
[ 000002a100687021 cv_wait+0x40() ]
  000002a1006870d1 exitlwps+0x11c(0, 200000, 42000002, 30035d9a040, 100000, 30035d9a106)
  000002a100687181 proc_exit+0x1c(1, 0, ff131c80, 0, f, 18afe38)
  000002a100687231 exit+8(1, 0, ff131c80, 0, f, ff3a2400)
  000002a1006872e1 syscall_trap32+0xcc(0, 0, ff131c80, 0, f, ff3a2400)
stack pointer for thread 3004138b920: 2a102176621
[ 000002a102176621 cv_wait+0x40() ]
  000002a1021766d1 zil_commit+0x74(600012742ec, 26ca1, 10, 60001274280, 0, 26ca1)
  000002a102176781 zfs_fsync+0xa8(0, 0, 3000081cf94, 0, 300d7fe1000, 0)
  000002a102176831 fop_fsync+0x14(300d7fea040, 0, 300be5d1358, 3ade4e4, 0, 7ba3d40c)
  000002a1021768e1 rfs3_remove+0x22c(2a102177198, 2a102177398, 0, 2a102177698, 300be5d1358, 2a102177220)
  000002a102176ab1 common_dispatch+0x44c(2a102177698, 300c07cbdc0, 2a102177500, 6003298f200, 7017a1c0, 7bb9c7a8)
  000002a102176dd1 svc_getreq+0x210(300c07cbdc0, 600096837c0, 6003269bc50, 300844084f8, 18feb90, 6003269bac0)
  000002a102176f21 svc_run+0x194(60001125190, 0, 0, 1, 600011251c8, 30035d9a040)
  000002a102176fd1 nfssys+0x1a4(e, ff0a1f9c, 7bb2f800, c, c, 1d0)
  000002a1021772e1 syscall_trap32+0xcc(e, ff0a1f9c, 0, 0, 0, 0)
stack pointer for thread 300be20d600: 2a101862621
[ 000002a101862621 cv_wait+0x40() ]
  000002a1018626d1 zil_commit+0x74(300418bb62c, 26a38, 10, 300418bb5c0, 0, 26a38)
  000002a101862781 zfs_fsync+0xa8(0, 0, 6000724d994, 0, 30060eb0010, 0)
  000002a101862831 fop_fsync+0x14(30182c5fa00, 0, 300be5d0dd8, 3aee1e6, 0, 7ba3d40c)
  000002a1018628e1 rfs3_remove+0x22c(2a101863198, 2a101863398, 0, 2a101863698, 300be5d0dd8, 2a101863220)
  000002a101862ab1 common_dispatch+0x44c(2a101863698, 300c0892c80, 2a101863500, 6002fddb000, 7017a1c0, 7bb9c7a8)
  000002a101862dd1 svc_getreq+0x210(300c0892c80, 6001966b0c0, 6002fe72750, c00, 18feb90, 6002fe725c0)
  000002a101862f21 svc_run+0x194(60001125190, 0, 160, 1, 600011251c8, 30035d9a040)
  000002a101862fd1 nfssys+0x1a4(e, fefe1f9c, 7bb2f800, c, c, 1d0)
  000002a1018632e1 syscall_trap32+0xcc(e, fefe1f9c, 0, 0, 0, 0)
stack pointer for thread 3000131b960: 2a1002ae621
[ 000002a1002ae621 cv_wait+0x40() ]
  000002a1002ae6d1 zil_commit+0x74(6000574486c, 22049, 10, 60005744800, 0, 22049)
  000002a1002ae781 zfs_fsync+0xa8(0, 0, 60005770d14, 0, 3007b62a648, 0)
  000002a1002ae831 fop_fsync+0x14(3008549dd00, 0, 300416fc220, 3ad91ac, 0, 7ba3d40c)
  000002a1002ae8e1 rfs3_remove+0x22c(2a1002af198, 2a1002af398, 0, 2a1002af698, 300416fc220, 2a1002af220)
  000002a1002aeab1 common_dispatch+0x44c(2a1002af698, 600013a96c0, 2a1002af500, 300b7293180, 7017a1c0, 7bb9c7a8)
  000002a1002aedd1 svc_getreq+0x210(600013a96c0, 6002e3d2980, 60021510710, 60003f5d0f8, 18feb90, 60021510580)
  000002a1002aef21 svc_run+0x194(60001125190, 0, 0, 1, 600011251c8, 30035d9a040)
  000002a1002aefd1 nfssys+0x1a4(e, fef31f9c, 7bb2f800, c, c, 1d0)
  000002a1002af2e1 syscall_trap32+0xcc(e, fef31f9c, 0, 0, 0, 0)
stack pointer for thread 300b38dd620: 2a1031f64e1
[ 000002a1031f64e1 cv_wait+0x40() ]
  000002a1031f6591 zil_commit+0x74(300418bbdac, 27f84, 10, 300418bbd40, 39081e08, 27f84)
  000002a1031f6641 zfs_fsync+0xa8(0, 10000, 30090ab13d4, 0, 3014523f8c8, 0)
  000002a1031f66f1 fop_fsync+0x14(3009fde44c0, 10000, 60001002218, a, 39081e08, 7ba3d40c)
  000002a1031f67a1 rfs3_create+0x7bc(2a1031f7500, 2a1031f7080, 1, 0, 60001002218, 2a1031f7220)
  000002a1031f6ab1 common_dispatch+0x44c(2a1031f7698, 300bfd2ce40, 2a1031f7500, 300b70f2340, 7017a1c0, 7bb9b568)
  000002a1031f6dd1 svc_getreq+0x210(300bfd2ce40, 60034dda0c0, 300ea327690, 1e3, 18feb90, 300ea327500)
  000002a1031f6f21 svc_run+0x194(60001125190, 1, 0, 1, 600011251c8, 30035d9a040)
  000002a1031f6fd1 nfssys+0x1a4(e, fedf1f9c, 7bb2f800, c, c, 1d0)
  000002a1031f72e1 syscall_trap32+0xcc(e, fedf1f9c, 0, 0, 0, 0)
[...]

Using mpstat I can see that one CPU (CPU 27, all of it in sys) is 100% utilized:

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0   86  214  114   0    0    0    0   0     0   0   1  0  99
  1    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
  2    0   0    0   37    0  73    0    0    0   0     1   0   0  0 100
  3    0   0    3   12    0  22    0    1    1   0     0   0   0  0 100
  4    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
  5    0   0    2   10    0  18    0    0    0   0     0   0   0  0 100
  6    0   0    1    3    0   4    0    0    1   0     0   0   0  0 100
  7    0   0    3    9    0  20    0    0    0   0     0   0   1  0  99
  8    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
  9    0   0    0   10    2  14    0    0    0   0     0   0   1  0  99
 10    0   0    3    3    0   4    0    0    1   0     0   0   0  0 100
 11    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
 12    0   0    0    6    0  10    0    1    0   0     0   0   0  0 100
 13    0   0    2    6    0  10    0    0    0   0     0   0   0  0 100
 14    0   0    1    6    0  10    0    0    1   0   226   0   0  0 100
 15    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
 16    0   0    0    3    0   4    0    0    0   0     0   0   0  0 100
 17    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
 18    0   0    0    4    0   6    0    0    0   0     0   0   0  0 100
 19    0   0    0    6    0  10    0    0    0   0     0   0   0  0 100
 20    0   0    1    5    0   8    0    1    0   0    18   1   0  0  99
 21    0   0   20   24   21   4    0    0    0   0     0   0   0  0 100
 22    0   0   22   47   38  16    0    0    0   0     0   0   0  0 100
 23    0   0    4   16    4  22    0    1    1   0     0   0   0  0 100
 24    0   0    4    5    4   0    0    0    1   0     0   0   0  0 100
 25    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
 26    0   0    0    1    0   0    0    0    0   0     0   0   0  0 100
 27    0   0    0    1    0   0    0    0    0   0     0   0 100  0   0
 28    0   0    1   10    0  18    0    0    2   0     5   0   0  0 100
 29    0   0    2    8    0  14    0    0    0   0     0   0   0  0 100
 30    0   0    0    8    0  14    0    0    2   0   165   1   0  0  99
 31    0   0    2   11    0  20    0    0    0   0    19   0   0  0 100
^C
bash-3.00#

Well, I wanted to play with dtrace, but the CPU usage has gone away and now the system is almost 100% idle, yet there are still no IOs. Locally I can issue IOs to these zfs filesystems without any problem.

Finally nfsd stopped (I issued kill -9 to nfsd, and it exited after, I don't know, 10-15 minutes).
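
With over 3000 threads it is hard to judge from a handful of stacks how many are really waiting in zil_commit. As a rough sketch only, assuming the `::findstack -v` output has been saved as text, something like the following could tally threads by the function that called cv_wait. The name `tally_waiters` and the `SAMPLE` text (two stacks abbreviated from the output above) are made up for illustration and are not part of mdb.

```python
import re
from collections import Counter

# One frame looks like "000002a1021766d1 zil_commit+0x74(...": a 16-digit hex
# frame address, then function+offset. Only the function name is captured.
FRAME_RE = re.compile(r"\b[0-9a-f]{16}\s+(\w+)\+(?:0x)?[0-9a-f]+\(")

def tally_waiters(findstack_text):
    """Count threads by the function that called cv_wait (the second frame)."""
    counts = Counter()
    # Every thread's stack begins with the "stack pointer for thread" header.
    for chunk in findstack_text.split("stack pointer for thread")[1:]:
        frames = FRAME_RE.findall(chunk)
        if len(frames) >= 2 and frames[0] == "cv_wait":
            counts[frames[1]] += 1
    return counts

# Hypothetical input: two stacks shortened from the mdb output in this post.
SAMPLE = """\
stack pointer for thread 30001305020: 2a100687021
[ 000002a100687021 cv_wait+0x40() ]
  000002a1006870d1 exitlwps+0x11c(0, 200000, 42000002, 30035d9a040, 100000, 30035d9a106)
  000002a100687181 proc_exit+0x1c(1, 0, ff131c80, 0, f, 18afe38)
stack pointer for thread 3004138b920: 2a102176621
[ 000002a102176621 cv_wait+0x40() ]
  000002a1021766d1 zil_commit+0x74(600012742ec, 26ca1, 10, 60001274280, 0, 26ca1)
  000002a102176781 zfs_fsync+0xa8(0, 0, 3000081cf94, 0, 300d7fe1000, 0)
"""

for func, n in tally_waiters(SAMPLE).most_common():
    print(n, func)
```

Since the pattern does not depend on line breaks, it should also cope with output that was pasted as one long line. It only counts threads whose top frame is cv_wait; anything else is ignored.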