Marc Fournier wrote:
> On 2013-02-13, at 3:54 PM, Rick Macklem <rmack...@uoguelph.ca> wrote:
>
> > The pid that is in "T" state for the "ps auxlH".
>
> Different server, last kernel update on Jan 22nd, https process this
> time instead of du last time.
>
> I've attached:
>
> ps auxlH
> ps auxlH of just the processes that are in TJ state (6 httpd servers)
> procstat output for each of the 6 process
>
> They are included as attachments … if these don't make it through, let
> me know, just figured I'd try and keep it compact ...

Ok, I took a look and the interesting process seems to be 16693. It is
stopped ("T" state) and several of its threads (22, but not all) have a
procstat like this:

16693 104135 httpd - mi_switch+0x186 thread_suspend_check+0x19f sleepq_catch_signals+0x1c5 sleepq_timedwait_sig+0x19 _sleep+0x2ca clnt_vc_call+0x763 clnt_reconnect_call+0xfb newnfs_request+0xadb nfscl_request+0x72 nfsrpc_accessrpc+0x1df nfs34_access_otw+0x56 nfs_access+0x306 vn_open_cred+0x5a8 kern_openat+0x20a amd64_syscall+0x540 Xfast_syscall+0xf7
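For anyone who wants to repeat the count of threads parked in this stack,
here is a minimal sketch in Python. It assumes the attached text is in the
"PID TID COMM TDNAME KSTACK" layout shown above and that the thread name
contains no spaces. The function names are made up for illustration, and
the second sample line is hypothetical, not taken from Marc's attachments.

```python
def parse_kstacks(text):
    """Yield (pid, tid, [function names]) from procstat kernel-stack text."""
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row and any line too short to carry a stack.
        if len(fields) < 5 or not fields[0].isdigit() or not fields[1].isdigit():
            continue
        # Drop the "+0x..." offsets so frames can be matched by name.
        frames = [f.split("+", 1)[0] for f in fields[4:]]
        yield int(fields[0]), int(fields[1]), frames


def suspended_in_rpc_wait(text):
    """Return TIDs whose stack has both thread_suspend_check and clnt_vc_call."""
    wanted = {"thread_suspend_check", "clnt_vc_call"}
    return [tid for _pid, tid, frames in parse_kstacks(text)
            if wanted <= set(frames)]


# First line: the stack quoted in this mail.
# Second line: a hypothetical thread sleeping elsewhere, for contrast only.
SAMPLE = """\
  PID    TID COMM             TDNAME           KSTACK
16693 104135 httpd - mi_switch+0x186 thread_suspend_check+0x19f sleepq_catch_signals+0x1c5 sleepq_timedwait_sig+0x19 _sleep+0x2ca clnt_vc_call+0x763 clnt_reconnect_call+0xfb newnfs_request+0xadb
16693 100001 httpd - mi_switch+0x186 sleepq_wait+0x42 _sleep+0x2ca
"""

matching = suspended_in_rpc_wait(SAMPLE)
print(len(matching), "thread(s) suspended while waiting for an RPC reply:", matching)
```

Passing the saved procstat text for a process to suspended_in_rpc_wait()
in place of SAMPLE gives the list of thread IDs to compare against the 22
mentioned above.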
The sleep in clnt_vc_call() is waiting for an RPC reply (while a vnode
lock is held) with the PCATCH | PBDRY flags, since it is interruptible.
I can see that thread_suspend_check() is called with an argument of 1
(return_instead == 1), since there is only one call to
thread_suspend_check() in sleepq_catch_signals().

When looking at thread_suspend_check(), I basically got lost, although
it seems that it can only "return_instead" if there is a single thread
and not multiple threads doing this.

If these threads are stuck here and won't return from msleep(), that
would explain the hang. If they would wake up and return from the
msleep() when a wakeup occurs, it would suggest that there is a lost
reply or similar, so the wakeup isn't occurring. I also don't know
whether a timeout of the msleep() will still occur and make the msleep()
return.

Although it wasn't done to fix this, it looks like jhb@'s recent patch
to head (r246417) might fix this, since it reworks how STOP signals are
handled for interruptible mounts.

Hopefully kib or jhb can provide more insight.

Btw Marc, if you just want this problem to go away, I suspect getting
rid of the "intr" mount option would do that.

rick
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"