Hi,

I've got a system that's behaving a bit oddly. It's running a classic
network service with one parent proc that spawns one child proc per
connection. It's fine with about 100 or so concurrent child procs, but
once it goes higher than that, <defunct> procs start appearing.
Up to about 300 or so, the <defunct> procs appear and disappear so fast
that trying to preap them only gets a couple of them, with the rest
not reaching 60 seconds before going away. Lately it's been getting to
the point where there are more <defunct> procs and not much work getting
done. Setting a higher limit for the number of child procs helped move
a bit more through, but that's also left me with something on the order
of 600 <defunct> procs. My first thought was running out of resources, but
the load stays down around 5 (4 proc box) and there's plenty of free
memory. The traffic only runs at about 40-50 KBits/sec.
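
For context, a <defunct> entry is a child that has exited but has not
yet been waited on by its parent. Below is a minimal sketch of the usual
non-blocking reap loop for a fork-per-connection parent. It is written
in Python purely for illustration and is not the service's actual code;
reap_children, spawn_children and install_sigchld_handler are made-up
names, and the children here exit immediately instead of serving a
connection.

```python
import os
import signal


def reap_children():
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns at once if none is ready.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def spawn_children(n):
    """Fork n children that exit right away (stand-in for one connection each)."""
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child side: exit without running parent cleanup
        pids.append(pid)
    return pids


def install_sigchld_handler():
    """Reap on SIGCHLD. Several exits can arrive as one signal, so the
    handler has to loop instead of calling wait just once."""
    signal.signal(signal.SIGCHLD, lambda signum, frame: reap_children())
```

If the parent only waits once per SIGCHLD, or is too busy to get back
to its wait loop, exited children would pile up as <defunct> in roughly
the way described above. Whether that is what's happening here is only a
guess.
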
Running hotkernel from the DTrace Toolkit, I get something like:

FUNCTION                                                COUNT   PCNT
unix`default_lock_delay                                  2105   0.7%
unix`mutex_exit                                          2408   0.8%
unix`generic_idle_cpu                                    2527   0.8%
unix`page_vpsub                                          2545   0.8%
unix`page_unlock                                         3245   1.0%
genunix`pvn_vplist_dirty                                 6644   2.1%
unix`mutex_delay_default                                 9892   3.2%
unix`page_lock_es                                       10404   3.4%
unix`mutex_enter                                        20079   6.5%
unix`page_trylock                                       21461   6.9%
unix`idle                                               22421   7.3%
unix`page_vpadd                                         31324  10.1%
unix`disp_getwork                                       96444  31.2%

Checking for failing syscalls (errinfo) I get things like:

          SYSCALL  ERR  COUNT  DESC
         shutdown  134     13  Socket is not connected
          pollsys    4    694  interrupted system call
          c2audit   22   1428  Invalid argument
            ioctl   25   2181  Inappropriate ioctl for device
           open64    2   3090  No such file or directory
           putmsg    9   4600  Bad file number
            fcntl   22   6618  Invalid argument
           stat64    2   9530  No such file or directory
          lstat64    2  11892  No such file or directory
            close    9  12871  Bad file number
             stat    2  61254  No such file or directory

The number of failed stat calls is a bit high, but they're expected
because there's some silliness checking for .files.

Per-CPU stats from mpstat:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0  467   0 1535   185    5 2402  159  343 1404   11  5153    7  53   0  40
  1  455   0 1978  3140 3027 1874  132  307 1064    9  5898    9  53   0  38
  2  520   0 1314   837  693 2112  116  291 1288   10  4442    7  56   0  37
  3  533   0 1573  1047  767 3273  189  410 1281   11  4950    7  45   0  47

I'm a bit short on ideas of where to go digging next, so any hints/ideas would
be greatly appreciated.

thanks,

/Mads
-- 
http://soulfood.dk
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
