You might consider using processor sets and psradm to segregate the CPUs handling network interrupts. That should allow you to instrument further (dtrace/guds/sar), though once
you've freed up some cycles from constantly handling interrupts, you may be done.
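
A rough sketch of what I mean, not a recipe: the CPU ids below are made up for illustration (check psrinfo and intrstat on your box to see where the NIC interrupts actually land), and the run/segregate helper names are just mine. It only prints the commands unless you set DRY_RUN=0 and run it as root.

```shell
#!/bin/sh
# Dry run by default: print the commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical CPU ids (assumption): CPUs left to service network interrupts.
INTR_CPUS="0 1"
# Hypothetical CPU ids (assumption): every other CPU on a 16-CPU box.
OTHER_CPUS="2 3 4 5 6 7 8 9 10 11 12 13 14 15"

# Echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

segregate() {
    # Put the interrupt CPUs in their own processor set so that
    # unbound threads are no longer scheduled on them.
    run psrset -c $INTR_CPUS
    # Mark the remaining CPUs no-intr so they stop taking I/O interrupts.
    run psradm -i $OTHER_CPUS
}

segregate
```

Afterwards, intrstat and mpstat should tell you whether the interrupt load really stays on the CPUs in the set.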


Steve Gonczi wrote:
I have a fast, 8 core x 2 hyperthread Nehalem system with 48 gigs of memory. During network throughput testing (multiple ftp server instances running, transferring 200 megabytes/sec on 2x GigE interfaces) the system periodically becomes extremely sluggish. (I can barely type in commands from the console, and network throughput drops down to nothing.) Once the system gets into this state it will not recover unless we kill the network load.

I see run queue values of 21, 10, 5 and several hundred mf (minor faults) in vmstat, but otherwise little activity.
Occasionally I see 100% kernel/system activity for seconds at a time.

I am trying to figure out what is using CPU in the kernel with dtrace profiling scripts, run for 10 secs at a time,
e.g.: dtrace -n 'profile:::profile-3456 /arg0/ { @[stack(1)] = count(); }' 
but I get the dtrace watchdog abort "Abort due to systemic unresponsiveness".
I tried to force the script to run via -w, and I just see a very low count
(1 - 2 max) of seemingly random functions sampled.

I wonder if there is perhaps a hardware issue that prevents the dtrace sampling interrupts from being run. I tried to see what is going on with a variety of other tools (all the various *stat commands) but fail to see anything obvious, other than the run queue and the occasional 100% kernel. I typically see an almost idle system: no lock contention, no io wait, and low system call, context switch and smtx counts.
  
Any suggestions regarding what tool or dtrace script to use, or where to look to get to the bottom of the sluggishness, would be much appreciated.

TIA

Steve
  


--
John Higgins | Principal Support Engineer
Phone: 858.449.5087
Oracle Global Customer Services, North America
_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org
