Temporary random kernel hang

seven Fri, 08 Dec 2006 02:40:31 -0800

Hello,

I am having trouble with a multithreaded Java network server running on
SLES10. At random times the kernel takes over 80% of the CPU and idle time
drops to 0% for about 30 seconds. After this period the system returns to
normal operation.


Below is a vmstat -a 3 recording that shows the problem:

procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 773068 529184 693048    0    0     0     0  272  201  0  0 100  0  0
 0  0      0 773068 529184 693064    0    0     0    25  317  334  1  0 99  1  0
 0  0      0 772944 529216 693248    0    0     0    24  477 1017  3  0 96  0  0
 0  0      0 772820 529256 693316    0    0     0     0  525 1376  4  1 95  0  0
 0  0      0 772448 529344 693636    0    0     0   107 1098 3306 11  2 86  0  0
 0  0      0 772324 529404 693456    0    0     0     0  723 2247  7  2 91  0  0
 0  0      0 772076 529496 693656    0    0     0   132  770 2488  7  2 91  1  0
 0  0      0 772200 529528 693608    0    0     0    91  528 1168  4  1 94  1  0
 0  0      0 772200 529532 693728    0    0     0     0  334  387  1  0 99  0  0
 0  0      0 772076 529568 693680    0    0     0    24  564 1250  4  1 95  0  0
 0  0      0 771828 529636 693784    0    0     0     0  787 2144  7  2 91  0  0
 0  0      0 771580 529744 694232    0    0     0   111  995 3081 11  2 86  1  0
107  0      0 771316 529792 694904    0    0     0   153  829 1650 12 37 51  0  0
113  0      0 771316 529792 694912    0    0     0     0  323  169 15 85  0  0  0
116  0      0 771216 529792 694728    0    0     0    25  292  190 14 86  0  0  0
122  0      0 771340 529792 694728    0    0     0    21  311  191 15 85  0  0  0
138  0      0 771464 529792 694728    0    0     0     0  365  196 14 86  0  0  0
146  0      0 771464 529792 694728    0    0     0     0  331  189 16 84  0  0  0
150  0      0 771472 529792 694728    0    0     0     0  336  183 15 85  0  0  0
146  0      0 771472 529792 694728    0    0     0     4  310  201 14 86  0  0  0
145  0      0 771472 529792 694728    0    0     0     0  285  163 15 85  0  0  0
146  0      0 771472 529792 694728    0    0     0     0  277  159 14 86  0  0  0
145  0      0 771472 529792 694728    0    0     0    32  275  133 15 85  0  0  0
 0  0      0 771208 529892 694176    0    0     0     0 1012 3408 12  4 84  0  0
 0  0      0 770712 529972 694488    0    0     0   149  774 2869  8  2 90  0  0
 0  0      0 770712 529972 694488    0    0     0     0  271  195  0  0 100  0  0
 0  0      0 770728 529972 694488    0    0     0    35  269  167  0  0 100  1  0
 0  0      0 770728 529972 694488    0    0     0     7  269  189  0  0 100  0  0
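
As a rough cross-check of the recording above, the stalled samples can be
counted mechanically. The Java sketch below is only an illustration: the
class VmstatStall, its method names and the threshold of 80 are assumptions
made for this example, not part of vmstat or of the application. It treats a
row as stalled when the sy column is at or above the threshold and the id
column is 0. Only the fields up to id are read, so rows whose trailing wa
and st columns have been wrapped onto the next line still parse.

```java
// Sketch only: VmstatStall and its methods are hypothetical helper names.
class VmstatStall {
    // Field positions in a `vmstat -a` data row:
    // r b swpd free inact active si so bi bo in cs us sy id wa st
    static final int COL_SY = 13;
    static final int COL_ID = 14;

    /** True for a data row with sy >= syThreshold and id == 0. */
    static boolean isStalled(String line, int syThreshold) {
        String[] f = line.trim().split("\\s+");
        if (f.length <= COL_ID) {
            return false; // blank line, wrapped fragment or first header line
        }
        for (int i = 0; i <= COL_ID; i++) {
            if (!f[i].matches("\\d+")) {
                return false; // column-name header line
            }
        }
        int sy = Integer.parseInt(f[COL_SY]);
        int id = Integer.parseInt(f[COL_ID]);
        return sy >= syThreshold && id == 0;
    }

    /** Counts the rows that isStalled() accepts. */
    static int countStalled(String[] lines, int syThreshold) {
        int n = 0;
        for (String line : lines) {
            if (isStalled(line, syThreshold)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int intervalSeconds = 3; // sampling interval from `vmstat -a 3`
        // Four rows copied from the recording: normal, ramp-up, two stalled.
        String[] sample = {
            " 0  0      0 771580 529744 694232    0    0     0   111  995 3081 11  2 86  1  0",
            "107  0      0 771316 529792 694904    0    0     0   153  829 1650 12 37 51  0  0",
            "113  0      0 771316 529792 694912    0    0     0     0  323  169 15 85  0  0  0",
            "116  0      0 771216 529792 694728    0    0     0    25  292  190 14 86  0  0  0"
        };
        int stalled = countStalled(sample, 80);
        System.out.println(stalled + " stalled samples, about "
                + stalled * intervalSeconds + " s");
    }
}
```

Run over the whole recording, this rule matches the ten rows with sy between
84 and 86 and id at 0, which at a 3-second interval is 10 x 3 = 30 seconds.
The ramp-up row (sy 37, id 51) is not counted.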

The application is memory stable (no leaks), and a deadlock is out of the
question: in a deadlock the system would freeze forever, not temporarily.
Around 200 to 250 TCP/IP clients are connected to the application, which
runs about 550 threads (blocking stream sockets are used, so each client is
served by one reading thread and one writing thread).
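
To make the threading model concrete, here is a minimal sketch of one client
being served by a reading thread and a writing thread over a blocking
socket. It is not the actual server code: the class ClientSession, the
line-based echo protocol, the outbound queue and the close sentinel are all
assumptions chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: names and the echo protocol are assumptions.
class ClientSession {
    static final int THREADS_PER_CLIENT = 2; // one reader + one writer
    // Sentinel queued by the reader to stop the writer; compared by identity.
    private static final String CLOSE = new String("<close>");

    private final Socket socket;
    private final BlockingQueue<String> outbound = new LinkedBlockingQueue<String>();

    ClientSession(Socket socket) {
        this.socket = socket;
    }

    /** Worker threads needed for the given number of connected clients. */
    static int workerThreads(int clients) {
        return clients * THREADS_PER_CLIENT;
    }

    /** Accepts connections forever and starts a session for each one. */
    static void acceptLoop(ServerSocket server) throws IOException {
        while (true) {
            new ClientSession(server.accept()).start();
        }
    }

    /** Starts the two threads that serve this client. */
    void start() {
        new Thread(new Runnable() {
            public void run() { readLoop(); }
        }, "reader").start();
        new Thread(new Runnable() {
            public void run() { writeLoop(); }
        }, "writer").start();
    }

    // Reader thread: blocks in readLine() until a line arrives or the client disconnects.
    private void readLoop() {
        try {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream()));
            String line;
            while ((line = in.readLine()) != null) {
                outbound.add("echo: " + line);
            }
        } catch (IOException e) {
            // treated as a disconnect
        } finally {
            outbound.add(CLOSE); // tell the writer to stop
        }
    }

    // Writer thread: blocks in take() until a message is queued.
    private void writeLoop() {
        try {
            PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
            String msg;
            while ((msg = outbound.take()) != CLOSE) {
                out.println(msg);
            }
        } catch (IOException e) {
            // treated as a disconnect
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            try { socket.close(); } catch (IOException ignored) { }
        }
    }
}
```

Under this model, 200 to 250 clients account for 400 to 500 of the roughly
550 threads.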

The same application works fine on SLES9.3.

Hanging environment:
-----------------------------------------------------------------------------
mustang:~ # uname -a
Linux mustang 2.6.16.21-0.25-smp #1 SMP Tue Sep 19 07:26:15 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux
mustang:~ # java -version
java version "1.6.0-rc"
Java(TM) SE Runtime Environment (build 1.6.0-rc-b104)
Java HotSpot(TM) Server VM (build 1.6.0-rc-b104, mixed mode)
mustang:~ # cat /etc/SuSE-release
SUSE Linux Enterprise Server 10 (x86_64)
VERSION = 10
-----------------------------------------------------------------------------

Working environment:
-----------------------------------------------------------------------------
apollo:~ # uname -a
Linux apollo 2.6.5-7.252-smp #1 SMP Tue Feb 14 11:11:04 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux
apollo:~ # java -version
java version "1.6.0-rc"
Java(TM) SE Runtime Environment (build 1.6.0-rc-b95)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0-rc-b95, mixed mode)
apollo:~ # cat /etc/SuSE-release
SUSE LINUX Enterprise Server 9 (x86_64)
VERSION = 9
PATCHLEVEL = 3
-----------------------------------------------------------------------------

Can you give me some pointers about where to start debugging this issue?

Regards,
Horia
-- 
View this message in context: 
http://www.nabble.com/Temporary-random-kernel-hang-tf2779860.html#a7755634
Sent from the linux-kernel mailing list archive at Nabble.com.

