Package: base
Severity: important

I have a serious problem with debian squeeze servers on virtualized environment 
(XenServer 5.6).
Frequent kernel errors appears on /var/log/messsages or dmesg and informs me 
that there are processes blocked.
When the process is apache2, seems, it remains in "D" (Uninterruptible sleep) 
status and don't respond at http requests. Also the virtual machine has an high 
CPU iowait.
This VM is part of an ocfs2 cluster with 8 nodes but the problem also occurs 
outside the ocfs2 cluster (for example with an ext3 filesystem).
Every virtual machine is in his own dedicated volume on Equallogic iSCSI SAN.   

The errors are like these:

[3759012.889335] INFO: task java:14127 blocked for more than 120 seconds.
[3759012.889341] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[3759012.889345] java          D ffff8800da913f90     0 14127      1 0x00000000
[3759012.889348]  ffff8800da913f90 0000000000000286 ffff8800fdb45448 
0000000000001be6
[3759012.889352]  0000000000000000 ffffffff8100e22f 000000000000f9e0 
ffff88007b86bfd8
[3759012.889356]  0000000000015780 0000000000015780 ffff8800e5541530 
ffff8800e5541828
[3759012.889360] Call Trace:
[3759012.889362]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
[3759012.889369]  [<ffffffffa0142a67>] ? ocfs2_inode_unlock+0x144/0x1aa [ocfs2]
[3759012.889372]  [<ffffffff811531a6>] ? cap_inode_permission+0x0/0x3
[3759012.889374]  [<ffffffff812fb99b>] ? __mutex_lock_common+0x122/0x192
[3759012.889377]  [<ffffffff810e71b1>] ? kmem_cache_alloc+0x8c/0xf0
[3759012.889380]  [<ffffffff812fbac3>] ? mutex_lock+0x1a/0x31
[3759012.889382]  [<ffffffff810f89fd>] ? do_filp_open+0x226/0x94b
[3759012.889385]  [<ffffffff8100c2c2>] ? pfn_to_mfn+0xe/0x22
[3759012.889388]  [<ffffffff8100c33c>] ? pte_pfn_to_mfn+0x21/0x30
[3759012.889391]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
[3759012.889393]  [<ffffffff81101d0d>] ? alloc_fd+0x67/0x10c
[3759012.889396]  [<ffffffff810ed1eb>] ? do_sys_open+0x55/0xfc
[3759012.889398]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
[3804012.893213] INFO: task java:6498 blocked for more than 120 seconds.
[3804012.893228] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[3804012.893237] java          D ffff8800038f7c40     0  6498      1 0x00000000
[3804012.893245]  ffff880002d8dbd0 0000000000000286 ffff8800ac4f9d38 
ffff8800ac4f9d34
[3804012.893254]  0000000000000001 ffffffff8100e22f 000000000000f9e0 
ffff8800ac4f9fd8
[3804012.893258]  0000000000015780 0000000000015780 ffff8800018a0710 
ffff8800018a0a08
[3804012.893262] Call Trace:
[3804012.893271]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
[3804012.893283]  [<ffffffffa0142a67>] ? ocfs2_inode_unlock+0x144/0x1aa [ocfs2]
[3804012.893288]  [<ffffffff812fb99b>] ? __mutex_lock_common+0x122/0x192
[3804012.893296]  [<ffffffff810e71b1>] ? kmem_cache_alloc+0x8c/0xf0
[3804012.893300]  [<ffffffff812fbac3>] ? mutex_lock+0x1a/0x31
[3804012.893304]  [<ffffffff810f89fd>] ? do_filp_open+0x226/0x94b
[3804012.893307]  [<ffffffff8100c2c2>] ? pfn_to_mfn+0xe/0x22
[3804012.893310]  [<ffffffff8100c33c>] ? pte_pfn_to_mfn+0x21/0x30
[3804012.893313]  [<ffffffff812fcaca>] ? error_exit+0x2a/0x60
[3804012.893316]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
[3804012.893320]  [<ffffffff81101d0d>] ? alloc_fd+0x67/0x10c
[3804012.893323]  [<ffffffff810ed1eb>] ? do_sys_open+0x55/0xfc
[3804012.893326]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b

[2086346.591237] INFO: task apache2:2218 blocked for more than 120 seconds.
[2086346.591241] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[2086346.591245] apache2       D ffff8800f3f08710     0  2218    957 0x00000000
[2086346.591248]  ffff8800f3f08710 0000000000000286 ffff8800fe7b1c20 
ffff8800feb03170
[2086346.591252]  ffff8800feb03170 ffff8800feb03170 000000000000f9e0 
ffff8800fe7b1fd8
[2086346.591256]  0000000000015780 0000000000015780 ffff8800feb03170 
ffff8800feb03468
[2086346.591259] Call Trace:
[2086346.591262]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
[2086346.591265]  [<ffffffff812fc51a>] ? _spin_unlock_irqrestore+0xd/0xe
[2086346.591267]  [<ffffffff812fb99b>] ? __mutex_lock_common+0x122/0x192
[2086346.591270]  [<ffffffff812fbac3>] ? mutex_lock+0x1a/0x31
[2086346.591272]  [<ffffffff810f64c8>] ? do_lookup+0x80/0x15d
[2086346.591275]  [<ffffffff810f6f48>] ? __link_path_walk+0x5a5/0x6f5
[2086346.591277]  [<ffffffff810f72c6>] ? path_walk+0x66/0xc9
[2086346.591280]  [<ffffffff810f8730>] ? do_path_lookup+0x20/0x77
[2086346.591283]  [<ffffffff810f9c12>] ? user_path_at+0x48/0x79
[2086346.591285]  [<ffffffff81064f1a>] ? autoremove_wake_function+0x0/0x2e
[2086346.591288]  [<ffffffff81242247>] ? move_addr_to_user+0x61/0x7c
[2086346.591291]  [<ffffffff810f205e>] ? vfs_fstatat+0x2c/0x57
[2086346.591293]  [<ffffffff810f2151>] ? sys_newstat+0x11/0x30
[2086346.591296]  [<ffffffff8106c5e5>] ? getnstimeofday+0x55/0xaf
[2086346.591298]  [<ffffffff8106c692>] ? do_gettimeofday+0x12/0x34
[2086346.591301]  [<ffffffff8105245c>] ? sys_gettimeofday+0x2b/0x61
[2086346.591303]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
[2086346.591308] INFO: task apache2:2257 blocked for more than 120 seconds.
[2086346.591313] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[2086346.591317] apache2       D ffff8800f3f0dbd0     0  2257    957 0x00000000
[2086346.591320]  ffff8800f3f0dbd0 0000000000000286 ffff8800febe3c20 
ffff8800f3f0a350
[2086346.591324]  ffff8800f3f0a350 ffff8800f3f0a350 000000000000f9e0 
ffff8800febe3fd8
[2086346.591327]  0000000000015780 0000000000015780 ffff8800f3f0a350 
ffff8800f3f0a648
[2086346.591331] Call Trace:
[2086346.591334]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
[2086346.591337]  [<ffffffff812fc51a>] ? _spin_unlock_irqrestore+0xd/0xe
[2086346.591339]  [<ffffffff812fb99b>] ? __mutex_lock_common+0x122/0x192
[2086346.591342]  [<ffffffff812fbac3>] ? mutex_lock+0x1a/0x31
[2086346.591344]  [<ffffffff810f64c8>] ? do_lookup+0x80/0x15d
[2086346.591347]  [<ffffffff810f6f48>] ? __link_path_walk+0x5a5/0x6f5
[2086346.591349]  [<ffffffff810f72c6>] ? path_walk+0x66/0xc9
[2086346.591352]  [<ffffffff810f8730>] ? do_path_lookup+0x20/0x77
[2086346.591354]  [<ffffffff810f9c12>] ? user_path_at+0x48/0x79
[2086346.591357]  [<ffffffff81064f1a>] ? autoremove_wake_function+0x0/0x2e
[2086346.591360]  [<ffffffff81242247>] ? move_addr_to_user+0x61/0x7c
[2086346.591363]  [<ffffffff810f205e>] ? vfs_fstatat+0x2c/0x57
[2086346.591365]  [<ffffffff810f2151>] ? sys_newstat+0x11/0x30
[2086346.591368]  [<ffffffff8106c5e5>] ? getnstimeofday+0x55/0xaf
[2086346.591370]  [<ffffffff8106c692>] ? do_gettimeofday+0x12/0x34
[2086346.591373]  [<ffffffff8105245c>] ? sys_gettimeofday+0x2b/0x61
[2086346.591375]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b

Apache2 (2.2.19) servers are behind Ngnix reverse proxy and strace command 
reports this when an Apache2 child dies:

read(38, "GET /images/image001.g"..., 8000) = 721
gettimeofday({1313595282, 352344}, NULL) = 0
gettimeofday({1313595282, 352400}, NULL) = 0
gettimeofday({1313595282, 352448}, NULL) = 0
gettimeofday({1313595282, 352505}, NULL) = 0
gettimeofday({1313595282, 352552}, NULL) = 0
gettimeofday({1313595282, 352603}, NULL) = 0
gettimeofday({1313595282, 352653}, NULL) = 0
gettimeofday({1313595282, 352704}, NULL) = 0
gettimeofday({1313595282, 352750}, NULL) = 0
gettimeofday({1313595282, 352813}, NULL) = 0
gettimeofday({1313595282, 352863}, NULL) = 0
gettimeofday({1313595282, 352918}, NULL) = 0
gettimeofday({1313595282, 352964}, NULL) = 0
gettimeofday({1313595282, 353015}, NULL) = 0
gettimeofday({1313595282, 353062}, NULL) = 0
gettimeofday({1313595282, 353116}, NULL) = 0
gettimeofday({1313595282, 353164}, NULL) = 0
gettimeofday({1313595282, 353215}, NULL) = 0
gettimeofday({1313595282, 353278}, NULL) = 0
gettimeofday({1313595282, 353324}, NULL) = 0
gettimeofday({1313595282, 353409}, NULL) = 0
gettimeofday({1313595282, 353483}, NULL) = 0
gettimeofday({1313595282, 353538}, NULL) = 0
gettimeofday({1313595282, 353620}, NULL) = 0
gettimeofday({1313595282, 353668}, NULL) = 0
gettimeofday({1313595282, 353731}, NULL) = 0
gettimeofday({1313595282, 353782}, NULL) = 0
gettimeofday({1313595282, 353833}, NULL) = 0
gettimeofday({1313595282, 353883}, NULL) = 0
gettimeofday({1313595282, 353934}, NULL) = 0
gettimeofday({1313595282, 353984}, NULL) = 0
gettimeofday({1313595282, 354034}, NULL) = 0
gettimeofday({1313595282, 354079}, NULL) = 0
gettimeofday({1313595282, 354147}, NULL) = 0
gettimeofday({1313595282, 354219}, NULL) = 0
gettimeofday({1313595282, 354282}, NULL) = 0
gettimeofday({1313595282, 354331}, NULL) = 0
gettimeofday({1313595282, 354383}, NULL) = 0
gettimeofday({1313595282, 354435}, NULL) = 0
gettimeofday({1313595282, 354487}, NULL) = 0
gettimeofday({1313595282, 354536}, NULL) = 0
gettimeofday({1313595282, 354605}, NULL) = 0
gettimeofday({1313595282, 354654}, NULL) = 0
stat("/cluster_www/images/image001.gif", {st_mode=S_IFREG|0644, st_size=4731, 
...}) = 0
open("/.htaccess", O_RDONLY|O_CLOEXEC)  = -1 ENOENT (No such file or directory)
open("/cluster_web/.htaccess", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or 
directory)
open("/cluster_web/www/.htaccess", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such 
file or directory)
lstat("/cluster_www/images", {st_mode=S_IFDIR|0755, st_size=3896, ...}) = 0
lstat("/cluster_www/images/image001.gif", {st_mode=S_IFREG|0644, st_size=4731, 
...}) = 0
open("/cluster_www/images/image001.gif", O_RDONLY|O_CLOEXEC) = 39
fcntl(39, F_GETFD)                      = 0x1 (flags FD_CLOEXEC)
setsockopt(38, SOL_TCP, TCP_CORK, [1], 4) = 0
writev(38, [{"HTTP/1.1 200 OK\r\nDate: Wed, 17 A"..., 284}], 1) = 284
sendfile(38, 39, [0], 4731)             = 4731
setsockopt(38, SOL_TCP, TCP_CORK, [0], 4) = 0
gettimeofday({1313595282, 355611}, NULL) = 0
write(27, "192.168.199.13 - - [17/Aug/2011"..., 211) = 211
gettimeofday({1313595282, 355742}, NULL) = 0
times({tms_utime=0, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 1889853582
gettimeofday({1313595282, 355871}, NULL) = 0
shutdown(38, 1 /* send */)              = 0
poll([{fd=38, events=POLLIN}], 1, 2000) = 1 ([{fd=38, revents=POLLIN|POLLHUP}])
read(38, "", 512)                       = 0
close(38)                               = 0
read(5, 0x7fff9c0a3dcf, 1)              = -1 EAGAIN (Resource temporarily 
unavailable)
close(39)                               = 0
gettimeofday({1313595282, 356503}, NULL) = 0
accept(4, {sa_family=AF_INET6, sin6_port=htons(54007), inet_pton(AF_INET6, 
"::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 38
fcntl(38, F_GETFD)                      = 0
fcntl(38, F_SETFD, FD_CLOEXEC)          = 0
gettimeofday({1313595287, 894718}, NULL) = 0
getsockname(38, {sa_family=AF_INET6, sin6_port=htons(80), inet_pton(AF_INET6, 
"::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
fcntl(38, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(38, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
gettimeofday({1313595287, 894970}, NULL) = 0
gettimeofday({1313595287, 895026}, NULL) = 0
gettimeofday({1313595287, 895075}, NULL) = 0
read(38, "OPTIONS * HTTP/1.0\r\nUser-Agent: "..., 8000) = 86
gettimeofday({1313595287, 895194}, NULL) = 0
gettimeofday({1313595287, 895251}, NULL) = 0
gettimeofday({1313595287, 895301}, NULL) = 0
gettimeofday({1313595287, 895355}, NULL) = 0
gettimeofday({1313595287, 895406}, NULL) = 0
gettimeofday({1313595287, 895465}, NULL) = 0
gettimeofday({1313595287, 895518}, NULL) = 0
writev(38, [{"HTTP/1.1 200 OK\r\nDate: Wed, 17 A"..., 152}], 1) = 152
gettimeofday({1313595287, 895670}, NULL) = 0
write(36, "::1 - - [17/Aug/2011:17:34:47 +0"..., 123) = 123
gettimeofday({1313595287, 895797}, NULL) = 0
times({tms_utime=0, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 1889854136
gettimeofday({1313595287, 895995}, NULL) = 0
shutdown(38, 1 /* send */)              = -1 ENOTCONN (Transport endpoint is 
not connected)
close(38)                               = 0
read(5, "!", 1)                         = 1
close(37)                               = 0
munmap(0x7fc77fa49000, 2126320)         = 0
munmap(0x7fc77fc51000, 2200840)         = 0
gettimeofday({1313595287, 897317}, NULL) = 0
munmap(0x7fc77fe6b000, 2236240)         = 0
munmap(0x7fc7804ad000, 2150792)         = 0
munmap(0x7fc78008d000, 4323776)         = 0
munmap(0x7fc781f3d000, 2163144)         = 0
munmap(0x7fc781cd8000, 2508584)         = 0
munmap(0x7fc781aa6000, 2301656)         = 0
munmap(0x7fc781881000, 2248208)         = 0
munmap(0x7fc781424000, 2417800)         = 0
munmap(0x7fc781673000, 2152936)         = 0
munmap(0x7fc78120c000, 2193632)         = 0
munmap(0x7fc780f61000, 2796456)         = 0
munmap(0x7fc780ce9000, 2587424)         = 0
munmap(0x7fc780ace000, 2204640)         = 0
munmap(0x7fc7808be000, 2161864)         = 0
munmap(0x7fc7806bb000, 2109040)         = 0
munmap(0x7fc77f81e000, 2272040)         = 0
munmap(0x7fc782add000, 323584)          = 0
munmap(0x7fc786b6d000, 8192)            = 0
munmap(0x7fc78445e000, 8192)            = 0
munmap(0x7fc784460000, 8192)            = 0
munmap(0x7fc784458000, 8192)            = 0
munmap(0x7fc78445a000, 8192)            = 0
munmap(0x7fc784454000, 8192)            = 0
munmap(0x7fc78445c000, 8192)            = 0
munmap(0x7fc784456000, 8192)            = 0
close(6)                                = 0
close(5)                                = 0
exit_group(0)                           = ?


Thanks!

Roberto


-- System Information:
Debian Release: 6.0.2
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'testing'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash



-- 
To UNSUBSCRIBE, email to [email protected]
with a subject of "unsubscribe". Trouble? Contact [email protected]

Reply via email to