Hi Mark, thanks for your response. I changed the dirty page settings again, and I have a logging server that collects the syslog and Swift logs.
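
With the syslog landing on a separate server, hung-task reports like the one below can be pulled out of the collected logs automatically instead of by eye. This is a minimal sketch, not part of Swift or rsyslog. It assumes the classic syslog line format seen in this log ("Mon DD HH:MM:SS host kernel: [uptime] ..."). The names `HUNG_TASK_RE` and `find_hung_tasks` are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical pattern for kernel hung-task reports, e.g.
# "Nov 10 08:28:17 storage3 kernel: [92476.204616] INFO: task xfsaild/sdd:1108 blocked for more than 120 seconds."
# Assumes the classic syslog prefix; other formats (e.g. RFC 5424) will not match.
HUNG_TASK_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"\[\s*[\d.]+\] INFO: task (?P<task>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(lines: Iterable[str]) -> List[Tuple[str, str, str, int, int]]:
    """Return (timestamp, host, task, pid, timeout_secs) for each hung-task report."""
    hits = []
    for line in lines:
        m = HUNG_TASK_RE.search(line)
        if m:
            # The greedy task group backtracks to the last ":<pid> ", so task
            # names that themselves contain colons are still split correctly.
            hits.append((m["stamp"], m["host"], m["task"], int(m["pid"]), int(m["secs"])))
    return hits


sample = ("Nov 10 08:28:17 storage3 kernel: [92476.204616] INFO: task "
          "xfsaild/sdd:1108 blocked for more than 120 seconds.")
print(find_hung_tasks([sample]))
# prints [('Nov 10 08:28:17', 'storage3', 'xfsaild/sdd', 1108, 120)]
```

Fed the lines of a collected log file, this lists which host and which task (here the XFS AIL thread for sdd) stalled first, which may help show whether the SSD volume hangs before the SATA disks do.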

Nov 10 08:28:17 storage3 kernel: [92476.204616] INFO: task xfsaild/sdd:1108 blocked for more than 120 seconds.
Nov 10 08:28:17 storage3 kernel: [92476.204635] Not tainted 3.19.0-30-generic #34~14.04.1-Ubuntu
Nov 10 08:28:17 storage3 kernel: [92476.204655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 10 08:28:17 storage3 kernel: [92476.204680] xfsaild/sdd D ffff880813ae3d28 0 1108 2 0x00000000
Nov 10 08:28:17 storage3 kernel: [92476.204688] ffff880813ae3d28 ffff880811a31d70 0000000000013e80 ffff880813ae3fd8
Nov 10 08:28:17 storage3 kernel: [92476.204695] 0000000000013e80 ffff880814939d70 ffff880811a31d70 0000000000000286
Nov 10 08:28:17 storage3 kernel: [92476.204701] ffff88040ebb4128 ffff880811a31d70 0000000000000000 ffff88040ebb4000
Nov 10 08:28:17 storage3 kernel: [92476.204707] Call Trace:
Nov 10 08:28:17 storage3 kernel: [92476.204721] [<ffffffff817b2a99>] schedule+0x29/0x70
Nov 10 08:28:17 storage3 kernel: [92476.204792] [<ffffffffc0373a21>] _xfs_log_force+0x171/0x270 [xfs]
Nov 10 08:28:17 storage3 kernel: [92476.204801] [<ffffffff810a0a90>] ? wake_up_state+0x20/0x20
Nov 10 08:28:17 storage3 kernel: [92476.204807] [<ffffffff810dab60>] ? internal_add_timer+0x80/0x80
Nov 10 08:28:17 storage3 kernel: [92476.204851] [<ffffffffc0373b4a>] xfs_log_force+0x2a/0x90 [xfs]
Nov 10 08:28:17 storage3 kernel: [92476.204895] [<ffffffffc037e2d0>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Nov 10 08:28:17 storage3 kernel: [92476.204939] [<ffffffffc037e410>] xfsaild+0x140/0x5a0 [xfs]
Nov 10 08:28:17 storage3 kernel: [92476.204983] [<ffffffffc037e2d0>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Nov 10 08:28:17 storage3 kernel: [92476.204991] [<ffffffff81093822>] kthread+0xd2/0xf0
Nov 10 08:28:17 storage3 kernel: [92476.204997] [<ffffffff81093750>] ? kthread_create_on_node+0x1c0/0x1c0
Nov 10 08:28:17 storage3 kernel: [92476.205004] [<ffffffff817b6d98>] ret_from_fork+0x58/0x90
Nov 10 08:28:17 storage3 kernel: [92476.205009] [<ffffffff81093750>] ? kthread_create_on_node+0x1c0/0x1c0

But these messages won't help find the problem. They only show processes that are already stalled. It seems the problem starts on the SSD RAID1 partition and then cascades to all of the SATA HDDs.

Thanks and cheers
Heiko

On 09.11.2015 at 21:58, Mark Kirkwood wrote:
> On 10/11/15 00:41, Eren Türkay wrote:
>> On 09-11-2015 12:39, Heiko Krämer wrote:
>>> You're right, only a hard reboot can solve the problem: SSH login
>>> and other commands can't be executed because the whole system was
>>> frozen.
>>
>> Hello Heiko,
>>
>> I just want to give some tips about debugging these kinds of issues.
>> I had a completely different problem that hung the machine, and I
>> needed to debug it. Since I couldn't access the logs, I set up kernel
>> debugging to a remote server. It is called netconsole. You may want
>> to set up a netconsole listener on one of the working servers outside
>> of your Swift cluster, and set up netconsole on the servers that hang.
>> Here is the information about how to set it up:
>>
>> https://www.kernel.org/doc/Documentation/networking/netconsole.txt
>>
>> At least you may be able to see the kernel log just before the
>> machines hang. I hope you find it useful.
>
> It might also be worth using the remote logging capability of rsyslog
> to collect your Swift logs on another server, so you can see what
> is/was happening on the Swift side of things immediately before any
> hang.
>
> regards
>
> Mark
>
> _______________________________________________
> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to     : openstack@lists.openstack.org
> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack

-- 
anynines.com