[DRBD-user] jbd2/drbd0 blocked for more than 120 seconds

Dario Fiumicello - Antek Wed, 09 Feb 2011 04:48:04 -0800

Hi all, I have two VirtualBox VMs running on two different physical hosts. The VMs are interconnected with two gigabit Ethernet links for DRBD sync and heartbeat.


Suddenly I got this on the master machine:

Feb  9 10:53:24 mail1 kernel: [136200.650336] INFO: task jbd2/drbd0-8:13739 blocked for more than 120 seconds.
Feb  9 10:53:24 mail1 kernel: [136200.650967] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  9 10:53:24 mail1 kernel: [136200.651651] jbd2/drbd0-8 D 0000000000000002 0 13739 2 0x00000000
Feb  9 10:53:24 mail1 kernel: [136200.651660] ffff880030365b30 0000000000000046 0000000000015bc0 0000000000015bc0
Feb  9 10:53:24 mail1 kernel: [136200.651668] ffff88003cddb198 ffff880030365fd8 0000000000015bc0 ffff88003cddade0
Feb  9 10:53:24 mail1 kernel: [136200.651676] 0000000000015bc0 ffff880030365fd8 0000000000015bc0 ffff88003cddb198
Feb  9 10:53:24 mail1 kernel: [136200.651684] Call Trace:
Feb  9 10:53:24 mail1 kernel: [136200.651725] [<ffffffff810f3cd0>] ? sync_page+0x0/0x50
Feb  9 10:53:24 mail1 kernel: [136200.651743] [<ffffffff81559633>] io_schedule+0x73/0xc0
Feb  9 10:53:24 mail1 kernel: [136200.651751] [<ffffffff810f3d0d>] sync_page+0x3d/0x50
Feb  9 10:53:24 mail1 kernel: [136200.651759] [<ffffffff81559c7f>] __wait_on_bit+0x5f/0x90
Feb  9 10:53:24 mail1 kernel: [136200.651766] [<ffffffff810f3ec3>] wait_on_page_bit+0x73/0x80
Feb  9 10:53:24 mail1 kernel: [136200.651775] [<ffffffff81084440>] ? wake_bit_function+0x0/0x40
Feb  9 10:53:24 mail1 kernel: [136200.651790] [<ffffffff810fe305>] ? pagevec_lookup_tag+0x25/0x40
Feb  9 10:53:24 mail1 kernel: [136200.651798] [<ffffffff810f4355>] wait_on_page_writeback_range+0xf5/0x190
Feb  9 10:53:24 mail1 kernel: [136200.651805] [<ffffffff810f441f>] filemap_fdatawait+0x2f/0x40
Feb  9 10:53:24 mail1 kernel: [136200.651814] [<ffffffff8121c6d4>] jbd2_journal_commit_transaction+0x744/0x1280
Feb  9 10:53:24 mail1 kernel: [136200.651822] [<ffffffff81076a59>] ? try_to_del_timer_sync+0x79/0xd0
Feb  9 10:53:24 mail1 kernel: [136200.651831] [<ffffffff8122378d>] kjournald2+0xbd/0x220
Feb  9 10:53:24 mail1 kernel: [136200.651838] [<ffffffff81084400>] ? autoremove_wake_function+0x0/0x40
Feb  9 10:53:24 mail1 kernel: [136200.651846] [<ffffffff812236d0>] ? kjournald2+0x0/0x220
Feb  9 10:53:24 mail1 kernel: [136200.651853] [<ffffffff81084086>] kthread+0x96/0xa0
Feb  9 10:53:24 mail1 kernel: [136200.651861] [<ffffffff810131ea>] child_rip+0xa/0x20
Feb  9 10:53:24 mail1 kernel: [136200.651869] [<ffffffff81083ff0>] ? kthread+0x0/0xa0
Feb  9 10:53:24 mail1 kernel: [136200.651876] [<ffffffff810131e0>] ? child_rip+0x0/0x20

From this moment on, many other blocked-task errors appeared (postfix, pickup and so on). The machine load was more than 25!

Obviously I could not use the machine anymore, and I needed to kill it in order to force the takeover on the slave. Halt didn't work either.


My question is: why did I get this error? What can I do to avoid it?

Thanks

--
Dario Fiumicello - Antek S.r.l.
+3902890380 73


_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
