Is the ocfs2 heartbeat sent over the network, or is it just a file updated on the shared disk? If the heartbeat is lost, what should happen? What if only one node is writing and the other is idle? Will it still cause any file system issues?
Thanks.
Hai Tao

From: taoh...@hotmail.com
To: ocfs2-users@oss.oracle.com
Date: Sat, 10 Sep 2011 00:50:23 -0700
Subject: [Ocfs2-users] disable heartbeat nic caused ocfs2 errors

I have a two-node ocfs2 cluster, and I disabled the heartbeat NIC with "ifdown eth1". I got the following weird logs on both nodes:

Sep 7 10:45:49 dbtest-01 kernel: o2net: connection to node dbtest-02 (num 1) at 10.194.59.65:7777 has been idle for 30.0 seconds, shutting it down.
Sep 7 10:45:49 dbtest-01 kernel: (swapper,0,3):o2net_idle_timer:1503 here are some times that might help debug the situation: (tmr 1315417519.185025 now 1315417549.183798 dr 1315417519.185016 adv 1315417519.185032:1315417519.185032 func (b9bb7168:504) 1315417518.872227:1315417518.872268)
Sep 7 10:45:49 dbtest-01 kernel: o2net: no longer connected to node dbtest-02 (num 1) at 10.194.59.65:7777
Sep 7 10:45:49 dbtest-01 kernel: (dlm_thread,3781,2):dlm_send_proxy_ast_msg:457 ERROR: status = -112
Sep 7 10:45:49 dbtest-01 kernel: (oracle,26129,1):dlm_do_master_request:1334 ERROR: link to 1 went down!
Sep 7 10:45:49 dbtest-01 kernel: (oracle,26129,1):dlm_get_lock_resource:917 ERROR: status = -112
Sep 7 10:45:49 dbtest-01 kernel: (dlm_thread,4256,1):dlm_send_proxy_ast_msg:457 ERROR: status = -112
Sep 7 10:45:49 dbtest-01 kernel: (dlm_thread,4256,1):dlm_flush_asts:604 ERROR: status = -112
Sep 7 10:45:49 dbtest-01 kernel: (dlm_thread,3781,2):dlm_flush_asts:604 ERROR: status = -112
Sep 7 10:46:19 dbtest-01 kernel: (o2net,3736,3):o2net_connect_expired:1664 ERROR: no connection established with node 1 after 30.0 seconds, giving up and returning errors.
Sep 7 10:46:19 dbtest-01 kernel: o2net: accepted connection from node dbtest-02 (num 1) at 10.194.59.65:7777
Sep 7 10:48:37 dbtest-01 kernel: INFO: task events/0:10 blocked for more than 120 seconds.
Sep 7 10:48:37 dbtest-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 7 10:48:37 dbtest-01 kernel: events/0 D ffff810001004420 0 10 1 11 9 (L-TLB)
Sep 7 10:48:37 dbtest-01 kernel: ffff81083ffedc80 0000000000000046 ffffffff80333680 0000000000000001
Sep 7 10:48:37 dbtest-01 kernel: 0000000000000400 000000000000000a ffff81083ffe1820 ffffffff80309b60
Sep 7 10:48:37 dbtest-01 kernel: 0030b62498ce7b3f 000000000000416b ffff81083ffe1a08 0000000000000000
Sep 7 10:48:37 dbtest-01 kernel: Call Trace:
Sep 7 10:48:37 dbtest-01 kernel: Call Trace:
Sep 7 10:48:37 dbtest-01 kernel: [<ffffffff80064167>] wait_for_completion+0x79/0xa2
Sep 7 10:48:37 dbtest-01 kernel: [<ffffffff8008e16d>] default_wake_function+0x0/0xe
Sep 7 10:48:37 dbtest-01 kernel: [<ffffffff884e64b7>] :ocfs2:ocfs2_wait_for_mask+0xd/0x19
Sep 7 10:48:37 dbtest-01 kernel: [<ffffffff884e78d8>] :ocfs2:ocfs2_cluster_lock+0x9ae/0x9d3
Sep 7 10:48:37 dbtest-01 kernel: [<ffffffff885013e5>] :ocfs2:ocfs2_orphan_scan_work+0x0/0x83
Sep 7 10:48:37 dbtest-01 kernel: [<ffffffff884ed1e4>] :ocfs2:ocfs2_orphan_scan_lock+0x55/0x84
Sep 7 10:48:37 dbtest-01 kernel: [<ffffffff884fc59b>] :ocfs2:ocfs2_queue_orphan_scan+0x32/0x147
Sep 7 10:48:37 dbtest-01 kernel: [<ffffffff885013ff>] :ocfs2:ocfs2_orphan_scan_work+0x1a/0x83
Sep 7 10:48:37 dbtest-01 kernel: [<ffffffff8004dc37>] run_workqueue+0x94/0xe4
Sep 7 10:48:37 dbtest-01 kernel: [<ffffffff8004a472>] worker_thread+0x0/0x122
Sep 7 10:48:37 dbtest-01 kernel: [<ffffffff8004a562>] worker_thread+0xf0/0x122
Sep 7 10:48:37 dbtest-01 kernel: [<ffffffff8008e16d>] default_wake_function+0x0/0xe
Sep 7 10:48:37 dbtest-01 kernel: [<ffffffff80032bdc>] kthread+0xfe/0x132
Sep 7 10:48:37 dbtest-01 kernel: [<ffffffff8005efb1>] child_rip+0xa/0x11
Sep 7 10:48:37 dbtest-01 kernel: [<ffffffff80032ade>] kthread+0x0/0x132
Sep 7 10:48:37 dbtest-01 kernel: [<ffffffff8005efa7>] child_rip+0x0/0x11
Sep 7 10:48:37 dbtest-01 kernel:

Does anyone know why this happened? Thanks.
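For anyone reading the o2net_idle_timer line in the log above, the gap between its "tmr" and "now" fields can be checked with a few lines of Python. This is only a sketch: it assumes both fields are Unix timestamps in seconds and that "tmr" marks when the idle timer was last armed, and the helper name idle_gap is hypothetical, not part of any ocfs2 tool.

```python
import re
from decimal import Decimal

# Timer fields copied from the o2net_idle_timer line in the dbtest-01 log.
LINE = ("(tmr 1315417519.185025 now 1315417549.183798 "
        "dr 1315417519.185016 adv 1315417519.185032:1315417519.185032")


def idle_gap(line: str) -> Decimal:
    """Return now - tmr in seconds (hypothetical helper, for illustration).

    Decimal is used so the microsecond digits are not blurred by
    binary floating-point rounding.
    """
    match = re.search(r"tmr (\d+\.\d+) now (\d+\.\d+)", line)
    if match is None:
        raise ValueError("no 'tmr ... now ...' pair found in line")
    tmr, now = (Decimal(value) for value in match.groups())
    return now - tmr


if __name__ == "__main__":
    print(idle_gap(LINE))  # prints 29.998773
```

The result is just under 30 seconds, which lines up with the "has been idle for 30.0 seconds" message logged at the same time.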
_______________________________________________ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users