Is the OCFS2 heartbeat sent over the network, or is it just a file updated on 
the shared disk?
 
If the heartbeat is lost, what should happen? What if only one node is writing 
and the other is idle? Will that still cause any file system issues?


Thanks.
 
Hai Tao
 




From: taoh...@hotmail.com
To: ocfs2-users@oss.oracle.com
Date: Sat, 10 Sep 2011 00:50:23 -0700
Subject: [Ocfs2-users] disable heartbeat nic caused ocfs2 errors





I have a two-node OCFS2 cluster, and I disabled the heartbeat NIC with "ifdown 
eth1". I got the following weird logs on both nodes:
 
Sep  7 10:45:49 dbtest-01 kernel: o2net: connection to node dbtest-02 (num 1) 
at 10.194.59.65:7777 has been idle for 30.0 seconds, shutting it down.
Sep  7 10:45:49 dbtest-01 kernel: (swapper,0,3):o2net_idle_timer:1503 here are 
some times that might help debug the situation: (tmr 1315417519.185025 now 
1315417549.183798 dr 1315417519.185016 adv 1315417519.185032:1315417519.185032 
func (b9bb7168:504) 1315417518.872227:1315417518.872268)
Sep  7 10:45:49 dbtest-01 kernel: o2net: no longer connected to node dbtest-02 
(num 1) at 10.194.59.65:7777
Sep  7 10:45:49 dbtest-01 kernel: 
(dlm_thread,3781,2):dlm_send_proxy_ast_msg:457 ERROR: status = -112
Sep  7 10:45:49 dbtest-01 kernel: (oracle,26129,1):dlm_do_master_request:1334 
ERROR: link to 1 went down!
Sep  7 10:45:49 dbtest-01 kernel: (oracle,26129,1):dlm_get_lock_resource:917 
ERROR: status = -112
Sep  7 10:45:49 dbtest-01 kernel: 
(dlm_thread,4256,1):dlm_send_proxy_ast_msg:457 ERROR: status = -112
Sep  7 10:45:49 dbtest-01 kernel: (dlm_thread,4256,1):dlm_flush_asts:604 ERROR: 
status = -112
Sep  7 10:45:49 dbtest-01 kernel: (dlm_thread,3781,2):dlm_flush_asts:604 ERROR: 
status = -112
Sep  7 10:46:19 dbtest-01 kernel: (o2net,3736,3):o2net_connect_expired:1664 
ERROR: no connection established with node 1 after 30.0 seconds, giving up and 
returning errors.
Sep  7 10:46:19 dbtest-01 kernel: o2net: accepted connection from node 
dbtest-02 (num 1) at 10.194.59.65:7777
Sep  7 10:48:37 dbtest-01 kernel: INFO: task events/0:10 blocked for more than 
120 seconds.
Sep  7 10:48:37 dbtest-01 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep  7 10:48:37 dbtest-01 kernel: events/0      D ffff810001004420     0    10  
    1            11     9 (L-TLB)
Sep  7 10:48:37 dbtest-01 kernel:  ffff81083ffedc80 0000000000000046 
ffffffff80333680 0000000000000001
Sep  7 10:48:37 dbtest-01 kernel:  0000000000000400 000000000000000a 
ffff81083ffe1820 ffffffff80309b60
Sep  7 10:48:37 dbtest-01 kernel:  0030b62498ce7b3f 000000000000416b 
ffff81083ffe1a08 0000000000000000
Sep  7 10:48:37 dbtest-01 kernel: Call Trace:
Sep  7 10:48:37 dbtest-01 kernel: Call Trace:
Sep  7 10:48:37 dbtest-01 kernel:  [<ffffffff80064167>] 
wait_for_completion+0x79/0xa2
Sep  7 10:48:37 dbtest-01 kernel:  [<ffffffff8008e16d>] 
default_wake_function+0x0/0xe
Sep  7 10:48:37 dbtest-01 kernel:  [<ffffffff884e64b7>] 
:ocfs2:ocfs2_wait_for_mask+0xd/0x19
Sep  7 10:48:37 dbtest-01 kernel:  [<ffffffff884e78d8>] 
:ocfs2:ocfs2_cluster_lock+0x9ae/0x9d3
Sep  7 10:48:37 dbtest-01 kernel:  [<ffffffff885013e5>] 
:ocfs2:ocfs2_orphan_scan_work+0x0/0x83
Sep  7 10:48:37 dbtest-01 kernel:  [<ffffffff884ed1e4>] 
:ocfs2:ocfs2_orphan_scan_lock+0x55/0x84
Sep  7 10:48:37 dbtest-01 kernel:  [<ffffffff884fc59b>] 
:ocfs2:ocfs2_queue_orphan_scan+0x32/0x147
Sep  7 10:48:37 dbtest-01 kernel:  [<ffffffff885013ff>] 
:ocfs2:ocfs2_orphan_scan_work+0x1a/0x83
Sep  7 10:48:37 dbtest-01 kernel:  [<ffffffff8004dc37>] run_workqueue+0x94/0xe4
Sep  7 10:48:37 dbtest-01 kernel:  [<ffffffff8004a472>] worker_thread+0x0/0x122
Sep  7 10:48:37 dbtest-01 kernel:  [<ffffffff8004a562>] worker_thread+0xf0/0x122
Sep  7 10:48:37 dbtest-01 kernel:  [<ffffffff8008e16d>] 
default_wake_function+0x0/0xe
Sep  7 10:48:37 dbtest-01 kernel:  [<ffffffff80032bdc>] kthread+0xfe/0x132
Sep  7 10:48:37 dbtest-01 kernel:  [<ffffffff8005efb1>] child_rip+0xa/0x11
Sep  7 10:48:37 dbtest-01 kernel:  [<ffffffff80032ade>] kthread+0x0/0x132
Sep  7 10:48:37 dbtest-01 kernel:  [<ffffffff8005efa7>] child_rip+0x0/0x11
Sep  7 10:48:37 dbtest-01 kernel:

Does anyone know why this happened?
 
Thanks.

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
