Dear Experts,

Our DBA team is facing the following problem.


We did high-availability testing, and when we crashed DB node 1, DB node 2 also 
went down. From the errors, I can see that the ocfs2 service shut down DB02.

Here is the issue in detail.

DB01
DB02
AP01
AP02

When I crash the DB01 server, the DB02 server also goes down and Oracle becomes 
completely unavailable.

In the reverse case, when I crash DB02, DB01 survives and Oracle continues to 
work without any issues.



messages_DB02.txt
===================

Jan 20 13:15:52 kbmmoppdb02 avahi-daemon[8824]: Registering new address record for 172.20.1.9 on eth0.
Jan 20 13:16:13 kbmmoppdb02 kernel: o2dlm: Node 0 leaves domain 8155F09482C94D3AB99D0669B91C0B1E
Jan 20 13:16:13 kbmmoppdb02 kernel: o2dlm: Nodes in domain 8155F09482C94D3AB99D0669B91C0B1E: 1
Jan 20 13:17:27 kbmmoppdb02 kernel: o2net: connection to node kbmmoppdb01 (num 0) at 10.255.255.3:7777 has been idle for 30.0 seconds, shutting it down.
Jan 20 13:17:27 kbmmoppdb02 kernel: (swapper,0,11):o2net_idle_timer:1515 here are some times that might help debug the situation: (tmr 1390245417.409760 now 1390245447.410787 dr 1390245417.409740 adv 1390245417.409769:1390245417.409770 func (d9d367e5:505) 1390245414.653885:1390245414.653892)
Jan 20 13:17:27 kbmmoppdb02 kernel: o2net: no longer connected to node kbmmoppdb01 (num 0) at 10.255.255.3:7777
Jan 20 13:17:27 kbmmoppdb02 kernel: (kswapd0,576,10):dlm_send_remote_unlock_request:360 ERROR: Error -112 when sending message 506 (key 0x60f827ee) to node 0
Jan 20 13:17:48 kbmmoppdb02 kernel: o2net: connection to node kbmmoppdb01 (num 0) at 10.255.255.3:7777 shutdown, state 7
Jan 20 13:17:57 kbmmoppdb02 kernel: (o2net,6123,11):o2net_connect_expired:1676 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
Jan 20 13:17:57 kbmmoppdb02 kernel: (dlm_thread,6161,8):dlm_drop_lockres_ref:2191 ERROR: Error -107 when sending message 507 (key 0x60f827ee) to node 0
Jan 20 13:17:57 kbmmoppdb02 kernel: (kswapd0,576,10):dlm_send_remote_unlock_request:360 ERROR: Error -107 when sending message 506 (key 0x60f827ee) to node 0
Jan 20 13:17:57 kbmmoppdb02 last message repeated 73 times
Jan 20 13:17:57 kbmmoppdb02 kernel: (dlm_thread,6161,8):dlm_purge_lockres:193 ERROR: C5F98815D0BF43578B48C12C21114311: deref O000000000000000124facd00000000 failed -107
Jan 20 13:17:57 kbmmoppdb02 kernel: o2net: connection to node kbmmoppdb01 (num 0) at 10.255.255.3:7777 shutdown, state 7
Jan 20 13:18:25 kbmmoppdb02 last message repeated 9 times
Jan 20 13:18:27 kbmmoppdb02 kernel: (o2net,6123,11):o2net_connect_expired:1676 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
Jan 20 13:18:27 kbmmoppdb02 kernel: (dlm_thread,6161,8):dlm_drop_lockres_ref:2191 ERROR: Error -107 when sending message 507 (key 0x60f827ee) to node 0
Jan 20 13:18:27 kbmmoppdb02 kernel: (kswapd0,576,10):dlm_send_remote_unlock_request:360 ERROR: Error -107 when sending message 506 (key 0x60f827ee) to node 0
Jan 20 13:18:27 kbmmoppdb02 last message repeated 180 times
Jan 20 13:18:27 kbmmoppdb02 kernel: (dlm_thread,6161,8):dlm_purge_lockres:193 ERROR: C5F98815D0BF43578B48C12C21114311: deref M000000000000000124facd00000000 failed -107
Jan 20 13:18:27 kbmmoppdb02 kernel: (dlm_thread,6161,10):dlm_drop_lockres_ref:2191 ERROR: Error -107 when sending message 507 (key 0x60f827ee) to node 0
Jan 20 13:18:27 kbmmoppdb02 kernel: (dlm_thread,6161,10):dlm_purge_lockres:193 ERROR: C5F98815D0BF43578B48C12C21114311: deref O000000000000000124facc00000000 failed -107

Jan 20 13:18:27 kbmmoppdb02 kernel: (dlm_thread,6161,4):dlm_drop_lockres_ref:2191 ERROR: Error -107 when sending message 507 (key 0x60f827ee) to node 0
Jan 20 13:18:27 kbmmoppdb02 kernel: (dlm_thread,6161,4):dlm_purge_lockres:193 ERROR: C5F98815D0BF43578B48C12C21114311: deref O000000000000000124fa8e00000000 failed -107
Jan 20 13:18:28 kbmmoppdb02 kernel: o2net: connection to node kbmmoppdb01 (num 0) at 10.255.255.3:7777 shutdown, state 7
Jan 20 13:18:31 kbmmoppdb02 kernel: o2net: connection to node kbmmoppdb01 (num 0) at 10.255.255.3:7777 shutdown, state 7
Jan 20 13:18:33 kbmmoppdb02 kernel: (events/11,49,11):o2quo_make_decision:158 ERROR: fencing this node because it is connected to a half-quorum of 1 out of 2 nodes which doesn't include the lowest active node 0
Jan 20 13:18:33 kbmmoppdb02 kernel: (events/11,49,11):o2hb_stop_all_regions:2026 ERROR: stopping heartbeat on all active regions.
Jan 20 13:23:10 kbmmoppdb02 syslogd 1.4.1: restart.
Jan 20 13:23:10 kbmmoppdb02 kernel: klogd 1.4.1, log source = /proc/kmsg started
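
The o2quo message at 13:18:33 seems to describe a tie-break rule that could 
account for the asymmetry: with two nodes still counted as active, a node 
that can reach only itself is fenced unless it is (or can reach) the 
lowest-numbered active node. Below is a minimal Python sketch of that rule as 
I read it from the log message. It is not the kernel code. The function name 
should_fence and its two arguments are hypothetical, the odd-node majority 
branch is an assumption the log does not confirm, and DB02 being node 1 is 
inferred from the "Nodes in domain ...: 1" line.

```python
def should_fence(heartbeating, connected):
    """Return True if this node would fence itself under the sketched rule.

    heartbeating: set of node numbers still counted as active (including self).
    connected:    set of node numbers this node can reach over the network
                  (including self).
    Both parameters are hypothetical names used only for illustration.
    """
    if not heartbeating:
        return False

    total = len(heartbeating)
    lowest_active = min(heartbeating)
    lowest_reachable = lowest_active in connected

    if total % 2 == 1:
        # Assumed: with an odd node count, a simple majority is required.
        quorum = (total + 1) // 2
        return len(connected) < quorum

    # Even node count: exactly half is enough only if that half
    # contains the lowest-numbered active node.
    quorum = total // 2
    if len(connected) < quorum:
        return True
    return len(connected) == quorum and not lowest_reachable


# DB01 is node 0 (from the log); DB02 is assumed to be node 1.
print(should_fence({0, 1}, {1}))  # DB02 loses contact with DB01
print(should_fence({0, 1}, {0}))  # DB01 loses contact with DB02
```

Under these assumptions the first call reports that DB02 fences itself and the 
second reports that DB01 stays up, which matches what we observed in both 
tests.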

Regards,
Thiruselvam V

KBACE Technologies
http://www.kbace.com/


_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users