Here is the issue in detail. The servers are:
DB01
DB02
AP01
AP02
When I crash the DB01 server, the DB02 server also goes down
and Oracle collapses completely.
In the reverse case, when I crash DB02, DB01 survives and
Oracle continues to work without any issues.
messages_DB02.txt
===================
Jan 20 13:15:52 kbmmoppdb02 avahi-daemon[8824]: Registering new address record for 172.20.1.9 on eth0.
Jan 20 13:16:13 kbmmoppdb02 kernel: o2dlm: Node 0 leaves domain 8155F09482C94D3AB99D0669B91C0B1E
Jan 20 13:16:13 kbmmoppdb02 kernel: o2dlm: Nodes in domain 8155F09482C94D3AB99D0669B91C0B1E: 1
Jan 20 13:17:27 kbmmoppdb02 kernel: o2net: connection to node kbmmoppdb01 (num 0) at 10.255.255.3:7777 has been idle for 30.0 seconds, shutting it down.
Jan 20 13:17:27 kbmmoppdb02 kernel: (swapper,0,11):o2net_idle_timer:1515 here are some times that might help debug the situation: (tmr 1390245417.409760 now 1390245447.410787 dr 1390245417.409740 adv 1390245417.409769:1390245417.409770 func (d9d367e5:505) 1390245414.653885:1390245414.653892)
Jan 20 13:17:27 kbmmoppdb02 kernel: o2net: no longer connected to node kbmmoppdb01 (num 0) at 10.255.255.3:7777
Jan 20 13:17:27 kbmmoppdb02 kernel: (kswapd0,576,10):dlm_send_remote_unlock_request:360 ERROR: Error -112 when sending message 506 (key 0x60f827ee) to node 0
Jan 20 13:17:48 kbmmoppdb02 kernel: o2net: connection to node kbmmoppdb01 (num 0) at 10.255.255.3:7777 shutdown, state 7
Jan 20 13:17:57 kbmmoppdb02 kernel: (o2net,6123,11):o2net_connect_expired:1676 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
Jan 20 13:17:57 kbmmoppdb02 kernel: (dlm_thread,6161,8):dlm_drop_lockres_ref:2191 ERROR: Error -107 when sending message 507 (key 0x60f827ee) to node 0
Jan 20 13:17:57 kbmmoppdb02 kernel: (kswapd0,576,10):dlm_send_remote_unlock_request:360 ERROR: Error -107 when sending message 506 (key 0x60f827ee) to node 0
Jan 20 13:17:57 kbmmoppdb02 last message repeated 73 times
Jan 20 13:17:57 kbmmoppdb02 kernel: (dlm_thread,6161,8):dlm_purge_lockres:193 ERROR: C5F98815D0BF43578B48C12C21114311: deref O000000000000000124facd00000000 failed -107
Jan 20 13:17:57 kbmmoppdb02 kernel: o2net: connection to node kbmmoppdb01 (num 0) at 10.255.255.3:7777 shutdown, state 7
Jan 20 13:18:25 kbmmoppdb02 last message repeated 9 times
Jan 20 13:18:27 kbmmoppdb02 kernel: (o2net,6123,11):o2net_connect_expired:1676 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
Jan 20 13:18:27 kbmmoppdb02 kernel: (dlm_thread,6161,8):dlm_drop_lockres_ref:2191 ERROR: Error -107 when sending message 507 (key 0x60f827ee) to node 0
Jan 20 13:18:27 kbmmoppdb02 kernel: (kswapd0,576,10):dlm_send_remote_unlock_request:360 ERROR: Error -107 when sending message 506 (key 0x60f827ee) to node 0
Jan 20 13:18:27 kbmmoppdb02 last message repeated 180 times
Jan 20 13:18:27 kbmmoppdb02 kernel: (dlm_thread,6161,8):dlm_purge_lockres:193 ERROR: C5F98815D0BF43578B48C12C21114311: deref M000000000000000124facd00000000 failed -107
Jan 20 13:18:27 kbmmoppdb02 kernel: (dlm_thread,6161,10):dlm_drop_lockres_ref:2191 ERROR: Error -107 when sending message 507 (key 0x60f827ee) to node 0
Jan 20 13:18:27 kbmmoppdb02 kernel: (dlm_thread,6161,10):dlm_purge_lockres:193 ERROR: C5F98815D0BF43578B48C12C21114311: deref O000000000000000124facc00000000 failed -107
Jan 20 13:18:27 kbmmoppdb02 kernel: (dlm_thread,6161,4):dlm_drop_lockres_ref:2191 ERROR: Error -107 when sending message 507 (key 0x60f827ee) to node 0
Jan 20 13:18:27 kbmmoppdb02 kernel: (dlm_thread,6161,4):dlm_purge_lockres:193 ERROR: C5F98815D0BF43578B48C12C21114311: deref O000000000000000124fa8e00000000 failed -107
Jan 20 13:18:28 kbmmoppdb02 kernel: o2net: connection to node kbmmoppdb01 (num 0) at 10.255.255.3:7777 shutdown, state 7
Jan 20 13:18:31 kbmmoppdb02 kernel: o2net: connection to node kbmmoppdb01 (num 0) at 10.255.255.3:7777 shutdown, state 7
Jan 20 13:18:33 kbmmoppdb02 kernel: (events/11,49,11):o2quo_make_decision:158 ERROR: fencing this node because it is connected to a half-quorum of 1 out of 2 nodes which doesn't include the lowest active node 0
Jan 20 13:18:33 kbmmoppdb02 kernel: (events/11,49,11):o2hb_stop_all_regions:2026 ERROR: stopping heartbeat on all active regions.
Jan 20 13:23:10 kbmmoppdb02 syslogd 1.4.1: restart.
Jan 20 13:23:10 kbmmoppdb02 kernel: klogd 1.4.1, log source = /proc/kmsg started
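
For reference, the o2quo_make_decision message at 13:18:33 in the log above states the rule that was applied: DB02 saw itself connected to exactly half of the nodes (1 out of 2), and that half did not include the lowest active node (node 0), so it fenced itself. Below is a minimal Python sketch of that tie-break as I read it from the message wording. It is only an illustration, not the actual kernel code; the function name `has_quorum` and its parameters are made up for this example.

```python
def has_quorum(reachable, active):
    """Simplified quorum check, modelled on the wording of the o2quo log line.

    reachable: set of node numbers this node can talk to over the
               cluster network, including itself (hypothetical input).
    active:    set of node numbers still counted as active in the
               cluster (hypothetical input).
    """
    total = len(active)
    connected = len(reachable & active)

    # More than half of the active nodes reachable: keep running.
    if connected * 2 > total:
        return True

    # Exactly half reachable: the tie goes to the side that
    # contains the lowest-numbered active node.
    if connected * 2 == total:
        return min(active) in reachable

    # Less than half reachable: no quorum.
    return False


# DB02 is node 1 and can only reach itself while node 0 still counts as active.
print(has_quorum({1}, {0, 1}))  # node 1 side of the split
# DB01 is node 0 and can only reach itself while node 1 still counts as active.
print(has_quorum({0}, {0, 1}))  # node 0 side of the split
```

If this reading is right, then in a two-node split the node with the lower number (node 0, here DB01) keeps quorum and the other node fences itself, which would be consistent with DB01 surviving a crash of DB02 but not the other way around.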
Regards,
Thiruselvam V
