On Mon, Feb 11, 2013 at 9:24 PM, Viacheslav Biriukov <v.v.biriu...@gmail.com> wrote:
> It's a VM in OpenStack, so we can't use a static IP.
> Right now I'm investigating why the interface goes down.

Even if you solve that, dynamic IP addresses are fundamentally incompatible with cluster software. You're effectively trying to create a cluster out of nodes which change their name every time they boot.

> Thank you!
>
> 2013/2/11 Viacheslav Biriukov <v.v.biriu...@gmail.com>
>>
>> 2013/2/11 Dan Frincu <df.clus...@gmail.com>
>>>
>>> Hi,
>>>
>>> On Sun, Feb 10, 2013 at 2:24 PM, Viacheslav Biriukov
>>> <v.v.biriu...@gmail.com> wrote:
>>> > Hi guys,
>>> >
>>> > I've got a tricky issue with Corosync and Pacemaker running with unicast
>>> > over a DHCP-assigned IP address. Corosync crashes periodically.
>>> >
>>> > Packages are from the CentOS 6 repos:
>>> > corosync-1.4.1-7.el6_3.1.x86_64
>>> > corosynclib-1.4.1-7.el6_3.1.x86_64
>>> > pacemaker-cluster-libs-1.1.7-6.el6.x86_64
>>> > pacemaker-libs-1.1.7-6.el6.x86_64
>>> > pacemaker-cli-1.1.7-6.el6.x86_64
>>> > pacemaker-1.1.7-6.el6.x86_64
>>> >
>>> > Logs:
>>> >
>>> > Feb 09 23:24:33 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>>> > Feb 10 00:24:39 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>>> > Feb 10 01:24:44 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>>> > Feb 10 02:24:48 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>>> > Feb 10 03:24:51 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>>> > Feb 10 04:24:52 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>>> > Feb 10 05:24:54 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>>> > Feb 10 06:25:00 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>>> > Feb 10 07:25:06 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
>>> > Feb 10 07:56:22 corosync [TOTEM ] A processor failed, forming new configuration.
>>> > Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.
>>>
>>> This ^^^ is your problem. Corosync doesn't like it, see
>>>
>>> https://github.com/corosync/corosync/wiki/Corosync-and-ifdown-on-active-network-interface
>>>
>>> Normally DHCP shouldn't take the interface down.
>>> Also, since changing the network configuration in corosync means restarting it,
>>> why not go with static IPs?
>>>
>>> HTH,
>>> Dan
>>>
>>> > Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is now up.
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: cfg_connection_destroy: Connection destroyed
>>> > Feb 10 07:56:25 [5251] host1 crmd: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
>>> > Feb 10 07:56:25 [5246] host1 cib: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
>>> > Feb 10 07:56:25 [5249] host1 attrd: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
>>> > Feb 10 07:56:25 [5251] host1 crmd: error: ais_dispatch: AIS connection failed
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: cpg_connection_destroy: Connection destroyed
>>> > Feb 10 07:56:25 [5246] host1 cib: error: ais_dispatch: AIS connection failed
>>> > Feb 10 07:56:25 [5251] host1 crmd: info: crmd_ais_destroy: connection closed
>>> > Feb 10 07:56:25 [5249] host1 attrd: error: ais_dispatch: AIS connection failed
>>> > Feb 10 07:56:25 [5247] host1 stonith-ng: error: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
>>> > Feb 10 07:56:25 [5246] host1 cib: error: cib_ais_destroy: AIS connection terminated
>>> > Feb 10 07:56:25 [5249] host1 attrd: crit: attrd_ais_destroy: Lost connection to OpenAIS service!
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: notice: pcmk_shutdown_worker: Shuting down Pacemaker
>>> > Feb 10 07:56:25 [5247] host1 stonith-ng: error: ais_dispatch: AIS connection failed
>>> > Feb 10 07:56:25 [5249] host1 attrd: notice: main: Exiting...
>>> > Feb 10 07:56:25 [5247] host1 stonith-ng: error: stonith_peer_ais_destroy: AIS connection terminated
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: notice: stop_child: Stopping crmd: Sent -15 to process 5251
>>> > Feb 10 07:56:25 [5249] host1 attrd: error: attrd_cib_connection_destroy: Connection to the CIB terminated...
>>> > Feb 10 07:56:25 [5251] host1 crmd: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>>> > Feb 10 07:56:25 [5251] host1 crmd: notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
>>> > Feb 10 07:56:25 [5251] host1 crmd: info: do_shutdown_req: Sending shutdown request to host2
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process stonith-ng exited (pid=5247, rc=1)
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5249 is not connected
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5246 is not connected
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5247 is not connected
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process cib exited (pid=5246, rc=1)
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process attrd exited (pid=5249, rc=1)
>>> > Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: send_ais_text: Sending message 68 via pcmk: FAILED (rc=2): Library error: Connection timed out (110)
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: do_log: FSA: Input I_ERROR from do_shutdown_req() received in state S_NOT_DC
>>> > Feb 10 07:56:27 [5251] host1 crmd: notice: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=do_shutdown_req ]
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: do_recover: Action A_RECOVER (0000000001000000) not supported
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
>>> > Feb 10 07:56:27 [5251] host1 crmd: notice: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
>>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_shutdown: Disconnecting STONITH...
>>> > Feb 10 07:56:27 [5251] host1 crmd: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
>>> > Feb 10 07:56:27 host1 lrmd: [5248]: info: cancel_op: operation monitor[25] on ocf::OpenStackFloatingIP::P_SESSION_IP for client 5251, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.6] CRM_meta_timeout=[20000] CRM_meta_interval=[5000] ip=[172.24.0.104] cancelled
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: verify_stopped: Resource P_SESSION_IP was active at shutdown. You may ignore this error if it is unmanaged.
>>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_lrm_control: Disconnected from the LRM
>>> > Feb 10 07:56:27 [5251] host1 crmd: notice: terminate_ais_connection: Disconnecting from AIS
>>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_ha_control: Disconnected from OpenAIS
>>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_cib_control: Disconnecting CIB
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: send_ipc_message: IPC Channel to 5246 is not connected
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: send_ipc_message: IPC Channel to 5246 is not connected
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: cib_native_perform_op_delegate: Sending message to CIB service FAILED
>>> > Feb 10 07:56:27 [5251] host1 crmd: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: verify_stopped: Resource P_SESSION_IP was active at shutdown. You may ignore this error if it is unmanaged.
>>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
>>> > Feb 10 07:56:27 [5251] host1 crmd: error: do_exit: Could not recover from internal error
>>> > Feb 10 07:56:27 [5251] host1 crmd: info: free_mem: Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
>>> > Feb 10 07:56:27 [5251] host1 crmd: info: crm_xml_cleanup: Cleaning up memory from libxml2
>>> > Feb 10 07:56:27 [5251] host1 crmd: info: do_exit: [crmd] stopped (2)
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: pcmk_child_exit: Child process crmd exited (pid=5251, rc=2)
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: warning: send_ipc_message: IPC Channel to 5251 is not connected
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: notice: stop_child: Stopping pengine: Sent -15 to process 5250
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: info: pcmk_child_exit: Child process pengine exited (pid=5250, rc=0)
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: notice: stop_child: Stopping lrmd: Sent -15 to process 5248
>>> > Feb 10 07:56:27 host1 lrmd: [5248]: info: lrmd is shutting down
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: info: pcmk_child_exit: Child process lrmd exited (pid=5248, rc=0)
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: notice: pcmk_shutdown_worker: Shutdown complete
>>> > Feb 10 07:56:27 [5242] host1 pacemakerd: info: main: Exiting pacemakerd
>>> >
>>> > corosync.conf:
>>> >
>>> > compatibility: whitetank
>>> >
>>> > totem {
>>> >     version: 2
>>> >     secauth: off
>>> >     nodeid: 104
>>> >     interface {
>>> >         member {
>>> >             memberaddr: 172.17.0.104
>>> >         }
>>> >         member {
>>> >             memberaddr: 172.17.0.105
>>> >         }
>>> >         ringnumber: 0
>>> >         bindnetaddr: 172.17.0.0
>>> >         mcastport: 5426
>>> >         ttl: 1
>>> >     }
>>> >     transport: udpu
>>> > }
>>> >
>>> > logging {
>>> >     fileline: off
>>> >     to_logfile: yes
>>> >     to_syslog: yes
>>> >     debug: on
>>> >     logfile: /var/log/cluster/corosync.log
>>> >     debug: off
>>> >     timestamp: on
>>> >     logger_subsys {
>>> >         subsys: AMF
>>> >         debug: off
>>> >     }
>>> > }
>>> >
>>> > service {
>>> >     # Load the Pacemaker Cluster Resource Manager
>>> >     ver: 1
>>> >     name: pacemaker
>>> > }
>>> >
>>> > aisexec {
>>> >     user: root
>>> >     group: root
>>> > }
>>> >
>>> > Thank you!
>>> >
>>> > --
>>> > Viacheslav Biriukov
>>> > BR
>>> > http://biriukov.me
>>> >
>>> > _______________________________________________
>>> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> >
>>> > Project Home: http://www.clusterlabs.org
>>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> > Bugs: http://bugs.clusterlabs.org
>>>
>>> --
>>> Dan Frincu
>>> CCNA, RHCE
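To find out why the interface goes down, one option is to extract the TOTEM "interface is down" and "is now up" events from corosync.log. Their timestamps can then be compared with the DHCP client's activity around the same time. Below is a minimal Python sketch. It assumes the corosync.log line format shown in the log excerpt above, where timestamps carry no year, so the caller passes the year in. The function names are hypothetical and are not part of corosync or Pacemaker.

```python
import re
from datetime import datetime

# Hypothetical helper. The patterns are written against the corosync.log lines
# quoted in this thread (assumed format), e.g.
#   Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.
#   Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is now up.
STAMP = r"(\w{3} +\d+ \d\d:\d\d:\d\d)"
DOWN_RE = re.compile(STAMP + r" corosync \[TOTEM \] The network interface is down\.")
UP_RE = re.compile(STAMP + r" corosync \[TOTEM \] The network interface \[([\d.]+)\] is now up\.")


def parse_stamp(stamp, year):
    # The log timestamps carry no year, so the caller has to supply one.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def interface_outages(lines, year):
    """Pair each 'interface is down' event with the next 'is now up' event.

    Returns a list of (down_time, up_time, address) tuples.
    """
    outages = []
    down_at = None
    for line in lines:
        m = DOWN_RE.match(line)
        if m:
            down_at = parse_stamp(m.group(1), year)
            continue
        m = UP_RE.match(line)
        if m and down_at is not None:
            outages.append((down_at, parse_stamp(m.group(1), year), m.group(2)))
            down_at = None
    return outages


sample = [
    "Feb 10 07:56:22 corosync [TOTEM ] A processor failed, forming new configuration.",
    "Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.",
    "Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is now up.",
]

for down, up, addr in interface_outages(sample, 2013):
    print(addr, down, up, up - down)
    # prints: 172.17.0.104 2013-02-10 07:56:22 2013-02-10 07:56:24 0:00:02
```

The same function works on the log file from the quoted config: `with open("/var/log/cluster/corosync.log") as f: interface_outages(f, 2013)`. Trailing newlines do not affect the match, because only the start of each line is matched.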
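The quoted corosync.conf shows why dynamic addresses do not fit this setup. With `transport: udpu`, the peers are listed statically by `memberaddr`. An address that DHCP hands out later and that is not in that list matches no configured member, and, as noted in the thread, picking up an edited list means restarting corosync. A minimal sketch of that check follows. The helper names are hypothetical, the code only reads config text and does not query a running corosync, and 172.17.0.117 is a made-up example of a new lease.

```python
import re

# Hypothetical helpers: they only read corosync.conf text and do not query a
# running corosync instance.
MEMBER_RE = re.compile(r"memberaddr:\s*([0-9.]+)")


def member_addresses(conf_text):
    """Return every memberaddr listed in the totem interface section(s)."""
    return MEMBER_RE.findall(conf_text)


def is_configured_member(conf_text, local_addr):
    """True if local_addr is one of the statically listed udpu peers."""
    return local_addr in member_addresses(conf_text)


# Trimmed copy of the totem section quoted in this thread.
conf = """
totem {
    version: 2
    interface {
        member {
            memberaddr: 172.17.0.104
        }
        member {
            memberaddr: 172.17.0.105
        }
        ringnumber: 0
        bindnetaddr: 172.17.0.0
    }
    transport: udpu
}
"""

print(member_addresses(conf))                      # ['172.17.0.104', '172.17.0.105']
print(is_configured_member(conf, "172.17.0.104"))  # True
print(is_configured_member(conf, "172.17.0.117"))  # False (made-up new DHCP lease)
```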