Hello, friends. I have a two-node cluster (nodes named 'obvo' as primary and 'back' as backup). I'm trying to make the cluster manage a VLAN interface with a number of IP addresses. The problem is that when the resource (a systemd unit) runs on 'obvo', it restarts in a loop until I stop the resource. On 'back' the problem may or may not occur; I still can't figure out what it depends on.
Now in more detail. I've prepared a netctl profile 'vlan431' which works fine natively on both nodes, with either start method:
 - # netctl start vlan431
 - # systemctl start netctl@vlan431

Let's check it:

# systemctl status netctl@vlan431
* netctl@vlan431.service - Networking for netctl profile vlan431
   Loaded: loaded (/usr/lib64/systemd/system/netctl@.service; static)
   Active: active (exited) since Mon 2014-09-29 14:09:28 MSK; 25s ago
     Docs: man:netctl.profile(5)
  Process: 20446 ExecStart=/usr/lib/network/network start %I (code=exited, status=0/SUCCESS)
 Main PID: 20446 (code=exited, status=0/SUCCESS)

Sep 29 14:09:28 obvo network[20446]: Starting network profile 'vlan431'...
Sep 29 14:09:28 obvo network[20446]: Started network profile 'vlan431'
Sep 29 14:09:28 obvo systemd[1]: Started Networking for netctl profile vlan431.

Now stop it:

# netctl stop vlan431

Then I added a resource to the cluster:

# crm configure show
node 178256436: back \
    attributes kernel=3.14.14-gentoo-20140821
node 178256439: obvo \
    attributes kernel=3.14.14-gentoo-20140821
primitive ClusterIP IPaddr2 \
    params ip=109.202.160.57 cidr_netmask=28 \
    op monitor interval=30s \
    meta target-role=Started
primitive HTTPd systemd:lighttpd \
    op monitor interval=30s \
    meta target-role=Started
primitive Vlan431 systemd:netctl@vlan431 \
    op monitor interval=30s \
    meta target-role=Started
location location-HTTPd-obvo HTTPd 10: obvo
location location-Vlan431-obvo Vlan431 10: obvo
location location-ip-obvo ClusterIP 10: obvo
property cib-bootstrap-options: \
    dc-version=1.1.10-368c726 \
    cluster-infrastructure=corosync \
    stonith-enabled=false \
    no-quorum-policy=ignore

and got Vlan431 restarting in a loop, producing a core dump every time. Here is what I see:

1.
# crm_mon -1
Last updated: Mon Sep 29 16:44:44 2014
Last change: Mon Sep 29 16:44:43 2014 by root via cibadmin on obvo
Stack: corosync
Current DC: obvo (178256439) - partition with quorum
Version: 1.1.10-368c726
2 Nodes configured
3 Resources configured

Online: [ back obvo ]

 ClusterIP (ocf::heartbeat:IPaddr2): Started obvo
 Vlan431 (systemd:netctl@vlan431): Started obvo FAILED
 HTTPd (systemd:lighttpd): Started obvo

Failed actions:
    Vlan431_monitor_30000 (node=obvo, call=231, rc=7, status=complete, last-rc-change=Mon Sep 29 16:44:44 2014, queued=35ms, exec=0ms): not running

Note that the other systemd resource (lighttpd) runs fine.

2. # journalctl (the following block repeats until I stop the resource):

Sep 29 16:45:03 obvo lrmd[282]: error: crm_abort: crm_glib_handler: Forked child 13573 to record non-fatal assert at logging.c:63 : g_error_free: assertion 'error != NULL' failed
Sep 29 16:45:03 obvo systemd[1]: Starting Networking for netctl profile vlan431...
Sep 29 16:45:03 obvo network[13574]: Starting network profile 'vlan431'...
Sep 29 16:45:03 obvo crmd[285]: notice: process_lrm_event: LRM operation Vlan431_start_0 (call=745, rc=0, cib-update=482, confirmed=true) ok
Sep 29 16:45:03 obvo crmd[285]: notice: te_rsc_command: Initiating action 2: monitor Vlan431_monitor_30000 on obvo (local)
Sep 29 16:45:03 obvo systemd-sysctl[13583]: Overwriting earlier assignment of kernel/sysrq in file '/usr/lib64/sysctl.d/60-gentoo.conf'.
Sep 29 16:45:03 obvo crmd[285]: notice: process_lrm_event: LRM operation Vlan431_monitor_30000 (call=748, rc=7, cib-update=483, confirmed=false) not running
Sep 29 16:45:03 obvo crmd[285]: warning: status_from_rc: Action 2 (Vlan431_monitor_30000) on obvo failed (target: 0 vs. rc: 7): Error
Sep 29 16:45:03 obvo crmd[285]: warning: update_failcount: Updating failcount for Vlan431 on obvo after failed monitor: rc=7 (update=value++, time=1411994703)
Sep 29 16:45:03 obvo crmd[285]: warning: update_failcount: Updating failcount for Vlan431 on obvo after failed monitor: rc=7 (update=value++, time=1411994703)
Sep 29 16:45:03 obvo attrd[283]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-Vlan431 (129)
Sep 29 16:45:03 obvo attrd[283]: notice: attrd_perform_update: Sent update 406: fail-count-Vlan431=129
Sep 29 16:45:03 obvo attrd[283]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-Vlan431 (1411994703)
Sep 29 16:45:03 obvo crmd[285]: notice: run_graph: Transition 181 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-161.bz2): Complete
Sep 29 16:45:03 obvo pengine[284]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 29 16:45:03 obvo pengine[284]: warning: unpack_rsc_op: Processing failed op monitor for Vlan431 on obvo: not running (7)
Sep 29 16:45:03 obvo pengine[284]: notice: LogActions: Recover Vlan431 (Started obvo)
Sep 29 16:45:03 obvo pengine[284]: notice: process_pe_message: Calculated Transition 182: /var/lib/pacemaker/pengine/pe-input-162.bz2
Sep 29 16:45:03 obvo crmd[285]: notice: te_rsc_command: Initiating action 3: stop Vlan431_stop_0 on obvo (local)
Sep 29 16:45:03 obvo attrd[283]: notice: attrd_perform_update: Sent update 408: last-failure-Vlan431=1411994703
Sep 29 16:45:03 obvo attrd[283]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-Vlan431 (130)
Sep 29 16:45:03 obvo systemd[1]: Started Networking for netctl profile vlan431.
Sep 29 16:45:03 obvo systemd[1]: Reloading.
Sep 29 16:45:03 obvo network[13574]: Started network profile 'vlan431'
Sep 29 16:45:03 obvo attrd[283]: notice: attrd_perform_update: Sent update 410: fail-count-Vlan431=130
Sep 29 16:45:03 obvo attrd[283]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-Vlan431 (1411994703)
Sep 29 16:45:03 obvo attrd[283]: notice: attrd_perform_update: Sent update 412: last-failure-Vlan431=1411994703
Sep 29 16:45:03 obvo lrmd[282]: error: crm_abort: crm_glib_handler: Forked child 13602 to record non-fatal assert at logging.c:63 : g_error_free: assertion 'error != NULL' failed
Sep 29 16:45:03 obvo systemd[1]: Stopping Networking for netctl profile vlan431...
Sep 29 16:45:03 obvo crmd[285]: notice: process_lrm_event: LRM operation Vlan431_stop_0 (call=752, rc=0, cib-update=485, confirmed=true) ok
Sep 29 16:45:03 obvo crmd[285]: notice: run_graph: Transition 182 (Complete=1, Pending=0, Fired=0, Skipped=3, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-162.bz2): Stopped
Sep 29 16:45:03 obvo pengine[284]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 29 16:45:03 obvo pengine[284]: warning: unpack_rsc_op: Processing failed op monitor for Vlan431 on obvo: not running (7)
Sep 29 16:45:03 obvo pengine[284]: notice: LogActions: Start Vlan431 (obvo)
Sep 29 16:45:03 obvo pengine[284]: notice: process_pe_message: Calculated Transition 183: /var/lib/pacemaker/pengine/pe-input-163.bz2
Sep 29 16:45:03 obvo crmd[285]: notice: te_rsc_command: Initiating action 9: start Vlan431_start_0 on obvo (local)
Sep 29 16:45:03 obvo systemd[1]: Reloading.
Sep 29 16:45:03 obvo network[13603]: Stopping network profile 'vlan431'...
Sep 29 16:45:03 obvo network[13603]: Stopped network profile 'vlan431'
Sep 29 16:45:03 obvo systemd[1]: Stopped Networking for netctl profile vlan431.
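To make the order of events within one iteration of the loop easier to see, here is a small Python sketch (Python is used only for illustration; the names `LOG`, `MARKERS` and `event_order` are hypothetical). It takes four journal lines copied verbatim from the log above and lists them by event. It only re-reads what is already in the log; it does not query Pacemaker or systemd, and the journal line order is assumed to reflect the real order of events, since all lines share the same one-second timestamp.

```python
# Four journal lines copied verbatim from one iteration of the loop above.
LOG = [
    "Sep 29 16:45:03 obvo systemd[1]: Starting Networking for netctl profile vlan431...",
    "Sep 29 16:45:03 obvo crmd[285]: notice: process_lrm_event: LRM operation Vlan431_start_0 (call=745, rc=0, cib-update=482, confirmed=true) ok",
    "Sep 29 16:45:03 obvo crmd[285]: notice: process_lrm_event: LRM operation Vlan431_monitor_30000 (call=748, rc=7, cib-update=483, confirmed=false) not running",
    "Sep 29 16:45:03 obvo systemd[1]: Started Networking for netctl profile vlan431.",
]

# Hypothetical labels: substring to look for in a line -> short event name.
MARKERS = {
    "Starting Networking for netctl profile": "systemd: unit starting",
    "Vlan431_start_0": "pacemaker: start returned",
    "Vlan431_monitor_30000": "pacemaker: monitor returned",
    "Started Networking for netctl profile": "systemd: unit started",
}


def event_order(lines):
    """Return event names in the order their log lines appear."""
    order = []
    for line in lines:
        for marker, name in MARKERS.items():
            if marker in line:
                order.append(name)
                break  # one event per line
    return order


if __name__ == "__main__":
    for position, name in enumerate(event_order(LOG), start=1):
        print(position, name)
```

Running it prints the start result and the "not running" monitor result as events 2 and 3, and systemd's "Started Networking" message as event 4, i.e. in the journal the first monitor result was logged before systemd reported the unit as started.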
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org