[Pacemaker] Resource Netctl - systemd unit - cycled restarting

Dmitry Pozdeiev Mon, 29 Sep 2014 07:26:56 -0700

Hello, friends.

I have a two-node cluster (the nodes are named 'obvo', the primary, and
'back', the backup). I'm trying to make the cluster manage a VLAN interface
with a number of IP addresses. The problem is that when the resource (a
systemd unit) runs on 'obvo', it restarts in a loop until I stop the
resource. On 'back' the problem sometimes occurs and sometimes does not; I
still can't work out what it depends on.


Now in more detail.

I've prepared a netctl profile 'vlan431', which works fine natively on both
nodes with either start method:
- # netctl start vlan431
- # systemctl start netctl@vlan431
Let's check it:
# systemctl status netctl@vlan431
* netctl@vlan431.service - Networking for netctl profile vlan431
   Loaded: loaded (/usr/lib64/systemd/system/netctl@.service; static)
   Active: active (exited) since Mon 2014-09-29 14:09:28 MSK; 25s ago
     Docs: man:netctl.profile(5)
  Process: 20446 ExecStart=/usr/lib/network/network start %I (code=exited,
status=0/SUCCESS)
 Main PID: 20446 (code=exited, status=0/SUCCESS)
Sep 29 14:09:28 obvo network[20446]: Starting network profile 'vlan431'...
Sep 29 14:09:28 obvo network[20446]: Started network profile 'vlan431'
Sep 29 14:09:28 obvo systemd[1]: Started Networking for netctl profile vlan431.
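
For anyone reproducing this, the state shown on the "Active:" line can be
pulled out with a small helper, which makes it easier to watch the unit while
the cluster is cycling it. This is only a sketch: the function name
active_state is made up for illustration, and it assumes the status layout
shown above.

```shell
# Hypothetical helper (not part of netctl, systemd or Pacemaker):
# read `systemctl status` output on stdin and print the state and
# sub-state from the first "Active:" line, e.g. "active exited".
active_state() {
    sed -n 's/^ *Active: \([a-z]*\) (\([a-z-]*\)).*/\1 \2/p' | head -n 1
}
```

Usage would be something like `systemctl status netctl@vlan431 | active_state`,
run repeatedly while the resource is being started.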

Now stop it:
# netctl stop vlan431

Then I added the resource to the cluster:
# crm configure show
node 178256436: back \
        attributes kernel=3.14.14-gentoo-20140821
node 178256439: obvo \
        attributes kernel=3.14.14-gentoo-20140821
primitive ClusterIP IPaddr2 \
        params ip=109.202.160.57 cidr_netmask=28 \
        op monitor interval=30s \
        meta target-role=Started
primitive HTTPd systemd:lighttpd \
        op monitor interval=30s \
        meta target-role=Started
primitive Vlan431 systemd:netctl@vlan431 \
        op monitor interval=30s \
        meta target-role=Started
location location-HTTPd-obvo HTTPd 10: obvo
location location-Vlan431-obvo Vlan431 10: obvo
location location-ip-obvo ClusterIP 10: obvo
property cib-bootstrap-options: \
        dc-version=1.1.10-368c726 \
        cluster-infrastructure=corosync \
        stonith-enabled=false \
        no-quorum-policy=ignore

and got Vlan431 restarting in a loop, producing a core dump every time.
Here is what I see:
1. # crm_mon -1
Last updated: Mon Sep 29 16:44:44 2014
Last change: Mon Sep 29 16:44:43 2014 by root via cibadmin on obvo
Stack: corosync
Current DC: obvo (178256439) - partition with quorum
Version: 1.1.10-368c726
2 Nodes configured
3 Resources configured

Online: [ back obvo ]

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started obvo
 Vlan431        (systemd:netctl@vlan431):       Started obvo FAILED
 HTTPd  (systemd:lighttpd):     Started obvo

Failed actions:
    Vlan431_monitor_30000 (node=obvo, call=231, rc=7, status=complete,
last-rc-change=Mon Sep 29 16:44:44 2014, queued=35ms, exec=0ms): not running

Note that there is another systemd unit (lighttpd), which runs fine.
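
As a reading aid for the rc values in this output: as far as I understand,
they follow the OCF return-code convention, where 7 means "not running". A
minimal lookup sketch follows; the function name ocf_rc_name is made up for
illustration, and only the most common codes are listed.

```shell
# Hypothetical helper: translate a numeric rc from crm_mon or the logs
# into the conventional OCF return-code name.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo UNKNOWN ;;   # codes not covered by this sketch
    esac
}
```

So the rc=7 reported for Vlan431_monitor_30000 means the monitor found the
resource not running, which is not the same as a generic failure (rc=1).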

2. # journalctl (the following block repeats until the resource is stopped)
Sep 29 16:45:03 obvo lrmd[282]: error: crm_abort: crm_glib_handler: Forked
child 13573 to record non-fatal assert at logging.c:63 : g_error_free:
assertion 'error != NULL' failed
Sep 29 16:45:03 obvo systemd[1]: Starting Networking for netctl profile
vlan431...
Sep 29 16:45:03 obvo network[13574]: Starting network profile 'vlan431'...
Sep 29 16:45:03 obvo crmd[285]: notice: process_lrm_event: LRM operation
Vlan431_start_0 (call=745, rc=0, cib-update=482, confirmed=true) ok
Sep 29 16:45:03 obvo crmd[285]: notice: te_rsc_command: Initiating action 2:
monitor Vlan431_monitor_30000 on obvo (local)
Sep 29 16:45:03 obvo systemd-sysctl[13583]: Overwriting earlier assignment
of kernel/sysrq in file '/usr/lib64/sysctl.d/60-gentoo.conf'.
Sep 29 16:45:03 obvo crmd[285]: notice: process_lrm_event: LRM operation
Vlan431_monitor_30000 (call=748, rc=7, cib-update=483, confirmed=false) not
running
Sep 29 16:45:03 obvo crmd[285]: warning: status_from_rc: Action 2
(Vlan431_monitor_30000) on obvo failed (target: 0 vs. rc: 7): Error
Sep 29 16:45:03 obvo crmd[285]: warning: update_failcount: Updating
failcount for Vlan431 on obvo after failed monitor: rc=7 (update=value++,
time=1411994703)
Sep 29 16:45:03 obvo crmd[285]: warning: update_failcount: Updating
failcount for Vlan431 on obvo after failed monitor: rc=7 (update=value++,
time=1411994703)
Sep 29 16:45:03 obvo attrd[283]: notice: attrd_trigger_update: Sending flush
op to all hosts for: fail-count-Vlan431 (129)
Sep 29 16:45:03 obvo attrd[283]: notice: attrd_perform_update: Sent update
406: fail-count-Vlan431=129
Sep 29 16:45:03 obvo attrd[283]: notice: attrd_trigger_update: Sending flush
op to all hosts for: last-failure-Vlan431 (1411994703)
Sep 29 16:45:03 obvo crmd[285]: notice: run_graph: Transition 181
(Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-161.bz2): Complete
Sep 29 16:45:03 obvo pengine[284]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Sep 29 16:45:03 obvo pengine[284]: warning: unpack_rsc_op: Processing failed
op monitor for Vlan431 on obvo: not running (7)
Sep 29 16:45:03 obvo pengine[284]: notice: LogActions: Recover Vlan431     
  (Started obvo)
Sep 29 16:45:03 obvo pengine[284]: notice: process_pe_message: Calculated
Transition 182: /var/lib/pacemaker/pengine/pe-input-162.bz2
Sep 29 16:45:03 obvo crmd[285]: notice: te_rsc_command: Initiating action 3:
stop Vlan431_stop_0 on obvo (local)
Sep 29 16:45:03 obvo attrd[283]: notice: attrd_perform_update: Sent update
408: last-failure-Vlan431=1411994703
Sep 29 16:45:03 obvo attrd[283]: notice: attrd_trigger_update: Sending flush
op to all hosts for: fail-count-Vlan431 (130)
Sep 29 16:45:03 obvo systemd[1]: Started Networking for netctl profile vlan431.
Sep 29 16:45:03 obvo systemd[1]: Reloading.
Sep 29 16:45:03 obvo network[13574]: Started network profile 'vlan431'
Sep 29 16:45:03 obvo attrd[283]: notice: attrd_perform_update: Sent update
410: fail-count-Vlan431=130
Sep 29 16:45:03 obvo attrd[283]: notice: attrd_trigger_update: Sending flush
op to all hosts for: last-failure-Vlan431 (1411994703)
Sep 29 16:45:03 obvo attrd[283]: notice: attrd_perform_update: Sent update
412: last-failure-Vlan431=1411994703
Sep 29 16:45:03 obvo lrmd[282]: error: crm_abort: crm_glib_handler: Forked
child 13602 to record non-fatal assert at logging.c:63 : g_error_free:
assertion 'error != NULL' failed
Sep 29 16:45:03 obvo systemd[1]: Stopping Networking for netctl profile
vlan431...
Sep 29 16:45:03 obvo crmd[285]: notice: process_lrm_event: LRM operation
Vlan431_stop_0 (call=752, rc=0, cib-update=485, confirmed=true) ok
Sep 29 16:45:03 obvo crmd[285]: notice: run_graph: Transition 182
(Complete=1, Pending=0, Fired=0, Skipped=3, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-162.bz2): Stopped
Sep 29 16:45:03 obvo pengine[284]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Sep 29 16:45:03 obvo pengine[284]: warning: unpack_rsc_op: Processing failed
op monitor for Vlan431 on obvo: not running (7)
Sep 29 16:45:03 obvo pengine[284]: notice: LogActions: Start   Vlan431     
  (obvo)
Sep 29 16:45:03 obvo pengine[284]: notice: process_pe_message: Calculated
Transition 183: /var/lib/pacemaker/pengine/pe-input-163.bz2
Sep 29 16:45:03 obvo crmd[285]: notice: te_rsc_command: Initiating action 9:
start Vlan431_start_0 on obvo (local)
Sep 29 16:45:03 obvo systemd[1]: Reloading.
Sep 29 16:45:03 obvo network[13603]: Stopping network profile 'vlan431'...
Sep 29 16:45:03 obvo network[13603]: Stopped network profile 'vlan431'
Sep 29 16:45:03 obvo systemd[1]: Stopped Networking for netctl profile vlan431.
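
To make a log like the one above easier to follow, the operation results and
the fail-count can be filtered out of saved journal output. This is only a
sketch under assumptions: the function names lrm_ops and last_failcount are
made up, and the patterns rely on the message format shown in this log
(Pacemaker 1.1.10); other versions may word these messages differently.

```shell
# Hypothetical helpers; both read saved journal text on stdin.

# Print one line per completed LRM operation of resource $1,
# in the form "<resource>_<op>_<interval> call=<n> rc=<n>".
lrm_ops() {
    grep -o "$1_[a-z]*_[0-9]* (call=[0-9]*, rc=[0-9]*" |
        sed 's/ (call=\([0-9]*\), rc=\([0-9]*\)/ call=\1 rc=\2/'
}

# Print the most recent fail-count value recorded for resource $1.
last_failcount() {
    grep -o "fail-count-$1=[0-9][0-9]*" | tail -n 1 | cut -d= -f2
}
```

For example, `journalctl | lrm_ops Vlan431` lists the start, monitor and stop
results in order, so it is easy to see which operation returns a non-zero rc
in each cycle.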


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
