----- Original Message -----
> From: "Mathieu Peltier" <mathieu.pelt...@gmail.com>
> To: pacemaker@oss.clusterlabs.org
> Sent: Wednesday, November 6, 2013 11:27:50 AM
> Subject: [Pacemaker] Cannot use ocf::heartbeat:IPsrcaddr (RTNETLINK answers: No such process)
>
> Hi,
> I am trying to set up a simple cluster of 2 machines on CentOS 6.4:
> pacemaker-cli-1.1.10-1.el6_4.4.x86_64
> pacemaker-1.1.10-1.el6_4.4.x86_64
> pacemaker-libs-1.1.10-1.el6_4.4.x86_64
> pacemaker-cluster-libs-1.1.10-1.el6_4.4.x86_64
> corosync-1.4.1-15.el6_4.1.x86_64
> corosynclib-1.4.1-15.el6_4.1.x86_64
> pcs-0.9.90-1.el6_4.noarch
> cman-3.0.12.1-49.el6_4.2.x86_64
> resource-agents-3.9.2-21.el6_4.8.x86_64
>
> I am using the following script to configure the cluster:
> --------------------------------------------------
> #!/bin/bash
>
> CLUSTER_NAME=test
> CONFIG_FILE=/etc/cluster/cluster.conf
> NODE1_EM1=node1
> NODE2_EM1=node2
> NODE1_EM2=node1-priv
> NODE2_EM2=node2-priv
> VIP_EM1=192.168.0.6
> MONITOR_INTERVAL=60s
>
> # Make sure that pacemaker is stopped on both nodes
> # NOT INCLUDED HERE
>
> # Delete existing configuration
> rm -rf /var/log/cluster/*
> ssh root@$NODE2_EM2 'rm -rf /var/log/cluster/*'
> rm -rf /var/lib/pacemaker/cib/* /var/lib/pacemaker/cores/* /var/lib/pacemaker/pengine/* /var/lib/corosync/* /var/lib/cluster/*
> ssh root@$NODE2_EM2 'rm -rf /var/lib/pacemaker/cib/* /var/lib/pacemaker/cores/* /var/lib/pacemaker/pengine/* /var/lib/corosync/* /var/lib/cluster/*'
>
> # Create the cluster
> ccs -f $CONFIG_FILE --createcluster $CLUSTER_NAME
>
> # Add nodes to the cluster
> ccs -f $CONFIG_FILE --addnode $NODE1_EM1
> ccs -f $CONFIG_FILE --addnode $NODE2_EM1
> ccs -f $CONFIG_FILE --setcman two_node="1" expected_votes="1"
>
> # Add alternative node names so that both network interfaces are used
> ccs -f $CONFIG_FILE --addalt $NODE1_EM1 $NODE1_EM2
> ccs -f $CONFIG_FILE --addalt $NODE2_EM1 $NODE2_EM2
> ccs -f $CONFIG_FILE --setdlm protocol="sctp"
>
> # Teach CMAN how to send its fencing requests to Pacemaker
> ccs -f $CONFIG_FILE --addfencedev pcmk agent=fence_pcmk
> ccs -f $CONFIG_FILE --addmethod pcmk-redirect $NODE1_EM1
> ccs -f $CONFIG_FILE --addmethod pcmk-redirect $NODE2_EM1
> ccs -f $CONFIG_FILE --addfenceinst pcmk $NODE1_EM1 pcmk-redirect port=$NODE1_EM1
> ccs -f $CONFIG_FILE --addfenceinst pcmk $NODE2_EM1 pcmk-redirect port=$NODE2_EM1
>
> # Deploy configuration to node2
> scp /etc/cluster/cluster.conf root@$NODE2_EM2:/etc/cluster/cluster.conf
>
> # Start pacemaker on main node
> /etc/init.d/pacemaker start
> sleep 30
>
> # Disable stonith
> pcs property set stonith-enabled=false
>
> # Disable quorum
> pcs property set no-quorum-policy=ignore
>
> # Define resources
> pcs resource create VIP_EM1 ocf:heartbeat:IPaddr params nic=em1 ip=$VIP_EM1 cidr_netmask=24 op monitor interval=$MONITOR_INTERVAL
> pcs resource create PREFERRED_SRC_IP ocf:heartbeat:IPsrcaddr params ipaddress=$VIP_EM1 op monitor interval=$MONITOR_INTERVAL
>
> # Define initial location and prevent resources from moving back to the initial server after a failure
> pcs resource defaults resource-stickiness=100
> pcs constraint location VIP_EM1 prefers $NODE1_EM1=50
> --------------------------------------------------
>
> After running this script from node1:
>
> root@node1# pcs status
> Cluster name:
> Last updated: Wed Nov 6 17:17:30 2013
> Last change: Wed Nov 6 17:06:20 2013 via crm_attribute on node1
> Stack: cman
> Current DC: node1 - partition with quorum
> Version: 1.1.10-1.el6_4.4-368c726
> 2 Nodes configured
> 2 Resources configured
>
> Online: [ node1 ]
> OFFLINE: [ node2 ]
>
> Full list of resources:
>
> VIP_EM1 (ocf::heartbeat:IPaddr): Stopped
> PREFERRED_SRC_IP (ocf::heartbeat:IPsrcaddr): Stopped
>
> Failed actions:
> PREFERRED_SRC_IP_start_0 on node1 'unknown error' (1): call=19, status=complete, last-rc-change='Wed Nov 6 17:06:20 2013', queued=67ms, exec=0ms
>
> PCSD Status:
> Error: no nodes found in corosync.conf
>
> root@node1# ip route show
> 192.168.8.0/24 dev em2 proto kernel scope link src 192.168.8.1
> default via 192.168.0.1 dev em1
>
> Error in /var/log/cluster/corosync.log:
> ...
> IPsrcaddr(PREFERRED_SRC_IP)[638]: 2013/11/06_16:50:32 ERROR: command 'ip route change to default via 192.168.0.1 dev em1 src 192.168.0.6' failed
> Nov 06 16:50:32 [32461] node1.domain.org lrmd: notice: operation_finished: PREFERRED_SRC_IP_start_0:638:stderr [ RTNETLINK answers: No such process ]
> ...
>
> If I run the command manually when pacemaker is not started (after
> rebooting the machine for example), the default route is modified as
> expected (I use 192.168.0.106 because the alias 192.168.0.6 is not
> started)
>
> # ip route show
> 192.168.0.0/24 dev em1 proto kernel scope link src 192.168.0.106
> 192.168.8.0/24 dev em3 proto kernel scope link src 192.168.8.1
> default via 192.168.0.1 dev em1
>
> # ip route change to default via 192.168.0.1 dev em1 src 192.168.0.106
>
> # ip route show
> 192.168.0.0/24 dev em1 proto kernel scope link src 192.168.0.106
> 192.168.8.0/24 dev em3 proto kernel scope link src 192.168.8.1
> default via 192.168.0.1 dev em1 src 192.168.0.106
>
> If I run the same configure script without defining the
> PREFERRED_SRC_IP resource, I can check that the resource is started as
> expected:
>
> # pcs status
> ...
> Online: [ node1 ]
> OFFLINE: [ node2 ]
>
> Full list of resources:
> VIP_EM1 (ocf::heartbeat:IPaddr): Started node1
> ...
>
> # ip addr show em1
> 6: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
> link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
> inet 192.168.0.106/24 brd 192.168.0.255 scope global em1
> inet 192.168.0.6/24 brd 192.168.0.255 scope global secondary em1
>
> But when I create the PREFERRED_SRC_IP resource, I get the same error:
>
> # pcs resource create PREFERRED_SRC_IP ocf:heartbeat:IPsrcaddr params ipaddress=192.168.0.6 op monitor interval=60s

I noticed you didn't create an order constraint between the IPaddr and the IPsrcaddr resources. You'll want to guarantee that the IP address starts before it is set as the preferred source address:

pcs constraint order VIP_EM1 then PREFERRED_SRC_IP

If that doesn't help, we'll need some debug information. After defining the source IP resource and watching it fail, run this and send the output it produces:

crm_resource -r PREFERRED_SRC_IP --force-start -VV

Thanks,
-- Vossel

>
> # pcs status
> ...
> Online: [ node1 ]
> OFFLINE: [ node2 ]
>
> Full list of resources:
> VIP_EM1 (ocf::heartbeat:IPaddr): Started node1
> PREFERRED_SRC_IP (ocf::heartbeat:IPsrcaddr): Stopped
>
> Failed actions:
> PREFERRED_SRC_IP_start_0 on node1 'unknown error' (1): call=24, status=complete, last-rc-change='Wed Nov 6 18:00:09 2013', queued=47ms, exec=0ms
>
> Error in corosync.log:
>
> IPsrcaddr(PREFERRED_SRC_IP)[10035]: 2013/11/06_18:00:09 ERROR: command 'ip route change to default via 192.168.0.1 dev em1 src 192.168.0.6' failed
> Nov 06 18:00:09 [9172] node1.domain.org lrmd: notice: operation_finished: PREFERRED_SRC_IP_start_0:10035:stderr [ RTNETLINK answers: No such process ]
>
> Thanks in advance,
> Mathieu
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
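
For convenience, the two steps suggested in the reply (add the order constraint, then force-start the resource with verbose output) could be wrapped in a small script like the one below. This is only a sketch: the resource names are the ones used in this thread, while `DRY_RUN`, `run` and `apply_suggested_steps` are hypothetical helper names introduced here for illustration. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/bash
# Sketch only: wraps the two commands suggested in the reply.
# VIP_RES and SRC_RES are the resource names from this thread.
# DRY_RUN, run and apply_suggested_steps are illustrative names, not part of pcs or Pacemaker.

VIP_RES=VIP_EM1
SRC_RES=PREFERRED_SRC_IP

# Print the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

apply_suggested_steps() {
    # Make sure the IP address is started before it is used as the source address.
    run pcs constraint order "$VIP_RES" then "$SRC_RES"

    # If the start still fails, rerun it by hand to collect debug output.
    run crm_resource -r "$SRC_RES" --force-start -VV
}

apply_suggested_steps
```

Run it as-is to see the commands, or with `DRY_RUN=0` on a cluster node to execute them.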