On Wed, Jun 20, 2012 at 11:51 PM, emmanuel segura <emi2f...@gmail.com> wrote:
> Hello
>
> Why do you say there is no error in the message?

Because it doesn't say "error" anywhere? The logs below look completely
normal for a node that's just joined the cluster.
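(For context when reading the excerpt: return code 7 from a probe -- the
one-off monitor with interval=0 that a node runs for every resource when it
joins -- is OCF_NOT_RUNNING, i.e. "this resource is cleanly stopped here".
That is information for the cluster, not a failure. As a minimal sketch, you
can reproduce the same answer by hand on atlas4, assuming a stock Debian
resource-agents install and the lx0 parameters from the configuration quoted
below:

  # run the same probe the lrmd ran, outside the cluster
  # (paths and variable values assume a stock Debian install)
  export OCF_ROOT=/usr/lib/ocf
  export OCF_RESOURCE_INSTANCE=lx0
  export OCF_RESKEY_config=/etc/libvirt/crm/lx0.xml
  export OCF_RESKEY_hypervisor=qemu:///system
  /usr/lib/ocf/resource.d/heartbeat/VirtualDomain monitor
  echo $?    # prints 7 (OCF_NOT_RUNNING) while the lx0 domain is shut off

So the excerpt only shows the cluster re-learning where lx0 is *not*
running; nothing in it is an error.)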
> =========================================================
>
> Jun 20 11:57:25 atlas4 lrmd: [17568]: info: operation monitor[35] on lx0 for client 17571: pid 30179 exited with return code 7
> Jun 20 11:57:25 atlas4 crmd: [17571]: debug: create_operation_update: do_update_resource: Updating resouce lx0 after complete monitor op (interval=0)
> Jun 20 11:57:25 atlas4 crmd: [17571]: info: process_lrm_event: LRM operation lx0_monitor_0 (call=35, rc=7, cib-update=61, confirmed=true) not running
> =========================================================
>
>
> 2012/6/20 Kadlecsik József <kadlecsik.joz...@wigner.mta.hu>
>>
>> Hello,
>>
>> Somehow, after a "crm resource restart" which did *not* start the
>> resource but only stopped it, a VirtualDomain resource cannot be started
>> anymore. The most baffling part is that I do not see an error message.
>> The resource in question, named 'lx0', can be started directly via
>> virsh/libvirt, and libvirtd is running on all cluster nodes.
>>
>> We run corosync 1.4.2-1~bpo60+1, pacemaker 1.1.6-2~bpo60+1 (debian).
>>
>> # crm status
>> ============
>> Last updated: Wed Jun 20 15:14:44 2012
>> Last change: Wed Jun 20 14:07:40 2012 via cibadmin on atlas0
>> Stack: openais
>> Current DC: atlas0 - partition with quorum
>> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
>> 7 Nodes configured, 7 expected votes
>> 18 Resources configured.
>> ============
>>
>> Online: [ atlas0 atlas1 atlas2 atlas3 atlas4 atlas5 atlas6 ]
>>
>> kerberos (ocf::heartbeat:VirtualDomain): Started atlas0
>> stonith-atlas3 (stonith:ipmilan): Started atlas4
>> stonith-atlas1 (stonith:ipmilan): Started atlas4
>> stonith-atlas2 (stonith:ipmilan): Started atlas4
>> stonith-atlas0 (stonith:ipmilan): Started atlas4
>> stonith-atlas4 (stonith:ipmilan): Started atlas3
>> mailman (ocf::heartbeat:VirtualDomain): Started atlas6
>> indico (ocf::heartbeat:VirtualDomain): Started atlas0
>> papi (ocf::heartbeat:VirtualDomain): Started atlas1
>> wwwd (ocf::heartbeat:VirtualDomain): Started atlas2
>> webauth (ocf::heartbeat:VirtualDomain): Started atlas3
>> caladan (ocf::heartbeat:VirtualDomain): Started atlas4
>> radius (ocf::heartbeat:VirtualDomain): Started atlas5
>> mail0 (ocf::heartbeat:VirtualDomain): Started atlas6
>> stonith-atlas5 (stonith:apcmastersnmp): Started atlas4
>> stonith-atlas6 (stonith:apcmastersnmp): Started atlas4
>> w0 (ocf::heartbeat:VirtualDomain): Started atlas2
>>
>> # crm resource show
>> kerberos (ocf::heartbeat:VirtualDomain) Started
>> stonith-atlas3 (stonith:ipmilan) Started
>> stonith-atlas1 (stonith:ipmilan) Started
>> stonith-atlas2 (stonith:ipmilan) Started
>> stonith-atlas0 (stonith:ipmilan) Started
>> stonith-atlas4 (stonith:ipmilan) Started
>> mailman (ocf::heartbeat:VirtualDomain) Started
>> indico (ocf::heartbeat:VirtualDomain) Started
>> papi (ocf::heartbeat:VirtualDomain) Started
>> wwwd (ocf::heartbeat:VirtualDomain) Started
>> webauth (ocf::heartbeat:VirtualDomain) Started
>> caladan (ocf::heartbeat:VirtualDomain) Started
>> radius (ocf::heartbeat:VirtualDomain) Started
>> mail0 (ocf::heartbeat:VirtualDomain) Started
>> stonith-atlas5 (stonith:apcmastersnmp) Started
>> stonith-atlas6 (stonith:apcmastersnmp) Started
>> w0 (ocf::heartbeat:VirtualDomain) Started
>> lx0 (ocf::heartbeat:VirtualDomain) Stopped
>>
>> # crm configure show
>> node atlas0 \
>>         attributes standby="false" \
>>         utilization memory="24576"
>> node atlas1 \
>>         attributes standby="false" \
>>         utilization memory="24576"
>> node atlas2 \
>>         attributes standby="false" \
>>         utilization memory="24576"
>> node atlas3 \
>>         attributes standby="false" \
>>         utilization memory="24576"
>> node atlas4 \
>>         attributes standby="false" \
>>         utilization memory="24576"
>> node atlas5 \
>>         attributes standby="off" \
>>         utilization memory="20480"
>> node atlas6 \
>>         attributes standby="off" \
>>         utilization memory="20480"
>> primitive caladan ocf:heartbeat:VirtualDomain \
>>         params config="/etc/libvirt/crm/caladan.xml" hypervisor="qemu:///system" \
>>         meta allow-migrate="true" target-role="Started" is-managed="true" \
>>         op start interval="0" timeout="120s" \
>>         op stop interval="0" timeout="120s" \
>>         op monitor interval="10s" timeout="40s" depth="0" \
>>         op migrate_to interval="0" timeout="240s" on-fail="block" \
>>         op migrate_from interval="0" timeout="240s" on-fail="block" \
>>         utilization memory="4608"
>> primitive indico ocf:heartbeat:VirtualDomain \
>>         params config="/etc/libvirt/crm/indico.xml" hypervisor="qemu:///system" \
>>         meta allow-migrate="true" target-role="Started" is-managed="true" \
>>         op start interval="0" timeout="120s" \
>>         op stop interval="0" timeout="120s" \
>>         op monitor interval="10s" timeout="40s" depth="0" \
>>         op migrate_to interval="0" timeout="240s" on-fail="block" \
>>         op migrate_from interval="0" timeout="240s" on-fail="block" \
>>         utilization memory="5120"
>> primitive kerberos ocf:heartbeat:VirtualDomain \
>>         params config="/etc/libvirt/qemu/kerberos.xml" hypervisor="qemu:///system" \
>>         meta allow-migrate="true" target-role="Started" is-managed="true" \
>>         op start interval="0" timeout="120s" \
>>         op stop interval="0" timeout="120s" \
>>         op monitor interval="10s" timeout="40s" depth="0" \
>>         op migrate_to interval="0" timeout="240s" on-fail="block" \
>>         op migrate_from interval="0" timeout="240s" on-fail="block" \
>>         utilization memory="4608"
>> primitive lx0 ocf:heartbeat:VirtualDomain \
>>         params config="/etc/libvirt/crm/lx0.xml" hypervisor="qemu:///system" \
>>         meta allow-migrate="true" target-role="Started" is-managed="true" \
>>         op start interval="0" timeout="120s" \
>>         op stop interval="0" timeout="120s" \
>>         op monitor interval="10s" timeout="40s" depth="0" \
>>         op migrate_to interval="0" timeout="240s" on-fail="block" \
>>         op migrate_from interval="0" timeout="240s" on-fail="block" \
>>         utilization memory="4608"
>> primitive mail0 ocf:heartbeat:VirtualDomain \
>>         params config="/etc/libvirt/crm/mail0.xml" hypervisor="qemu:///system" \
>>         meta allow-migrate="true" target-role="Started" is-managed="true" \
>>         op start interval="0" timeout="120s" \
>>         op stop interval="0" timeout="120s" \
>>         op monitor interval="10s" timeout="40s" depth="0" \
>>         op migrate_to interval="0" timeout="240s" on-fail="block" \
>>         op migrate_from interval="0" timeout="240s" on-fail="block" \
>>         utilization memory="4608"
>> primitive mailman ocf:heartbeat:VirtualDomain \
>>         params config="/etc/libvirt/crm/mailman.xml" hypervisor="qemu:///system" \
>>         meta allow-migrate="true" target-role="Started" is-managed="true" \
>>         op start interval="0" timeout="120s" \
>>         op stop interval="0" timeout="120s" \
>>         op monitor interval="10s" timeout="40s" depth="0" \
>>         op migrate_to interval="0" timeout="240s" on-fail="block" \
>>         op migrate_from interval="0" timeout="240s" on-fail="block" \
>>         utilization memory="5120"
>> primitive papi ocf:heartbeat:VirtualDomain \
>>         params config="/etc/libvirt/crm/papi.xml" hypervisor="qemu:///system" \
>>         meta allow-migrate="true" target-role="Started" is-managed="true" \
>>         op start interval="0" timeout="120s" \
>>         op stop interval="0" timeout="120s" \
>>         op monitor interval="10s" timeout="40s" depth="0" \
>>         op migrate_to interval="0" timeout="240s" on-fail="block" \
>>         op migrate_from interval="0" timeout="240s" on-fail="block" \
>>         utilization memory="6144"
>> primitive radius ocf:heartbeat:VirtualDomain \
>>         params config="/etc/libvirt/crm/radius.xml" hypervisor="qemu:///system" \
>>         meta allow-migrate="true" target-role="Started" is-managed="true" \
>>         op start interval="0" timeout="120s" \
>>         op stop interval="0" timeout="120s" \
>>         op monitor interval="10s" timeout="40s" depth="0" \
>>         op migrate_to interval="0" timeout="240s" on-fail="block" \
>>         op migrate_from interval="0" timeout="240s" on-fail="block" \
>>         utilization memory="4608"
>> primitive stonith-atlas0 stonith:ipmilan \
>>         params hostname="atlas0" ipaddr="192.168.40.20" port="623" auth="md5" priv="admin" login="root" password="XXXXX" \
>>         op start interval="0" timeout="120s" \
>>         meta target-role="Started"
>> primitive stonith-atlas1 stonith:ipmilan \
>>         params hostname="atlas1" ipaddr="192.168.40.21" port="623" auth="md5" priv="admin" login="root" password="XXXX" \
>>         op start interval="0" timeout="120s" \
>>         meta target-role="Started"
>> primitive stonith-atlas2 stonith:ipmilan \
>>         params hostname="atlas2" ipaddr="192.168.40.22" port="623" auth="md5" priv="admin" login="root" password="XXXX" \
>>         op start interval="0" timeout="120s" \
>>         meta target-role="Started"
>> primitive stonith-atlas3 stonith:ipmilan \
>>         params hostname="atlas3" ipaddr="192.168.40.23" port="623" auth="md5" priv="admin" login="root" password="XXXX" \
>>         op start interval="0" timeout="120s" \
>>         meta target-role="Started"
>> primitive stonith-atlas4 stonith:ipmilan \
>>         params hostname="atlas4" ipaddr="192.168.40.24" port="623" auth="md5" priv="admin" login="root" password="XXXX" \
>>         op start interval="0" timeout="120s" \
>>         meta target-role="Started"
>> primitive stonith-atlas5 stonith:apcmastersnmp \
>>         params ipaddr="192.168.40.252" port="161" community="XXXX" pcmk_host_list="atlas5" pcmk_host_check="static-list"
>> primitive stonith-atlas6 stonith:apcmastersnmp \
>>         params ipaddr="192.168.40.252" port="161" community="XXXX" pcmk_host_list="atlas6" pcmk_host_check="static-list"
>> primitive w0 ocf:heartbeat:VirtualDomain \
>>         params config="/etc/libvirt/crm/w0.xml" hypervisor="qemu:///system" \
>>         meta allow-migrate="true" target-role="Started" \
>>         op start interval="0" timeout="120s" \
>>         op stop interval="0" timeout="120s" \
>>         op monitor interval="10s" timeout="40s" depth="0" \
>>         op migrate_to interval="0" timeout="240s" on-fail="block" \
>>         op migrate_from interval="0" timeout="240s" on-fail="block" \
>>         utilization memory="4608"
>> primitive webauth ocf:heartbeat:VirtualDomain \
>>         params config="/etc/libvirt/crm/webauth.xml" hypervisor="qemu:///system" \
>>         meta allow-migrate="true" target-role="Started" is-managed="true" \
>>         op start interval="0" timeout="120s" \
>>         op stop interval="0" timeout="120s" \
>>         op monitor interval="10s" timeout="40s" depth="0" \
>>         op migrate_to interval="0" timeout="240s" on-fail="block" \
>>         op migrate_from interval="0" timeout="240s" on-fail="block" \
>>         utilization memory="4608"
>> primitive wwwd ocf:heartbeat:VirtualDomain \
>>         params config="/etc/libvirt/crm/wwwd.xml" hypervisor="qemu:///system" \
>>         meta allow-migrate="true" target-role="Started" is-managed="true" \
>>         op start interval="0" timeout="120s" \
>>         op stop interval="0" timeout="120s" \
>>         op monitor interval="10s" timeout="40s" depth="0" \
>>         op migrate_to interval="0" timeout="240s" on-fail="block" \
>>         op migrate_from interval="0" timeout="240s" on-fail="block" \
>>         utilization memory="5120"
>> location location-stonith-atlas0 stonith-atlas0 -inf: atlas0
>> location location-stonith-atlas1 stonith-atlas1 -inf: atlas1
>> location location-stonith-atlas2 stonith-atlas2 -inf: atlas2
>> location location-stonith-atlas3 stonith-atlas3 -inf: atlas3
>> location location-stonith-atlas4 stonith-atlas4 -inf: atlas4
>> location location-stonith-atlas5 stonith-atlas5 -inf: atlas5
>> location location-stonith-atlas6 stonith-atlas6 -inf: atlas6
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>>         cluster-infrastructure="openais" \
>>         expected-quorum-votes="7" \
>>         stonith-enabled="true" \
>>         no-quorum-policy="stop" \
>>         last-lrm-refresh="1340193431" \
>>         symmetric-cluster="true" \
>>         maintenance-mode="false" \
>>         stop-all-resources="false" \
>>         is-managed-default="true" \
>>         placement-strategy="balanced"
>>
>> # crm_verify -L -VV
>> [...]
>> crm_verify[19320]: 2012/06/20_15:25:50 notice: LogActions: Leave w0 (Started atlas2)
>> crm_verify[19320]: 2012/06/20_15:25:50 notice: LogActions: Leave stonith-atlas6 (Started atlas4)
>> crm_verify[19320]: 2012/06/20_15:25:50 notice: LogActions: Leave stonith-atlas5 (Started atlas4)
>> crm_verify[19320]: 2012/06/20_15:25:50 notice: LogActions: Leave stonith-atlas4 (Started atlas3)
>> crm_verify[19320]: 2012/06/20_15:25:50 notice: LogActions: Leave stonith-atlas3 (Started atlas4)
>> crm_verify[19320]: 2012/06/20_15:25:50 notice: LogActions: Leave stonith-atlas2 (Started atlas4)
>> crm_verify[19320]: 2012/06/20_15:25:50 notice: LogActions: Leave stonith-atlas1 (Started atlas4)
>> crm_verify[19320]: 2012/06/20_15:25:50 notice: LogActions: Leave stonith-atlas0 (Started atlas4)
>> crm_verify[19320]: 2012/06/20_15:25:50 notice: LogActions: Start lx0 (atlas4)
>>
>> I have tried to delete the resource and add it again; it did not help.
>>
>> The corresponding log entries:
>>
>> Jun 20 11:57:25 atlas4 crmd: [17571]: info: delete_resource: Removing resource lx0 for 28654_crm_resource (internal) on atlas0
>> Jun 20 11:57:25 atlas4 lrmd: [17568]: debug: lrmd_rsc_destroy: removing resource lx0
>> Jun 20 11:57:25 atlas4 crmd: [17571]: debug: delete_rsc_entry: sync: Sending delete op for lx0
>> Jun 20 11:57:25 atlas4 crmd: [17571]: info: notify_deleted: Notifying 28654_crm_resource on atlas0 that lx0 was deleted
>> Jun 20 11:57:25 atlas4 crmd: [17571]: WARN: decode_transition_key: Bad UUID (crm-resource-28654) in sscanf result (3) for 0:0:crm-resource-28654
>> Jun 20 11:57:25 atlas4 crmd: [17571]: debug: create_operation_update: send_direct_ack: Updating resouce lx0 after complete delete op (interval=60000)
>> Jun 20 11:57:25 atlas4 crmd: [17571]: info: send_direct_ack: ACK'ing resource op lx0_delete_60000 from 0:0:crm-resource-28654: lrm_invoke-lrmd-1340186245-16
>> Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] mcasted message added to pending queue
>> Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] mcasted message added to pending queue
>> Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Delivering 10d5 to 10d7
>> Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Delivering MCAST message with seq 10d6 to pending delivery queue
>> Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Delivering MCAST message with seq 10d7 to pending delivery queue
>> Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Received ringid(192.168.40.60:22264) seq 10d6
>> Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Received ringid(192.168.40.60:22264) seq 10d7
>> Jun 20 11:57:25 atlas4 crmd: [17571]: debug: notify_deleted: Triggering a refresh after 28654_crm_resource deleted lx0 from the LRM
>> Jun 20 11:57:25 atlas4 cib: [17567]: debug: cib_process_xpath: Processing cib_query op for //cib/configuration/crm_config//cluster_property_set//nvpair[@name='last-lrm-refresh'] (/cib/configuration/crm_config/cluster_property_set/nvpair[6])
>>
>>
>> Jun 20 11:57:25 atlas4 lrmd: [17568]: debug: on_msg_add_rsc:client [17571] adds resource lx0
>> Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Delivering 149e to 149f
>> Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Delivering MCAST message with seq 149f to pending delivery queue
>> Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Received ringid(192.168.40.60:22264) seq 14a0
>> Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Delivering 149f to 14a0
>> Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] Delivering MCAST message with seq 14a0 to pending delivery queue
>> Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] releasing messages up to and including 149e
>> Jun 20 11:57:25 atlas4 crmd: [17571]: info: do_lrm_rsc_op: Performing key=26:10266:7:e7426ec7-3bae-4a4b-a4ae-c3f80f17e058 op=lx0_monitor_0 )
>> Jun 20 11:57:25 atlas4 lrmd: [17568]: debug: on_msg_perform_op:2396: copying parameters for rsc lx0
>> Jun 20 11:57:25 atlas4 lrmd: [17568]: debug: on_msg_perform_op: add an operation operation monitor[35] on lx0 for client 17571, its parameters: crm_feature_set=[3.0.5] config=[/etc/libvirt/crm/lx0.xml] CRM_meta_timeout=[20000] hypervisor=[qemu:///system] to the operation list.
>> Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] releasing messages up to and including 149f
>> Jun 20 11:57:25 atlas4 lrmd: [17568]: info: rsc:lx0 probe[35] (pid 30179)
>> Jun 20 11:57:25 atlas4 VirtualDomain[30179]: INFO: Domain name "lx0" saved to /var/run/resource-agents/VirtualDomain-lx0.state.
>> Jun 20 11:57:25 atlas4 corosync[17530]: [TOTEM ] releasing messages up to and including 14bc
>> Jun 20 11:57:25 atlas4 VirtualDomain[30179]: DEBUG: Virtual domain lx0 is currently shut off.
>> Jun 20 11:57:25 atlas4 lrmd: [17568]: WARN: Managed lx0:monitor process 30179 exited with return code 7.
>> Jun 20 11:57:25 atlas4 lrmd: [17568]: info: operation monitor[35] on lx0 for client 17571: pid 30179 exited with return code 7
>> Jun 20 11:57:25 atlas4 crmd: [17571]: debug: create_operation_update: do_update_resource: Updating resouce lx0 after complete monitor op (interval=0)
>> Jun 20 11:57:25 atlas4 crmd: [17571]: info: process_lrm_event: LRM operation lx0_monitor_0 (call=35, rc=7, cib-update=61, confirmed=true) not running
>> Jun 20 11:57:25 atlas4 crmd: [17571]: debug: update_history_cache: Appending monitor op to history for 'lx0'
>> Jun 20 11:57:25 atlas4 crmd: [17571]: debug: get_xpath_object: No match for //cib_update_result//diff-added//crm_config in /notify/cib_update_result/diff
>>
>> What can be wrong in the setup/configuration? And what on earth happened?
>>
>> Best regards,
>> Jozsef
>> --
>> E-mail : kadlecsik.joz...@wigner.mta.hu
>> PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
>> Address: Wigner Research Centre for Physics, Hungarian Academy of Sciences
>>          H-1525 Budapest 114, POB. 49, Hungary

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org