I may have missed it, but have you tried the MySQL RA rather than the init script? I've had more success with it.
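For reference, swapping the lsb:mysqld primitive for the OCF agent would look roughly like the following in your cib.xml. This is only a sketch: the operation timings and all parameter values are illustrative, so check "crm ra info ocf:heartbeat:mysql" for the full parameter list and adjust the paths to your installation. Note also that the agent manages a single mysqld instance, so a setup built around mysqld_multi would need one primitive per instance.

    <primitive class="ocf" id="mysqld_2" provider="heartbeat" type="mysql">
      <operations>
        <op id="mysqld_2_mon" interval="30s" name="monitor" timeout="30s"/>
        <op id="mysqld_2_start" interval="0" name="start" timeout="120s"/>
        <op id="mysqld_2_stop" interval="0" name="stop" timeout="120s"/>
      </operations>
      <instance_attributes id="mysqld_2_inst_attr">
        <attributes>
          <!-- illustrative paths; the agent's metadata provides sensible defaults -->
          <nvpair id="mysqld_2_attr_0" name="binary" value="/usr/bin/mysqld_safe"/>
          <nvpair id="mysqld_2_attr_1" name="config" value="/etc/my.cnf"/>
          <nvpair id="mysqld_2_attr_2" name="datadir" value="/var/lib/mysql"/>
          <nvpair id="mysqld_2_attr_3" name="user" value="mysql"/>
          <nvpair id="mysqld_2_attr_4" name="pid" value="/var/run/mysqld/mysqld.pid"/>
          <nvpair id="mysqld_2_attr_5" name="socket" value="/var/lib/mysql/mysql.sock"/>
        </attributes>
      </instance_attributes>
    </primitive>

Unlike the init script's report, the agent's monitor returns the OCF "not running" code when mysqld is down, which is what lets Pacemaker trigger a restart or failover.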
-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of mike
Sent: 30 March 2010 15:42
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Why does mysqld start run again?

Thank you Dejan,

I tried changing the script so that instead of a "report" action it now takes "status". Specifically, I changed it from this:

    'report' )
        "$mysqld_multi" report $2
        ;;

to this:

    'status' )
        "$mysqld_multi" report $2
        ;;

I was hoping this would return a proper status and allow a failover. The messages disappeared from the log file, so that was a good start. When I killed mysql on the primary node, however, there was no failover, and crm_mon on both nodes seemed to indicate that mysql was still alive on the primary node. I grabbed this from my log file:

Mar 30 10:20:31 DBSUAT1A.intranet.mydomain.com pengine: [15123]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Mar 30 10:20:31 DBSUAT1A.intranet.mydomain.com pengine: [15123]: info: determine_online_status: Node dbsuat1b.intranet.mydomain.com is online
Mar 30 10:20:31 DBSUAT1A.intranet.mydomain.com pengine: [15123]: notice: unpack_rsc_op: Operation mysqld_2_monitor_0 found resource mysqld_2 active on dbsuat1b.intranet.mydomain.com
Mar 30 10:20:31 DBSUAT1A.intranet.mydomain.com pengine: [15123]: info: determine_online_status: Node dbsuat1a.intranet.mydomain.com is online
Mar 30 10:20:31 DBSUAT1A.intranet.mydomain.com pengine: [15123]: notice: unpack_rsc_op: Operation mysqld_2_monitor_0 found resource mysqld_2 active on dbsuat1a.intranet.mydomain.com
Mar 30 10:20:31 DBSUAT1A.intranet.mydomain.com pengine: [15123]: notice: group_print: Resource Group: group_1
Mar 30 10:20:31 DBSUAT1A.intranet.mydomain.com pengine: [15123]: notice: native_print: IPaddr2_1 (ocf::heartbeat:IPaddr2): Started dbsuat1a.intranet.mydomain.com
Mar 30 10:20:31 DBSUAT1A.intranet.mydomain.com pengine: [15123]: notice: native_print: mysqld_2 (lsb:mysqld): Started dbsuat1a.intranet.mydomain.com
Mar 30 10:20:31 DBSUAT1A.intranet.mydomain.com pengine: [15123]: notice: LogActions: Leave resource IPaddr2_1 (Started dbsuat1a.intranet.mydomain.com)
Mar 30 10:20:31 DBSUAT1A.intranet.mydomain.com pengine: [15123]: notice: LogActions: Leave resource mysqld_2 (Started dbsuat1a.intranet.mydomain.com)
Mar 30 10:20:31 DBSUAT1A.intranet.mydomain.com pengine: [15123]: info: process_pe_message: Transition 7: PEngine Input stored in: /usr/var/lib/pengine/pe-input-801.bz2
Mar 30 10:20:31 DBSUAT1A.intranet.mydomain.com crmd: [3300]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Mar 30 10:20:31 DBSUAT1A.intranet.mydomain.com pengine: [15123]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.

Any ideas?

Dejan Muhamedagic wrote:
> Hi,
>
> On Tue, Mar 30, 2010 at 10:24:59AM -0300, mike wrote:
>
>> Also noticed another oddity. I killed mysql on the primary node fully expecting it to either trigger a failover or a restart of mysql on the primary node; I wasn't 100% sure which. Well, nothing happened. I do however see a number of messages like this in the ha-log:
>>
>> Mar 30 08:59:27 DBSUAT1A.intranet.mydomain.com lrmd: [3297]: info: RA output: (mysqld_2:monitor:stderr) Usage: /etc/init.d/mysqld {start|stop|report|restart}
>>
>
> Looks like the script doesn't support the status action. If so, then it can't be used in a cluster.
>
> Thanks,
>
> Dejan
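For what it's worth, renaming the case label alone probably isn't enough. "$mysqld_multi" report prints a report but, as far as I can tell, exits 0 whether or not mysqld is actually alive, so the cluster's monitor still sees rc=0 and never declares a failure; that would match the log above, which still shows mysqld_2 as Started on dbsuat1a after the kill. A status branch that returns proper LSB exit codes would look roughly like this (a sketch only, assuming a single mysqld instance and that pidof is available; the stock mysqld_multi wrapper does not ship this):

    'status' )
        # LSB status convention: exit 0 if the daemon is running, 3 if it is stopped
        if pid=`pidof mysqld` ; then
            echo "mysqld (pid $pid) is running"
            exit 0
        else
            echo "mysqld is not running"
            exit 3
        fi
        ;;

Either way, a status implementation has to report a dead mysqld with a non-zero exit code before the cluster can fail anything over.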
>> mike wrote:
>>
>>> Thanks for the reply Florian.
>>> I installed from tarball so am a little unsure of the releases, but looking at the READMEs I see this:
>>> heartbeat-3.0.2
>>> Pacemaker-1-0-17 (I think)
>>>
>>> They are all fairly recent; I downloaded them from hg.linux-ha.org about 3 months ago. If you know of a file I can check to be 100% sure of the version number, let me know.
>>>
>>> Here's my configuration:
>>> cib.xml:
>>>
>>> <cib admin_epoch="0" epoch="9" validate-with="transitional-0.6" crm_feature_set="3.0.1" have-quorum="1" num_updates="25" cib-last-written="Mon Mar 29 21:55:01 2010" dc-uuid="e99889ee-da15-4b09-bfc7-641e3ac0687f">
>>>   <configuration>
>>>     <crm_config>
>>>       <cluster_property_set id="cib-bootstrap-options">
>>>         <attributes>
>>>           <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>
>>>           <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="stop"/>
>>>           <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/>
>>>           <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="0"/>
>>>           <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
>>>           <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/>
>>>           <nvpair id="cib-bootstrap-options-startup-fencing" name="startup-fencing" value="true"/>
>>>           <nvpair id="cib-bootstrap-options-stop-orphan-resources" name="stop-orphan-resources" value="true"/>
>>>           <nvpair id="cib-bootstrap-options-stop-orphan-actions" name="stop-orphan-actions" value="true"/>
>>>           <nvpair id="cib-bootstrap-options-remove-after-stop" name="remove-after-stop" value="false"/>
>>>           <nvpair id="cib-bootstrap-options-short-resource-names" name="short-resource-names" value="true"/>
>>>           <nvpair id="cib-bootstrap-options-transition-idle-timeout" name="transition-idle-timeout" value="5min"/>
>>>           <nvpair id="cib-bootstrap-options-default-action-timeout" name="default-action-timeout" value="20s"/>
>>>           <nvpair id="cib-bootstrap-options-is-managed-default" name="is-managed-default" value="true"/>
>>>           <nvpair id="cib-bootstrap-options-cluster-delay" name="cluster-delay" value="60s"/>
>>>           <nvpair id="cib-bootstrap-options-pe-error-series-max" name="pe-error-series-max" value="-1"/>
>>>           <nvpair id="cib-bootstrap-options-pe-warn-series-max" name="pe-warn-series-max" value="-1"/>
>>>           <nvpair id="cib-bootstrap-options-pe-input-series-max" name="pe-input-series-max" value="-1"/>
>>>           <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.6-17fe0022afda074a937d934b3eb625eccd1f01ef"/>
>>>           <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="Heartbeat"/>
>>>         </attributes>
>>>       </cluster_property_set>
>>>     </crm_config>
>>>     <nodes>
>>>       <node id="e99889ee-da15-4b09-bfc7-641e3ac0687f" uname="dbsuat1b.intranet.mydomain.com" type="normal"/>
>>>       <node id="db80324b-c9de-4995-a66a-eedf93abb42c" uname="dbsuat1a.intranet.mydomain.com" type="normal"/>
>>>     </nodes>
>>>     <resources>
>>>       <group id="group_1">
>>>         <primitive class="ocf" id="IPaddr2_1" provider="heartbeat" type="IPaddr2">
>>>           <operations>
>>>             <op id="IPaddr2_1_mon" interval="5s" name="monitor" timeout="5s"/>
>>>           </operations>
>>>           <instance_attributes id="IPaddr2_1_inst_attr">
>>>             <attributes>
>>>               <nvpair id="IPaddr2_1_attr_0" name="ip" value="172.28.185.49"/>
>>>             </attributes>
>>>           </instance_attributes>
>>>         </primitive>
>>>         <primitive class="lsb" id="mysqld_2" provider="heartbeat" type="mysqld">
>>>           <operations>
>>>             <op id="mysqld_2_mon" interval="120s" name="monitor" timeout="60s"/>
>>>           </operations>
>>>         </primitive>
>>>       </group>
>>>     </resources>
>>>     <constraints>
>>>       <rsc_location id="rsc_location_group_1" rsc="group_1">
>>>         <rule id="prefered_location_group_1" score="100">
>>>           <expression attribute="#uname" id="prefered_location_group_1_expr" operation="eq" value="DBSUAT1A.intranet.mydomain.com"/>
>>>         </rule>
>>>       </rsc_location>
>>>     </constraints>
>>>   </configuration>
>>>   <status>
>>>     <node_state id="e99889ee-da15-4b09-bfc7-641e3ac0687f" uname="dbsuat1b.intranet.mydomain.com" ha="active" in_ccm="true" crmd="online" join="member" expected="member" crm-debug-origin="do_update_resource" shutdown="0">
>>>       <transient_attributes id="e99889ee-da15-4b09-bfc7-641e3ac0687f">
>>>         <instance_attributes id="status-e99889ee-da15-4b09-bfc7-641e3ac0687f">
>>>           <attributes>
>>>             <nvpair id="status-e99889ee-da15-4b09-bfc7-641e3ac0687f-probe_complete" name="probe_complete" value="true"/>
>>>           </attributes>
>>>         </instance_attributes>
>>>       </transient_attributes>
>>>       <lrm id="e99889ee-da15-4b09-bfc7-641e3ac0687f">
>>>         <lrm_resources>
>>>           <lrm_resource id="IPaddr2_1" type="IPaddr2" class="ocf" provider="heartbeat">
>>>             <lrm_rsc_op id="IPaddr2_1_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.1" transition-key="4:1:7:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" transition-magic="0:7;4:1:7:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" call-id="2" rc-code="7" op-status="0" interval="0" last-run="1269914318" last-rc-change="1269914318" exec-time="190" queue-time="10" op-digest="e6e4647755681224d96a4ba7fc1a3391"/>
>>>             <lrm_rsc_op id="IPaddr2_1_start_0" operation="start" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.1" transition-key="4:3:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" transition-magic="0:0;4:3:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" call-id="5" rc-code="0" op-status="0" interval="0" last-run="1269914319" last-rc-change="1269914319" exec-time="110" queue-time="0" op-digest="e6e4647755681224d96a4ba7fc1a3391"/>
>>>             <lrm_rsc_op id="IPaddr2_1_monitor_5000" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.1" transition-key="5:3:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" transition-magic="0:0;5:3:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" call-id="6" rc-code="0" op-status="0" interval="5000" last-run="1269914715" last-rc-change="1269914319" exec-time="80" queue-time="0" op-digest="8124f1b5e7c7c10bbbf382d3813c9b90"/>
>>>             <lrm_rsc_op id="IPaddr2_1_stop_0" operation="stop" crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" transition-key="6:6:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" transition-magic="0:0;6:6:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" call-id="10" rc-code="0" op-status="0" interval="0" last-run="1269914720" last-rc-change="1269914720" exec-time="60" queue-time="0" op-digest="e6e4647755681224d96a4ba7fc1a3391"/>
>>>           </lrm_resource>
>>>           <lrm_resource id="mysqld_2" type="mysqld" class="lsb">
>>>             <lrm_rsc_op id="mysqld_2_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.1" transition-key="5:1:7:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" transition-magic="0:0;5:1:7:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" call-id="3" rc-code="0" op-status="0" interval="0" last-run="1269914319" last-rc-change="1269914319" exec-time="0" queue-time="10" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
>>>             <lrm_rsc_op id="mysqld_2_stop_0" operation="stop" crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" transition-key="10:5:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" transition-magic="0:0;10:5:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" call-id="9" rc-code="0" op-status="0" interval="0" last-run="1269914720" last-rc-change="1269914720" exec-time="180" queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
>>>             <lrm_rsc_op id="mysqld_2_start_0" operation="start" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.1" transition-key="7:3:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" transition-magic="0:0;7:3:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" call-id="7" rc-code="0" op-status="0" interval="0" last-run="1269914319" last-rc-change="1269914319" exec-time="1130" queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
>>>             <lrm_rsc_op id="mysqld_2_monitor_120000" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.1" transition-key="8:3:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" transition-magic="0:0;8:3:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" call-id="8" rc-code="0" op-status="0" interval="120000" last-run="1269914681" last-rc-change="1269914321" exec-time="0" queue-time="0" op-digest="873ed4f07792aa8ff18f3254244675ea"/>
>>>           </lrm_resource>
>>>         </lrm_resources>
>>>       </lrm>
>>>     </node_state>
>>>     <node_state id="db80324b-c9de-4995-a66a-eedf93abb42c" uname="dbsuat1a.intranet.mydomain.com" ha="active" join="member" crm-debug-origin="do_update_resource" crmd="online" shutdown="0" in_ccm="true" expected="member">
>>>       <lrm id="db80324b-c9de-4995-a66a-eedf93abb42c">
>>>         <lrm_resources>
>>>           <lrm_resource id="mysqld_2" type="mysqld" class="lsb">
>>>             <lrm_rsc_op id="mysqld_2_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" transition-key="8:4:7:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" transition-magic="0:0;8:4:7:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" call-id="3" rc-code="0" op-status="0" interval="0" last-run="1269914718" last-rc-change="1269914718" exec-time="90" queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
>>>             <lrm_rsc_op id="mysqld_2_stop_0" operation="stop" crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" transition-key="11:5:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" transition-magic="0:0;11:5:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" call-id="4" rc-code="0" op-status="0" interval="0" last-run="1269914720" last-rc-change="1269914720" exec-time="310" queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
>>>             <lrm_rsc_op id="mysqld_2_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" transition-key="9:6:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" transition-magic="0:0;9:6:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" call-id="7" rc-code="0" op-status="0" interval="0" last-run="1269914723" last-rc-change="1269914723" exec-time="220" queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
>>>             <lrm_rsc_op id="mysqld_2_monitor_120000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" transition-key="10:6:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" transition-magic="0:0;10:6:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" call-id="8" rc-code="0" op-status="0" interval="120000" last-run="1269914725" last-rc-change="1269914725" exec-time="0" queue-time="0" op-digest="873ed4f07792aa8ff18f3254244675ea"/>
>>>           </lrm_resource>
>>>           <lrm_resource id="IPaddr2_1" type="IPaddr2" class="ocf" provider="heartbeat">
>>>             <lrm_rsc_op id="IPaddr2_1_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" transition-key="7:4:7:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" transition-magic="0:7;7:4:7:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" call-id="2" rc-code="7" op-status="0" interval="0" last-run="1269914718" last-rc-change="1269914718" exec-time="120" queue-time="0" op-digest="e6e4647755681224d96a4ba7fc1a3391"/>
>>>             <lrm_rsc_op id="IPaddr2_1_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" transition-key="7:6:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" transition-magic="0:0;7:6:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" call-id="5" rc-code="0" op-status="0" interval="0" last-run="1269914721" last-rc-change="1269914721" exec-time="110" queue-time="0" op-digest="e6e4647755681224d96a4ba7fc1a3391"/>
>>>             <lrm_rsc_op id="IPaddr2_1_monitor_5000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" transition-key="8:6:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" transition-magic="0:0;8:6:0:443f1faa-26f0-4013-95b1-d0a43e4b7f6a" call-id="6" rc-code="0" op-status="0" interval="5000" last-run="1269914723" last-rc-change="1269914723" exec-time="210" queue-time="0" op-digest="8124f1b5e7c7c10bbbf382d3813c9b90"/>
>>>           </lrm_resource>
>>>         </lrm_resources>
>>>       </lrm>
>>>       <transient_attributes id="db80324b-c9de-4995-a66a-eedf93abb42c">
>>>         <instance_attributes id="status-db80324b-c9de-4995-a66a-eedf93abb42c">
>>>           <attributes>
>>>             <nvpair id="status-db80324b-c9de-4995-a66a-eedf93abb42c-probe_complete" name="probe_complete" value="true"/>
>>>           </attributes>
>>>         </instance_attributes>
>>>       </transient_attributes>
>>>     </node_state>
>>>   </status>
>>> </cib>
>>>
>>> Florian Haas wrote:
>>>
>>>> Mike,
>>>>
>>>> the information given reduces us to guesswork.
>>>>
>>>> - Messaging layer?
>>>> - Pacemaker version?
>>>> - Glue and agents versions?
>>>> - crm configure show?
>>>> - Logs?
>>>>
>>>> Cheers,
>>>> Florian
>>>>
>>>> On 03/30/2010 03:48 AM, mike wrote:
>>>>
>>>>> So here's the situation:
>>>>>
>>>>> Node A (primary node) heartbeat up and running a VIP and mysqld
>>>>> Node B (secondary node) up and running but heartbeat stopped
>>>>>
>>>>> I start heartbeat on Node B and expect it to come up quickly, which it does. I noticed in the logs on Node A that the cluster runs mysql start. Why would it do this when mysql is already running there? Doesn't seem to make sense to me.
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems