Re: [Linux-HA] Master/Slave - Master node not monitored after a failure

David Vossel Mon, 04 Feb 2013 08:43:47 -0800


----- Original Message -----
> From: "radurad" <[email protected]>
> To: [email protected]
> Sent: Monday, February 4, 2013 1:51:49 AM
> Subject: Re: [Linux-HA] Master/Slave - Master node not monitored after a 
> failure
> 
> 
> Hi,
> 
> I installed from the RPMs as it was faster (building from source
> required a lot of devel packages, and I got stuck at libcpg).
> The issue is solved: the master is now monitored after any number of
> failures. But I'm facing a new issue (if I can't get it fixed I'll
> probably start a new thread, if one doesn't already exist): after a
> couple of failures and restarts, mysql is no longer started at the
> next failure. The logs show the message "MySql is not running", but no
> start/restart happens (I made sure the failcount is 0, as I reset it
> from time to time).


I haven't encountered anything like that. If you can gather the log and pengine 
cluster data using crm_report we should be able to help figure out what is 
going on.

-- Vossel
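
A minimal sketch of gathering that data, assuming crm_report is installed on a cluster node and takes -f and -t for the start and end of the time window, followed by a destination name. The helper name, the time window, and the destination path are illustrative and do not come from this thread.

```python
import shlex
from datetime import datetime

TIME_FMT = "%Y-%m-%d %H:%M:%S"  # timestamp layout passed to crm_report


def build_crm_report_cmd(start, end, dest):
    """Return the crm_report argument list covering [start, end]."""
    if end <= start:
        raise ValueError("end must be later than start")
    return [
        "crm_report",
        "-f", start.strftime(TIME_FMT),  # beginning of the window
        "-t", end.strftime(TIME_FMT),    # end of the window
        dest,                            # name of the report to create
    ]


if __name__ == "__main__":
    # Hypothetical window around the failure that was not recovered.
    cmd = build_crm_report_cmd(
        datetime(2013, 2, 4, 8, 0, 0),
        datetime(2013, 2, 4, 9, 0, 0),
        "/tmp/mysql-not-restarted",
    )
    # Print a copy-pasteable command line; on a cluster node the same
    # list could instead be handed to subprocess.run(cmd, check=True).
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Picking a window that starts shortly before the last successful restart and ends after the failure that was ignored keeps the report small while still covering the relevant policy engine decisions.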

> 
> Thanks again,
> Radu Rad.
> 
> 
> David Vossel wrote:
> > 
> > 
> > 
> > ----- Original Message -----
> >> From: "radurad" <[email protected]>
> >> To: [email protected]
> >> Sent: Wednesday, January 30, 2013 5:10:00 AM
> >> Subject: Re: [Linux-HA] Master/Slave - Master node not monitored
> >> after a
> >> failure
> >> 
> >> 
> >> Hi,
> >> 
> >> Thank you for clarifying this.
> >> On CentOS 6 the latest pacemaker build is 1.1.7 (which I'm using
> >> now). Do you see a problem with installing from source so that I
> >> get pacemaker 1.1.8?
> > 
> > The only thing I can think of is that you might have to get a new
> > version of libqb in order to use 1.1.8.  We already have a rhel 6
> > based package you can use if you want.
> > 
> > http://clusterlabs.org/rpm-next/
> > 
> > -- Vossel
> > 
> >> Best Regards,
> >> Radu Rad.
> >> 
> >> 
> >> 
> >> David Vossel wrote:
> >> > 
> >> > 
> >> > 
> >> > ----- Original Message -----
> >> >> From: "radurad" <[email protected]>
> >> >> To: [email protected]
> >> >> Sent: Thursday, January 24, 2013 6:07:38 AM
> >> >> Subject: [Linux-HA] Master/Slave - Master node not monitored
> >> >> after
> >> >> a
> >> >> failure
> >> >> 
> >> >> 
> >> >> Hi,
> >> >> 
> >> >> I'm using the following installation under CentOS:
> >> >> 
> >> >> corosync-1.4.1-7.el6_3.1.x86_64
> >> >> resource-agents-3.9.2-12.el6.x86_64
> >> >> 
> >> >> with the following configuration for a Master/Slave mysql:
> >> >> 
> >> >> primitive mysqld ocf:heartbeat:mysql \
> >> >>         params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" \
> >> >>         socket="/var/lib/mysql/mysql.sock" datadir="/var/lib/mysql" \
> >> >>         user="mysql" replication_user="root" \
> >> >>         replication_passwd="testtest" \
> >> >>         op monitor interval="5s" role="Slave" timeout="31s" \
> >> >>         op monitor interval="6s" role="Master" timeout="30s"
> >> >> ms ms_mysql mysqld \
> >> >>         meta master-max="1" master-node-max="1" clone-max="2" \
> >> >>         clone-node-max="1" notify="true"
> >> >> property $id="cib-bootstrap-options" \
> >> >>         dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
> >> >>         cluster-infrastructure="openais" \
> >> >>         expected-quorum-votes="2" \
> >> >>         no-quorum-policy="ignore" \
> >> >>         stonith-enabled="false" \
> >> >>         last-lrm-refresh="1359026356" \
> >> >>         start-failure-is-fatal="false" \
> >> >>         cluster-recheck-interval="60s"
> >> >> rsc_defaults $id="rsc-options" \
> >> >>         failure-timeout="50s"
> >> >> 
> >> >> With only one node online (the Master; the problem also occurs
> >> >> with a slave online, but for simplicity I've left only the Master
> >> >> online), I run into the problem below:
> >> >> - Stopping the mysql process once results in corosync restarting
> >> >>   mysql and promoting it to Master.
> >> >> - Stopping the mysql process again results in nothing: the failure
> >> >>   is not detected, corosync takes no action and still sees the
> >> >>   node as Master with mysql running.
> >> >> - The monitor operation is not running after the first failure, as
> >> >>   there are no log entries of the type:  INFO: MySQL monitor
> >> >>   succeeded (master).
> >> >> - Changing something in the configuration results in corosync
> >> >>   immediately detecting that mysql is not running and promoting
> >> >>   it. The monitor operation then runs until the first failure,
> >> >>   after which the same problem occurs.
> >> >> 
> >> >> If you need more information, let me know. I could also attach the
> >> >> log from the messages file.
> >> > 
> >> > Hey,
> >> > 
> >> > This is a known bug and has been resolved in pacemaker 1.1.8.
> >> > 
> >> > Here's the related issue. The commits are listed in the
> >> > comments.
> >> > http://bugs.clusterlabs.org/show_bug.cgi?id=5072
> >> > 
> >> > 
> >> > -- Vossel
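
Since the reply above ties the fix to pacemaker 1.1.8, a small sketch of checking a cluster's dc-version string against that release may help. The function names and the assumption that dc-version always begins with a dotted major.minor.patch triple are illustrative, not something stated in this thread.

```python
import re

FIXED_IN = (1, 1, 8)  # release that resolved the bug, per the reply above


def pacemaker_version(dc_version):
    """Extract (major, minor, patch) from a dc-version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", dc_version)
    if match is None:
        raise ValueError(f"unrecognised dc-version: {dc_version!r}")
    return tuple(int(part) for part in match.groups())


def has_monitor_fix(dc_version):
    """True when the reported version is at least FIXED_IN."""
    return pacemaker_version(dc_version) >= FIXED_IN


if __name__ == "__main__":
    # dc-version value taken from the configuration quoted in this thread.
    reported = "1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14"
    print(pacemaker_version(reported), has_monitor_fix(reported))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "1.1.10" as older than "1.1.8".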
> >> > 
> >> >> Thanks for now,
> >> >> Radu.
> >> >> 
> >> >> --
> >> >> View this message in context:
> >> >> http://old.nabble.com/Master-Slave---Master-node-not-monitored-after-a-failure-tp34939865p34939865.html
> >> >> Sent from the Linux-HA mailing list archive at Nabble.com.
> >> >> 
> >> >> _______________________________________________
> >> >> Linux-HA mailing list
> >> >> [email protected]
> >> >> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >> >> See also: http://linux-ha.org/ReportingProblems
> >> >> 
> >> > 
> >> > 
> >> 
> >> --
> >> View this message in context:
> >> http://old.nabble.com/Master-Slave---Master-node-not-monitored-after-a-failure-tp34939865p34962132.html
> >> Sent from the Linux-HA mailing list archive at Nabble.com.
> >> 
> >> 
> > 
> > 
> 
> --
> View this message in context:
> http://old.nabble.com/Master-Slave---Master-node-not-monitored-after-a-failure-tp34939865p34979148.html
> Sent from the Linux-HA mailing list archive at Nabble.com.
> 
> 
