Hi Florian,

On 11-10-12 04:09 PM, Florian Haas wrote:
On 2011-10-12 21:46, Yves Trudeau wrote:
Hi Florian,
   sure, let me state the requirements.  If those requirements can be
met, Pacemaker will be much more widely used to manage MySQL
replication.  Right now, although at Percona I deal with many large
MySQL deployments, none are using the current agent.  Another tool,
MMM, is currently used, but it is orphaned and suffers from many pretty
fundamental flaws (while implementing about the same logic as below).

Consider a pool of N identical MySQL servers.  In that case we need:
- N replication resources (it could be the MySQL RA)
- N Reader_vip
- 1 Writer_vip

Reader VIPs are used by the application to run queries that do not
modify data, usually accessed in round-robin fashion.  When the
application needs to write something, it uses the writer_vip.  That's
how read/write splitting is implemented in many, many places.

So, for the agent, here are the requirements:

- No need to manage MySQL itself

The resource we are interested in is replication; MySQL itself is at
another level.  If the RA is to manage MySQL, it must not interfere.

- the writer_vip must be assigned only to the master, after it is promoted

This is easy with a colocation constraint.
Agreed.
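As a sketch, the colocation (plus an ordering constraint so the VIP only
starts after the promotion) might look like this in crm shell syntax; the
resource names writer_vip and ms_mysql are placeholders, not from this
thread:

```shell
# Hypothetical crm shell fragment: keep the writer VIP on the promoted
# instance, and only start it after the promotion has completed.
# "writer_vip" and "ms_mysql" are placeholder resource names.
crm configure <<'EOF'
colocation writer-vip-with-master inf: writer_vip ms_mysql:Master
order writer-vip-after-promote inf: ms_mysql:promote writer_vip:start
EOF
```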

- After the promotion of a new master, all slaves should be allowed to
complete the application of their relay logs prior to any CHANGE MASTER

The current RA does not do that but it should be fairly easy to implement.
That's a use case for a pre-promote and post-promote notification. Like
the mysql RA currently does.

- After its promotion and before allowing writes to it, a master should
publish its current master file and position.   I am using resource
parameters in the CIB for these (I am wondering if transient attributes
could be used instead)
They could, and you should. Like the mysql RA currently does.
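A sketch of how the newly promoted master could publish its coordinates
as transient node attributes with crm_attribute; the attribute names
here are made up for illustration:

```shell
# Hypothetical sketch: after promotion, publish the binlog coordinates
# as transient (reboot-lifetime) node attributes.  The attribute names
# replication_master_file/pos are illustrative only.
master_status=$($MYSQL $MYSQL_OPTIONS_LOCAL -e 'SHOW MASTER STATUS\G')
file=$(echo "$master_status" | awk '/File:/ {print $2}')
pos=$(echo "$master_status" | awk '/Position:/ {print $2}')
crm_attribute --lifetime reboot --name replication_master_file --update "$file"
crm_attribute --lifetime reboot --name replication_master_pos --update "$pos"
```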


The RA I downloaded following the wiki's instructions, which state it is the latest source:

wget -O resource-agents.tar.bz2 http://hg.linux-ha.org/agents/archive/tip.tar.bz2

has the following code to change the master:

    ocf_run $MYSQL $MYSQL_OPTIONS_LOCAL $MYSQL_OPTIONS_REPL \
        -e "CHANGE MASTER TO MASTER_HOST='$master_host', \
                             MASTER_USER='$OCF_RESKEY_replication_user', \
                             MASTER_PASSWORD='$OCF_RESKEY_replication_passwd'"

which does not include file and position.
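For comparison, a CHANGE MASTER that carries the published coordinates
might look roughly like this; $master_log_file and $master_log_pos are
illustrative variables, read back from wherever the promoted master
published them:

```shell
# Hypothetical sketch of a coordinate-aware CHANGE MASTER.
# $master_log_file and $master_log_pos are assumed to come from the CIB
# (resource parameters or transient attributes) set at promote time.
ocf_run $MYSQL $MYSQL_OPTIONS_LOCAL $MYSQL_OPTIONS_REPL \
    -e "CHANGE MASTER TO MASTER_HOST='$master_host', \
                         MASTER_USER='$OCF_RESKEY_replication_user', \
                         MASTER_PASSWORD='$OCF_RESKEY_replication_passwd', \
                         MASTER_LOG_FILE='$master_log_file', \
                         MASTER_LOG_POS=$master_log_pos"
```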


- After the promotion of a new master, all slaves should be reconfigured
to point to the new master host with correct file and position as
published by the master when it was promoted

The current RA does not set file and position.
"The current RA" being ocf:heartbeat:mysql?

A cursory grep for "CRM_ATTR" in ocf:heartbeat:mysql indicates that it
does set those.

grep CRM_ATTR returned nothing.

yves@yves-desktop:/opt/pacemaker/Cluster-Resource-Agents-7a11934b142d/heartbeat$ grep -i CRM_ATTR mysql
yves@yves-desktop:/opt/pacemaker/Cluster-Resource-Agents-7a11934b142d/heartbeat$

and that is the latest from Mercurial...

Under any non-trivial
load this will fail.  The current RA is not designed to store the
information.  The new RA uses the information stored in the CIB along
with the post-promote notification.
Is this point moot considering my previous statement?

- each slave and the master may have one or more reader_vip provided
that they are replicating correctly (no lag beyond a threshold, and
replication of course working).  If all slaves fail, all reader_vips
should be located on the master.
Use a cloned IPaddr2 as a non-anonymous clone, thereby managing an IP
range. Add a location constraint restricting the clone instance to run
on only those nodes where a specific node attribute is set. Or
conversely, forbid them from running on nodes where said attribute is
not set. Manage that attribute from your RA.

That's clever, never thought about it.
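A sketch of that approach in crm shell syntax; the resource names and
the "readable" node attribute are illustrative, not from either RA:

```shell
# Hypothetical sketch of the suggestion above: a globally-unique IPaddr2
# clone managing a range of reader VIPs, restricted by a location rule
# to nodes where the RA has set a "readable" node attribute to 1.
crm configure <<'EOF'
primitive reader_vip ocf:heartbeat:IPaddr2 \
    params ip=10.0.0.100 unique_clone_address=true \
    op monitor interval=10s
clone cl_reader_vip reader_vip \
    meta clone-max=3 globally-unique=true
location reader-vips-on-readable cl_reader_vip \
    rule -inf: not_defined readable or readable ne 1
EOF
```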

The current RA either kills MySQL or does nothing; it doesn't care about
reader_vips.  Killing MySQL on a busy server with 256GB of buffer pool
is enough for someone to lose his job...  The new RA adjusts location
scores for the reader_vip resources dynamically.
Like I said, that's managing one resource from another, which is a total
nightmare. It's also not necessary, I dare say, given the approach I
outlined above.

I'll explore the node attribute approach, I like it.

Is it possible to create an attribute that does not belong to a node but is cluster-wide?

- the RA should implement protection against flapping in case a slave
hovers around the replication lag threshold
You should get plenty of inspiration there from how the dampen parameter
is used in ocf:pacemaker:ping.

ok, I'll check
The current RA does implement that, but it is not required given the
context.  The new RA does implement flapping protection.
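One way to sketch the hysteresis (separate from the dampen mechanism in
ocf:pacemaker:ping, which instead delays attribute writes): only flip
the slave's state after the lag has stayed on one side of the threshold
for several consecutive monitor runs.  Everything below is illustrative,
not code from either RA:

```shell
#!/bin/sh
# Hypothetical flapping guard: change state only after the lag stays on
# one side of THRESHOLD for CONSECUTIVE checks in a row.
THRESHOLD=60        # seconds of replication lag tolerated
CONSECUTIVE=3       # checks required before a state change
state=ok
count=0

observe_lag() {
    lag=$1
    if [ "$state" = ok ] && [ "$lag" -gt "$THRESHOLD" ]; then
        count=$((count + 1))
        [ "$count" -ge "$CONSECUTIVE" ] && { state=lagged; count=0; }
    elif [ "$state" = lagged ] && [ "$lag" -le "$THRESHOLD" ]; then
        count=$((count + 1))
        [ "$count" -ge "$CONSECUTIVE" ] && { state=ok; count=0; }
    else
        count=0     # lag is back on the current state's side; reset
    fi
    echo "$state"
}
```

With a 60-second threshold and three required checks, a lag sequence of
120, 120, 120 only flips the state to lagged on the third observation.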

- upon demote of a master, the RA _must_ attempt to kill all user
(non-system) connections

The current RA does not do that but it is easy to implement
Yeah, as I assume it would be in the other one.
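A sketch of how the demote-time kill could look; the list of spared
users is an assumption that would need tuning (a real RA would at least
spare its own connection and the replication/monitor accounts):

```shell
# Hypothetical sketch: on demote, kill every non-system connection.
# The excluded users below are assumptions for illustration only.
$MYSQL $MYSQL_OPTIONS_LOCAL -N -e \
    "SELECT CONCAT('KILL ', id, ';') FROM information_schema.processlist \
     WHERE user NOT IN ('system user', 'root', '$OCF_RESKEY_replication_user') \
       AND id <> CONNECTION_ID();" | $MYSQL $MYSQL_OPTIONS_LOCAL
```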

- Slaves must be read-only

That's fine, handled by the current RA.
Correct.

- Monitor should test MySQL and replication.  If either is bad, vips
should be moved away.  Common errors should not trigger actions.
Like I said, should be feasible with the node attribute approach
outlined above. No reason to muck around with the resources directly.

That's handled by the current RA for most of it.  The error handling
could be added.

- Slaves should update their master score according to the state of
their replication.

Handled by both RAs.
Right.
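For completeness, the usual way a monitor operation adjusts the
promotion score is crm_master; the helper name replication_is_healthy
and the score values are made up for illustration:

```shell
# Hypothetical sketch: inside monitor, raise or lower this node's
# promotion score depending on replication health.
# replication_is_healthy is a placeholder for the RA's own check.
if replication_is_healthy; then
    crm_master -l reboot -v 100   # good candidate for promotion
else
    crm_master -l reboot -v 5     # still promotable, but a last resort
fi
```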

So, at the minimum, the RA needs to be able to store the master
coordinate information, either in the resource parameters or in
transient attributes, and must be able to modify resource location
scores.  The script _was_ working before I got the CIB issue; maybe it
was purely accidental, but it proves the concept.  I was actually
implementing/testing the relay_log completion stuff.  I chose not to use
the current agent because I didn't want to manage MySQL itself, just
replication.

I am wide open to argue any Pacemaker or RA architecture/design part but
I don't want to argue the replication requirements, they are fundamental
in my mind.
Yup, and I still believe that ocf:heartbeat:mysql either already
addresses those, or they could be addressed in a much cleaner fashion
than writing a new RA.

Now, if the only remaining point is "but I want to write an agent that
can do _less_ than an existing one" (namely, manage only replication,
not the underlying daemon), then I guess I can't argue with that, but
I'd still believe that would be a suboptimal approach.
Ohh... don't get me wrong, I am not the kind of guy who takes pride in
having re-invented the flat tire.  I want an open-source _solution_ I
can offer to my customers.  I think part of the problem here is that we
are not talking about the same ocf:heartbeat:mysql RA.  What is
mainstream is what you get with "apt-get install pacemaker" on 10.04
LTS, for example; this is 1.0.8.  I also tried 1.0.11 and it is still
obviously not the same version.  I got my "latest" agent version, as
explained on the clusterlabs FAQ page, from:

wget -O resource-agents.tar.bz2 http://hg.linux-ha.org/agents/archive/tip.tar.bz2

Where can I get the version you are using :)

Regards,

Yves

Cheers,
Florian



_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
