Hi Florian,
sure, let me state the requirements. If those requirements can be met, Pacemaker will see much wider use for managing MySQL replication. Right now, although at Percona I deal with many large MySQL deployments, none of them use the current agent. Another tool, MMM, is currently used instead, but it is now orphaned and suffers from several fairly fundamental flaws (while implementing roughly the same logic as below).

Consider a pool of N identical MySQL servers.  In that case we need:
- N replication resources (it could be the MySQL RA)
- N Reader_vip
- 1 Writer_vip

Reader vips are used by the application to run queries that do not modify data, usually accessed in round-robin fashion. When the application needs to write something, it uses the writer_vip. That's how read/write splitting is implemented in many, many places.

So, for the agent, here are the requirements:

- No need to manage MySQL itself

The resource we are interested in is replication; MySQL itself lives at another level. If the RA is also to manage MySQL, it must not interfere.

- the writer_vip must be assigned only to the master, after it is promoted

This is easy to do with a colocation constraint.
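To make the requirement concrete, here is a sketch in crm shell syntax; the resource names (ms_mysql for the master/slave set, writer_vip for the writer IP) are hypothetical:

```
colocation writer_vip-with-master inf: writer_vip ms_mysql:Master
order writer_vip-after-promote inf: ms_mysql:promote writer_vip:start
```

The order constraint ensures the vip only starts after the promote has actually completed, not merely on the node that will become master.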

- After the promotion of a new master, all slaves should be allowed to finish applying their relay logs before any CHANGE MASTER is issued

The current RA does not do that but it should be fairly easy to implement.
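A minimal sketch of the check: a slave has applied everything it downloaded when the executed coordinates catch up with the read coordinates from SHOW SLAVE STATUS. The function below takes the four relevant fields (Master_Log_File, Read_Master_Log_Pos, Relay_Master_Log_File, Exec_Master_Log_Pos) as arguments; fetching them from mysql is left out, and the function name is hypothetical:

```shell
# Return 0 (true) once the slave has executed every relay log event it
# has already downloaded from the old master.
relay_log_applied() {
    master_file="$1"   # Master_Log_File   (last file read by the IO thread)
    read_pos="$2"      # Read_Master_Log_Pos
    exec_file="$3"     # Relay_Master_Log_File (last file executed by SQL thread)
    exec_pos="$4"      # Exec_Master_Log_Pos
    [ "$master_file" = "$exec_file" ] && [ "$read_pos" -eq "$exec_pos" ]
}
```

The RA would loop on this (with a timeout) during the post-promote notification before any slave is repointed.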

- After its promotion, and before writes are allowed to it, a master should publish its current master file and position. I am using resource parameters in the CIB for these (I wonder whether transient attributes could be used instead)
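Both storage options come down to one command in the RA's promote path. A command sketch, with hypothetical parameter/attribute names (the coordinates would come from SHOW MASTER STATUS on the newly promoted node):

```
# Option 1: resource parameters in the CIB
crm_resource -r ms_mysql -p master_log_file -v mysql-bin.000042
crm_resource -r ms_mysql -p master_log_pos  -v 1500

# Option 2: transient node attributes (lost on node reboot)
attrd_updater -n master_log_file -U mysql-bin.000042
attrd_updater -n master_log_pos  -U 1500
```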

- After the promotion of a new master, all slaves should be reconfigured to point to the new master host with correct file and position as published by the master when it was promoted

The current RA does not set the file and position; under any non-trivial load this will fail, since it was never designed to store that information. The new RA uses the coordinates stored in the CIB along with the post-promote notification.
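The repointing step itself is a single CHANGE MASTER TO statement built from the published coordinates. A sketch (the function name is hypothetical; in the RA the arguments would be read back from the CIB):

```shell
# Build the CHANGE MASTER TO statement for a slave, from the host name of
# the new master and the file/position it published when it was promoted.
build_change_master() {
    host="$1"; file="$2"; pos="$3"
    echo "CHANGE MASTER TO MASTER_HOST='$host', MASTER_LOG_FILE='$file', MASTER_LOG_POS=$pos;"
}

# The result would then be fed to something like:
#   mysql -e "STOP SLAVE; $(build_change_master db1 mysql-bin.000042 1500); START SLAVE;"
```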

- Each slave, and the master, may hold one or more reader_vip provided it is replicating correctly (replication running, and no lag beyond a threshold). If all slaves fail, all reader_vips should be relocated to the master.

The current RA either kills MySQL or does nothing; it doesn't care about reader_vips. Killing MySQL on a busy server with a 256GB buffer pool is enough for someone to lose their job... The new RA adjusts the location scores of the reader_vip resources dynamically.
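One way to adjust scores without touching the resources themselves: have monitor maintain a transient node attribute and tie the reader_vips to it with a location rule. A sketch in crm shell syntax; the attribute and resource names are hypothetical:

```
# In the RA's monitor action:
#   attrd_updater -n reader_ok -U 1    # replication healthy on this node
#   attrd_updater -n reader_ok -U 0    # broken or lagging: evict reader vips

# In the cluster configuration, one rule per reader vip:
location reader_vip_1-healthy reader_vip_1 \
        rule -inf: reader_ok eq 0
```

This keeps the RA out of the business of starting/stopping other resources: it only publishes node state, and the policy engine moves the vips.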

- the RA should implement a protection against flapping in case a slave hovers around the replication lag threshold

The current RA does not implement this, but it is not required given its context. The new RA does implement flapping protection.
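The usual approach is hysteresis: a slave loses its reader role only above the lag threshold, and regains it only once lag falls below a lower recovery threshold, so hovering around a single value cannot flap. A sketch with hypothetical threshold values and state variable:

```shell
# Current reader state for this node; persisted between monitor calls in a
# real RA (e.g. in a transient attribute), kept as a global here.
reader_state=ok

# Update reader_state from the current Seconds_Behind_Master value.
check_lag() {
    lag="$1"
    max_lag=60       # lose the reader role above this
    recover_lag=30   # regain it only below this
    if [ "$reader_state" = ok ] && [ "$lag" -gt "$max_lag" ]; then
        reader_state=lagging
    elif [ "$reader_state" = lagging ] && [ "$lag" -lt "$recover_lag" ]; then
        reader_state=ok
    fi
    echo "$reader_state"
}
```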

- upon demote of a master, the RA _must_ attempt to kill all user (non-system) connections

The current RA does not do that, but it is easy to implement.
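A sketch of the filtering step: given "id user" pairs (as could be pulled from information_schema.PROCESSLIST), emit a KILL statement for every connection that does not belong to a system or replication account. The account list is illustrative and would be configurable in a real RA:

```shell
# Read "id user" lines on stdin, print KILL statements for user connections.
kill_user_connections() {
    while read -r id user; do
        case "$user" in
            'system user'|repl|event_scheduler) ;;   # leave system threads alone
            *) echo "KILL $id;" ;;
        esac
    done
}
```

The generated statements would then be piped back into mysql during demote.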

- Slaves must be read-only

That's fine, handled by the current RA.

- Monitor should test both MySQL and replication. If either is bad, the vips should be moved away. Common, transient errors should not trigger any action.

Most of that is handled by the current RA; the error classification could be added.
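The classification itself can be a small lookup on the error code reported by SHOW SLAVE STATUS (Last_SQL_Errno). A sketch; the code list is illustrative, not exhaustive:

```shell
# Return 0 (true) if a replication error code should be waited out rather
# than trigger vip movement.
replication_error_is_transient() {
    case "$1" in
        0|1205|1213)  return 0 ;;   # no error, lock wait timeout, deadlock
        *)            return 1 ;;   # e.g. 1062 duplicate key: needs action
    esac
}
```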

- Slaves should update their master score according to the state of their replication.

Handled by both RAs.
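For reference, the standard crm_master helper is all that's needed here; the score values below are arbitrary examples:

```
# In the RA's monitor action on a slave:
crm_master -l reboot -v 100    # replication healthy: good promotion candidate
crm_master -l reboot -v 5      # lagging: promotable, but last resort
crm_master -l reboot -D        # replication broken: never promote this node
```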


So, at a minimum, the RA needs to be able to store the master coordinates, either in resource parameters or in transient attributes, and must be able to modify resource location scores. The script _was_ working before I hit the CIB issue; maybe that was purely accidental, but it proves the concept. I was actually implementing/testing the relay-log completion logic. I chose not to build on the current agent because I didn't want to manage MySQL itself, just replication.

I am wide open to debating any part of the Pacemaker or RA architecture/design, but I don't want to argue about the replication requirements; they are fundamental in my mind.

Do not hesitate if you have questions.

Regards,

Yves




On 11-10-12 01:53 PM, Florian Haas wrote:
> On 2011-10-12 19:36, Yves Trudeau wrote:
>> Hi Florian,
>>    I pushed the latest code to LP, the agent use notification now.
> Better.
>
>> Also,
>> most of the start/stop of resource have been removed.
> "Most of" is really not good enough here -- that thing still does all
> sorts of things modifying other resources, and I think we all agree that
> that's a big no-no. The monitor function is also still misguided.
>
>> In my opinion,
>> the existing agent would need a major rewrite to support the required
>> logic.
> I don't recall this RA being discussed on this list prior to today, or
> any of the authors getting involved in a discussion on the existing
> mysql RA. I may have missed something though; did I? If so, please point
> me to a link from the list archives and I'll be happy to educate myself
> on the discussion and whatever pros and cons were raised therein.
>
>> I think indeed it will a good idea to sit and talk at PLUK
>> about it.
> Yes, let's do that.
>
>>   Maybe Pacemaker cannot be used but that would be sad.
> I strongly doubt that it can't.
>
> Cheers,
> Florian



_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
