Hi Florian,
sure, let me state the requirements. If those requirements can be met, Pacemaker will see much wider use for managing MySQL replication. Right now, although at Percona I deal with many large MySQL deployments, none of them use the current agent. Another tool, MMM, is currently used instead, but it is now orphaned and suffers from several fairly fundamental flaws (while implementing roughly the same logic as below).

Consider a pool of N identical MySQL servers.  In that case we need:
- N replication resources (it could be the MySQL RA)
- N Reader_vip
- 1 Writer_vip

Reader vips are used by the application to run queries that do not modify data, usually accessed in round-robin fashion. When the application needs to write something, it uses the writer_vip. That's how read/write splitting is implemented in many, many places.

So, for the agent, here are the requirements:

- No need to manage MySQL itself

The resource we are interested in is replication; MySQL itself lives at another level. If the RA is also to manage MySQL, it must not interfere.

- the writer_vip must be assigned only to the master, after it is promoted

This is easy to do with a colocation constraint.
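To make the requirement concrete, here is a sketch in crm shell syntax; the resource names (ms_mysql for the master/slave set, writer_vip for the writer IP) are hypothetical:

```
colocation writer_vip-with-master inf: writer_vip ms_mysql:Master
order writer_vip-after-promote inf: ms_mysql:promote writer_vip:start
```

The order constraint ensures the vip only starts after the promote has actually completed, not merely on the node that will become master.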

- After the promotion of a new master, all slaves should be allowed to finish applying their relay logs before any CHANGE MASTER is issued

The current RA does not do that but it should be fairly easy to implement.
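A minimal sketch of the check: a slave has applied everything it downloaded when the executed coordinates catch up with the read coordinates from SHOW SLAVE STATUS. The function below takes the four relevant fields (Master_Log_File, Read_Master_Log_Pos, Relay_Master_Log_File, Exec_Master_Log_Pos) as arguments; fetching them from mysql is left out, and the function name is hypothetical:

```shell
# Return 0 (true) once the slave has executed every relay log event it
# has already downloaded from the old master.
relay_log_applied() {
    master_file="$1"   # Master_Log_File   (last file read by the IO thread)
    read_pos="$2"      # Read_Master_Log_Pos
    exec_file="$3"     # Relay_Master_Log_File (last file executed by SQL thread)
    exec_pos="$4"      # Exec_Master_Log_Pos
    [ "$master_file" = "$exec_file" ] && [ "$read_pos" -eq "$exec_pos" ]
}
```

The RA would loop on this (with a timeout) during the post-promote notification before any slave is repointed.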

- After its promotion, and before writes are allowed to it, a master should publish its current master file and position. I am using resource parameters in the CIB for these (I wonder whether transient attributes could be used instead)
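Both storage options come down to one command in the RA's promote path. A command sketch, with hypothetical parameter/attribute names (the coordinates would come from SHOW MASTER STATUS on the newly promoted node):

```
# Option 1: resource parameters in the CIB
crm_resource -r ms_mysql -p master_log_file -v mysql-bin.000042
crm_resource -r ms_mysql -p master_log_pos  -v 1500

# Option 2: transient node attributes (lost on node reboot)
attrd_updater -n master_log_file -U mysql-bin.000042
attrd_updater -n master_log_pos  -U 1500
```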

- After the promotion of a new master, all slaves should be reconfigured to point to the new master host with correct file and position as published by the master when it was promoted

The current RA does not set the file and position; under any non-trivial load this will fail, since it was never designed to store that information. The new RA uses the coordinates stored in the CIB along with the post-promote notification.
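The repointing step itself is a single CHANGE MASTER TO statement built from the published coordinates. A sketch (the function name is hypothetical; in the RA the arguments would be read back from the CIB):

```shell
# Build the CHANGE MASTER TO statement for a slave, from the host name of
# the new master and the file/position it published when it was promoted.
build_change_master() {
    host="$1"; file="$2"; pos="$3"
    echo "CHANGE MASTER TO MASTER_HOST='$host', MASTER_LOG_FILE='$file', MASTER_LOG_POS=$pos;"
}

# The result would then be fed to something like:
#   mysql -e "STOP SLAVE; $(build_change_master db1 mysql-bin.000042 1500); START SLAVE;"
```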

- Each slave, and the master, may hold one or more reader_vip provided it is replicating correctly (replication running, and no lag beyond a threshold). If all slaves fail, all reader_vips should be relocated to the master.

The current RA either kills MySQL or does nothing; it doesn't care about reader_vips. Killing MySQL on a busy server with a 256GB buffer pool is enough for someone to lose their job... The new RA adjusts the location scores of the reader_vip resources dynamically.
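One way to adjust scores without touching the resources themselves: have monitor maintain a transient node attribute and tie the reader_vips to it with a location rule. A sketch in crm shell syntax; the attribute and resource names are hypothetical:

```
# In the RA's monitor action:
#   attrd_updater -n reader_ok -U 1    # replication healthy on this node
#   attrd_updater -n reader_ok -U 0    # broken or lagging: evict reader vips

# In the cluster configuration, one rule per reader vip:
location reader_vip_1-healthy reader_vip_1 \
        rule -inf: reader_ok eq 0
```

This keeps the RA out of the business of starting/stopping other resources: it only publishes node state, and the policy engine moves the vips.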

- the RA should implement a protection against flapping in case a slave hovers around the replication lag threshold

The current RA does not implement this, but it is not required given its context. The new RA does implement flapping protection.
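The usual approach is hysteresis: a slave loses its reader role only above the lag threshold, and regains it only once lag falls below a lower recovery threshold, so hovering around a single value cannot flap. A sketch with hypothetical threshold values and state variable:

```shell
# Current reader state for this node; persisted between monitor calls in a
# real RA (e.g. in a transient attribute), kept as a global here.
reader_state=ok

# Update reader_state from the current Seconds_Behind_Master value.
check_lag() {
    lag="$1"
    max_lag=60       # lose the reader role above this
    recover_lag=30   # regain it only below this
    if [ "$reader_state" = ok ] && [ "$lag" -gt "$max_lag" ]; then
        reader_state=lagging
    elif [ "$reader_state" = lagging ] && [ "$lag" -lt "$recover_lag" ]; then
        reader_state=ok
    fi
    echo "$reader_state"
}
```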

- upon demote of a master, the RA _must_ attempt to kill all user (non-system) connections

The current RA does not do that, but it is easy to implement.
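A sketch of the filtering step: given "id user" pairs (as could be pulled from information_schema.PROCESSLIST), emit a KILL statement for every connection that does not belong to a system or replication account. The account list is illustrative and would be configurable in a real RA:

```shell
# Read "id user" lines on stdin, print KILL statements for user connections.
kill_user_connections() {
    while read -r id user; do
        case "$user" in
            'system user'|repl|event_scheduler) ;;   # leave system threads alone
            *) echo "KILL $id;" ;;
        esac
    done
}
```

The generated statements would then be piped back into mysql during demote.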

- Slaves must be read-only

That's fine, handled by the current RA.

- Monitor should test both MySQL and replication. If either is bad, the vips should be moved away. Common, transient errors should not trigger any action.

Most of that is handled by the current RA; the error classification could be added.
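The classification itself can be a small lookup on the error code reported by SHOW SLAVE STATUS (Last_SQL_Errno). A sketch; the code list is illustrative, not exhaustive:

```shell
# Return 0 (true) if a replication error code should be waited out rather
# than trigger vip movement.
replication_error_is_transient() {
    case "$1" in
        0|1205|1213)  return 0 ;;   # no error, lock wait timeout, deadlock
        *)            return 1 ;;   # e.g. 1062 duplicate key: needs action
    esac
}
```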

- Slaves should update their master score according to the state of their replication.

Handled by both RAs.
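For reference, the standard crm_master helper is all that's needed here; the score values below are arbitrary examples:

```
# In the RA's monitor action on a slave:
crm_master -l reboot -v 100    # replication healthy: good promotion candidate
crm_master -l reboot -v 5      # lagging: promotable, but last resort
crm_master -l reboot -D        # replication broken: never promote this node
```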


So, at a minimum, the RA needs to be able to store the master coordinates, either in resource parameters or in transient attributes, and must be able to modify resource location scores. The script _was_ working before I hit the CIB issue; maybe that was purely accidental, but it proves the concept. I was actually implementing/testing the relay-log completion logic. I chose not to build on the current agent because I didn't want to manage MySQL itself, just replication.

I am wide open to debating any part of the Pacemaker or RA architecture/design, but I don't want to argue about the replication requirements; they are fundamental in my mind.

Do not hesitate if you have questions.

Regards,

Yves




On 11-10-12 01:53 PM, Florian Haas wrote:
> On 2011-10-12 19:36, Yves Trudeau wrote:
>> Hi Florian,
>>    I pushed the latest code to LP, the agent use notification now.
> Better.
>
>> Also,
>> most of the start/stop of resource have been removed.
> "Most of" is really not good enough here -- that thing still does all
> sorts of things modifying other resources, and I think we all agree that
> that's a big no-no. The monitor function is also still misguided.
>
>> In my opinion,
>> the existing agent would need a major rewrite to support the required
>> logic.
> I don't recall this RA being discussed on this list prior to today, or
> any of the authors getting involved in a discussion on the existing
> mysql RA. I may have missed something though; did I? If so, please point
> me to a link from the list archives and I'll be happy to educate myself
> on the discussion and whatever pros and cons were raised therein.
>
>> I think indeed it will a good idea to sit and talk at PLUK
>> about it.
> Yes, let's do that.
>
>>   Maybe Pacemaker cannot be used but that would be sad.
> I strongly doubt that it can't.
>
> Cheers,
> Florian



_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
