On 2011-10-12 21:46, Yves Trudeau wrote:
> Hi Florian,
>    sure, let me state the requirements. If those requirements can be
> met, Pacemaker will be much more widely used to manage MySQL
> replication. Right now, although at Percona I deal with many large
> MySQL deployments, none are using the current agent. Another tool,
> MMM, is currently used, but it is now orphaned and suffers from many
> fairly fundamental flaws (while implementing about the same logic as
> below).
>
> Consider a pool of N identical MySQL servers. In that case we need:
> - N replication resources (it could be the MySQL RA)
> - N reader_vip
> - 1 writer_vip
>
> Reader VIPs are used by the application to run queries that do not
> modify data, usually accessed in a round-robin fashion. When the
> application needs to write something, it uses the writer_vip. That's
> how read/write splitting is implemented in many, many places.
>
> So, for the agent, here are the requirements:
>
> - No need to manage MySQL itself
>
> The resource we are interested in is replication; MySQL itself is at
> another level. If the RA is to manage MySQL, it must not interfere.
>
> - The writer_vip must be assigned only to the master, after it is
> promoted
>
> This is easy with colocation.

Agreed.
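For illustration, a colocation/ordering pair along these lines should
cover it in crm shell syntax (a sketch only; ms_MySQL and writer_vip
are hypothetical resource names):

    # keep the writer VIP with the master role only
    colocation writer_vip-with-master inf: writer_vip ms_MySQL:Master
    # and start it only once the promote has actually completed
    order writer_vip-after-promote inf: ms_MySQL:promote writer_vip:start

The ordering constraint is what covers the "after it is promoted" part;
the colocation alone merely ties the VIP to wherever the master role
ends up.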
> - After the promotion of a new master, all slaves should be allowed
> to complete the application of their relay logs prior to any CHANGE
> MASTER
>
> The current RA does not do that, but it should be fairly easy to
> implement.

That's a use case for a pre-promote and post-promote notification. Like
the mysql RA currently does.

> - After its promotion and before allowing writes to it, a master
> should publish its current master file and position. I am using
> resource parameters in the CIB for these (I am wondering if transient
> attributes could be used instead)

They could, and you should. Like the mysql RA currently does.

> - After the promotion of a new master, all slaves should be
> reconfigured to point to the new master host with the correct file
> and position as published by the master when it was promoted
>
> The current RA does not set file and position.

"The current RA" being ocf:heartbeat:mysql? A cursory grep for
"CRM_ATTR" in ocf:heartbeat:mysql indicates that it does set those.

> Under any non-trivial load this will fail. The current RA is not
> designed to store the information. The new RA uses the information
> stored in the CIB along with the post-promote notification.

Is this point moot considering my previous statement?

> - Each slave and the master may have one or more reader_vip, provided
> that they are replicating correctly (no lag beyond a threshold, and
> replication of course working). If all slaves fail, all reader_vips
> should be located on the master.

Use a cloned IPaddr2 as a non-anonymous clone, thereby managing an IP
range. Add a location constraint restricting the clone instances to run
on only those nodes where a specific node attribute is set. Or,
conversely, forbid them from running on nodes where said attribute is
not set. Manage that attribute from your RA.
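To sketch that out (the attribute name "readable", the address, and
the clone counts are all made up here, and the unique_clone_address
behavior would need testing against your setup):

    # one base IP; unique_clone_address makes each clone instance
    # derive its own address from it, yielding a small reader IP range
    primitive reader_vip ocf:heartbeat:IPaddr2 \
        params ip=192.168.100.200 unique_clone_address=true
    clone cl_reader_vip reader_vip \
        meta globally-unique=true clone-max=3 clone-node-max=3
    # keep reader VIPs off nodes where the RA has not set the attribute
    location reader_vips-on-replicating-nodes cl_reader_vip \
        rule -inf: not_defined readable or readable ne 1

Note that clone-node-max=3 deliberately allows all instances to pile up
on a single node, which is exactly your "all reader_vips on the master
if every slave fails" case. The RA would raise or clear the attribute
from its monitor action, e.g. with something like "attrd_updater -n
readable -U 1 -d 30s"; the -d dampening delay is the same mechanism
ocf:pacemaker:ping uses, which also speaks to your flapping concern
below.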
> The current RA either kills MySQL or does nothing; it doesn't care
> about reader_vips. Killing MySQL on a busy server with 256GB of
> buffer pool is enough for someone to lose his job... The new RA
> adjusts location scores for the reader_vip resources dynamically.

Like I said, that's managing one resource from another, which is a
total nightmare. It's also not necessary, I dare say, given the
approach I outlined above.

> - The RA should implement a protection against flapping, in case a
> slave hovers around the replication lag threshold

You should get plenty of inspiration there from how the dampen
parameter is used in ocf:pacemaker:ping.

> The current RA does implement that, but it is not required given the
> context. The new RA does implement flapping protection.
>
> - Upon demote of a master, the RA _must_ attempt to kill all user
> (non-system) connections
>
> The current RA does not do that, but it is easy to implement.

Yeah, as I assume it would be in the other one.

> - Slaves must be read-only
>
> That's fine, handled by the current RA.

Correct.

> - Monitor should test MySQL and replication. If either is bad, VIPs
> should be moved away. Common errors should not trigger actions.

Like I said, that should be feasible with the node attribute approach
outlined above. No reason to muck around with the resources directly.

> That's handled by the current RA for most of it. The error handling
> could be added.
>
> - Slaves should update their master score according to the state of
> their replication.
>
> Handled by both RAs.

Right.

> So, at the minimum, the RA needs to be able to store the master
> coordinate information, either in the resource parameters or in
> transient attributes, and must be able to modify resource location
> scores. The script _was_ working before I ran into the CIB issue;
> maybe it was purely accidental, but it proves the concept. I was
> actually implementing/testing the relay_log completion stuff. I chose
> not to use the current agent because I didn't want to manage MySQL
> itself, just replication.
>
> I am wide open to arguing any Pacemaker or RA architecture/design
> part, but I don't want to argue the replication requirements; they
> are fundamental in my mind.

Yup, and I still believe that ocf:heartbeat:mysql either already
addresses those, or that they could be addressed in a much cleaner
fashion than by writing a new RA.

Now, if the only remaining point is "but I want to write an agent that
can do _less_ than an existing one" (namely, manage only replication,
not the underlying daemon), then I guess I can't argue with that, but
I'd still believe that would be a suboptimal approach.

Cheers,
Florian

--
Need help with High Availability?
http://www.hastexo.com/now

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker