On Fri, Sep 3, 2010 at 12:12 PM, Bernd Schubert <bs_li...@aakef.fastmail.fm> wrote:
> On Friday, September 03, 2010, Lars Ellenberg wrote:
>> > > how about an fping RA?
>> > > active=$(fping -a -i 5 -t 250 -B1 -r1 $host_list 2>/dev/null | wc -l)
>> > >
>> > > terminates in about 3 seconds for a hostlist of 100 (on the LAN, 29 of
>> > > which are alive).
>> >
>> > Happy to add if someone writes it :-)
>>
>> I thought so ;-)
>> Additional note to whoever is going to:
>>
>> With fping you can get fancy about "better connectivity",
>> you are not limited to the measure "number of nodes responding".
>
> I think for the beginning, just the basic feature should be sufficient.
> Actually, I was thinking about adding an option to the existing ping RA to let
> the user choose between ping and fping; it would default to ping. I will do
> that in the middle of next week.
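The ping/fping option Bernd describes could look roughly like the minimal sketch below, which prefers fping when it is on `$PATH` and otherwise falls back to plain ping. `pick_ping_binary` and `count_alive` are hypothetical helper names, not part of the existing ping RA; the fping flags are copied from the command quoted above, and the ping flags assume Linux iputils ping.

```sh
# Hypothetical helper: print "fping" if it is installed, else "ping".
pick_ping_binary() {
    if command -v fping >/dev/null 2>&1; then
        echo fping
    else
        echo ping
    fi
}

# Hypothetical helper: print how many of the given hosts answered.
# Usage: count_alive BINARY HOST...
count_alive() {
    binary=$1
    shift
    # With no hosts, answer 0 instead of letting fping read targets from stdin.
    if [ "$#" -eq 0 ]; then
        echo 0
        return
    fi
    if [ "$binary" = fping ]; then
        # Same invocation as in the thread: -a prints only the alive hosts,
        # one per line, so counting lines counts responders.
        # tr strips the padding some wc implementations add.
        fping -a -i 5 -t 250 -B1 -r1 "$@" 2>/dev/null | wc -l | tr -d ' '
    else
        # Fallback: one echo request per host, sequentially.
        # -W 1 is a 1-second reply timeout with Linux iputils ping;
        # BSD ping interprets -W differently.
        n=0
        for host in "$@"; do
            if ping -n -q -c 1 -W 1 "$host" >/dev/null 2>&1; then
                n=$((n + 1))
            fi
        done
        echo "$n"
    fi
}
```

Used as `active=$(count_alive "$(pick_ping_binary)" $host_list)`, with `$host_list` left unquoted so it splits into separate hosts. The sequential ping fallback is much slower than fping for long host lists, which is the reason for preferring fping when it exists.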
Could you make fping the default binary if it's installed and send us the patch when you're done? :-)
>
>
>> You could also use the statistics on packet loss and rtt provided on
>> stderr for -c or -C mode (example output below, choose what you think is
>> easier to parse), then do some scoring scheme on average or max packet
>> loss, rtt, or whatever else makes sense to you.
>> (If a switch starts dying, it may produce increasing packet loss first...)
>
> That will require quite a bit of parsing, which I'm not comfortable with in a
> shell script. I have no objections to adding an fping RA written in perl or
> python later on.
>
> [...]
>
>> >
>> > >> PS: (*) As you insist ;) on quorum with n/2 + 1 nodes, we use ping as a
>> > >> replacement. We simply cannot fulfill n/2 + 1, as controller failure
>> > >> takes down 50% of the systems (virtual machines) and the systems
>> > >> (VMs) of the 2nd controller are then supposed to take over failed
>> > >> services. I see that n/2 + 1 is optimal and also required for a few
>> > >> nodes. But if you have a larger set of systems (e.g. minimum 6 with
>> > >> the VM systems I have in my mind) n/2 + 1 is sufficient, IMHO.
>> > >
>> > > You meant to say you consider == n/2 sufficient, instead of > n/2 ?
>>
>> So you have a two-node virtualization setup, each node hosting n/2 VMs,
>> and do the pacemaker clustering between those VMs?
>
> Yes.
>
>>
>> I'm sure you could easily add "somewhere else" a very bare-bones VM
>> (or real) server that is a dedicated member of your cluster, but
>> never takes any resources? Just serves as arbitrator? As your "+1"?
>
> No, I'm afraid it is not that easy. There is simply nothing that can be
> used. If there is anything, it is always available on both hosts/controllers.
> Imagine you would sell a standalone DRBD system (black box) that provides,
> for example, NFS to clients. You would want to have each and every additional
> service mirrored again. And you could not rely on additional customer NFS
> clients.
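The packet-loss scoring Lars suggests may need less parsing than feared: a single awk call can pull the loss figure out of fping's `-c` summary. The sketch below is minimal and rests on assumptions. `max_loss` is a hypothetical helper, and the summary-line format in its comments is assumed from fping's usual `-q -c` output, since the example output referenced in the thread is elided. Check it against the installed fping version.

```sh
# Hypothetical helper: read fping -c summary lines on stdin and print the
# highest packet-loss percentage seen (0 if no summary lines were found).
# Assumed line format (fping -q -c N, written to stderr):
#   HOST : xmt/rcv/%loss = SENT/RECEIVED/LOSS%, min/avg/max = MIN/AVG/MAX
#   HOST : xmt/rcv/%loss = SENT/0/100%
max_loss() {
    awk '
        BEGIN { max = 0 }
        /xmt\/rcv\/%loss = / {
            # The text after the first "= " starts with SENT/RECEIVED/LOSS%.
            split($0, parts, "= ")
            split(parts[2], counts, "/")
            # counts[3] is e.g. "20%, min" or "100%"; adding 0 keeps only
            # the leading number.
            loss = counts[3] + 0
            if (loss > max) max = loss
        }
        END { print max }
    '
}
```

A possible invocation is `loss=$(fping -q -c 5 $host_list 2>&1 >/dev/null | max_loss)`. The redirection order sends fping's stderr into the pipe and discards its stdout. Any threshold applied to `$loss`, or an average instead of a maximum, is a policy choice left open here.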
>
>>
>> May be easier, safer, and more transparent than
>> no-quorum-policy=ignore plus some ping-attribute-based auto-shutdown.
>
> I agree on safer and more transparent, but unfortunately it is not easier
> in our case.
>
> --
> Bernd Schubert
> DataDirect Networks
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
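The quorum arithmetic debated in this thread is a strict majority (more than n/2 nodes) versus exactly n/2. It can be written as a one-line test. `has_quorum` is a hypothetical illustration of the majority rule only, not a Pacemaker or Corosync interface.

```sh
# Hypothetical helper: succeed (exit 0) when ALIVE is a strict majority of
# TOTAL, i.e. ALIVE >= floor(TOTAL/2) + 1.
# Usage: has_quorum ALIVE TOTAL
has_quorum() {
    alive=$1
    total=$2
    # Compare 2*alive against total to avoid integer-division rounding.
    [ $((2 * alive)) -gt "$total" ]
}
```

With six VMs split evenly across two controllers, losing one controller leaves 3 of 6. `has_quorum 3 6` fails because 6 is not greater than 6, which is the situation Bernd describes, while `has_quorum 4 6` succeeds. An extra arbitrator node, the "+1" Lars proposes, turns the split into 4 of 7 on the surviving side, which again passes.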