Hi Benoit,

Indeed, a slow node can impact the rest of the cluster. That's why, as Jamie pointed out, DNS round robin is not a viable method for distributing load across a Galera cluster. Several solutions exist:
- HAProxy with a Galera check script
- our own MariaDB MaxScale, which includes a Galera Monitor
- glbd (a small load-balancing daemon that ships with Galera)
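To make the first option more concrete, here is a minimal Python sketch of the kind of health check such a load balancer could poll. It is similar in spirit to the clustercheck script Jamie linked, but it is not that script's code. It assumes the wsrep status variables have already been read from the local server (for example with `SHOW GLOBAL STATUS LIKE 'wsrep_%'`); `galera_node_is_healthy`, `make_handler` and `fetch_status` are hypothetical names. A node is treated as healthy only if it belongs to the Primary component, reports `wsrep_ready = ON` and is Synced (`wsrep_local_state = 4`), so a node that has dropped out of the Primary component should fail the check even though it still accepts TCP connections.

```python
"""Minimal sketch of a Galera health check for a load balancer (hypothetical names)."""
from http.server import BaseHTTPRequestHandler
from typing import Callable, Mapping

# wsrep_local_state values reported by Galera
SYNCED = "4"
DONOR_DESYNCED = "2"


def galera_node_is_healthy(status: Mapping[str, str], available_when_donor: bool = False) -> bool:
    """Return True if this node should receive client traffic.

    `status` maps wsrep status variable names to the string values returned
    by SHOW GLOBAL STATUS LIKE 'wsrep_%'.
    """
    # A node that is not part of the Primary component must not serve clients.
    if status.get("wsrep_cluster_status") != "Primary":
        return False
    # wsrep_ready is OFF while the node cannot accept queries.
    if status.get("wsrep_ready") != "ON":
        return False
    state = status.get("wsrep_local_state")
    if state == SYNCED:
        return True
    # Optionally keep a donor in rotation (off by default).
    return available_when_donor and state == DONOR_DESYNCED


def make_handler(fetch_status: Callable[[], Mapping[str, str]]):
    """Build an HTTP handler answering 200 for a healthy node, 503 otherwise.

    `fetch_status` is a hypothetical callable that queries the local MariaDB
    server; any exception it raises is treated as "unhealthy".
    """

    class HealthCheckHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            try:
                healthy = galera_node_is_healthy(fetch_status())
            except Exception:
                healthy = False
            body = b"Galera node is synced.\n" if healthy else b"Galera node is not synced.\n"
            self.send_response(200 if healthy else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return HealthCheckHandler
```

Served with, for example, `http.server.HTTPServer(("", 9200), make_handler(fetch_status)).serve_forever()`, the check answers 200 or 503, which an HAProxy backend using `option httpchk` can act on (port 9200 is only the port conventionally used with clustercheck and is an assumption here). If a node is so overloaded that it cannot answer at all, it is the load balancer's own check timeout, not this code, that takes it out of rotation.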
Regards,

On Mon, Dec 14, 2015 at 10:18 AM Jamie Gibbard <jamie.gibb...@netnames.com> wrote:
> You should consider using a better method than DNS round robin for connecting to your DB servers.
>
> Think about using an HAProxy load-balancing node with the clustercheck script (https://github.com/olafz/percona-clustercheck).
>
> This would ensure that a node is not only accessible on its MySQL port, but ready for action!
>
>
> -----Original Message-----
> From: Maria-discuss [mailto:maria-discuss-bounces+jamie.gibbard=netnames....@lists.launchpad.net] On Behalf Of Benoit Panizzon
> Sent: 14 December 2015 08:31
> To: MariaDB discuss
> Subject: [Maria-discuss] Galera Cluster: Cluster Blocked, when one node down?
>
> Hello
>
> We use MariaDB Galera Cluster for our email service platform.
>
> We decided to use Galera to create a high-availability platform.
>
> After a year of operation, we are starting to realize that, somehow, Galera failures seem to be the most common cause of the outages we have had.
>
> So I wonder whether others operating Galera clusters observe the same situation:
>
> All our services that use DB connections connect through a DNS round-robin name to one of our three Galera instances.
>
> While testing this setup, we usually killed one instance or disconnected the node from the network to simulate an outage. In that situation, everything works as expected: the clients connect to the two remaining nodes and there is no service outage.
>
> When the node is restarted, it is resynced quickly and service with three nodes is restored.
>
> Now we have experienced a few Galera cluster failures, which seem to happen this way:
> One of the nodes gets a lot of load (DDoS attacks, memory leaks or similar), which renders the whole physical machine laggy for a short time. The affected MariaDB node is then thrown out of the cluster by the other two nodes, probably for not syncing fast enough anymore.
>
> But as the node is not completely 'down', it still accepts connections from the DB clients, yet does not reply to them and seems to remain in a 'db locked' state. Strangely, this then also affects the two remaining nodes, which also go into 'locked' mode and no longer reply to queries within the time the application expects. Of course, this then causes more DB clients (IMAP, SMTP-Auth, etc.) to spawn and create DB connections, worsening the whole situation.
>
> The situation can seemingly only be resolved by shutting down the MariaDB node that got thrown out of the cluster. Then the situation normalizes with the two remaining nodes, and the third one can be restarted.
>
> Is this expected behaviour? Is there a way to tell a MariaDB node that got excluded from the cluster to shut itself down completely, so that it does NOT accept any more connections from clients and block the whole service?
>
> Regards
>
> -Benoît Panizzon-
> --
> ImproWare AG - Head of Commerce Customers
> ______________________________________________________
>
> Zurlindenstrasse 29        Tel +41 61 826 93 00
> CH-4133 Pratteln           Fax +41 61 826 93 01
> Switzerland                Web http://www.imp.ch
> ______________________________________________________
>
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-discuss
> Post to     : maria-discuss@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp
> NetNames, 25 Canada Square, Canary Wharf, London E14 5LQ, UK | Tel: +44 207 015 9200 | NetNames Limited, Registered in England and Wales, Company number: 3169594, VAT Number: GB 739633893

--
Guillaume Lefranc
Remote DBA Services Manager
MariaDB Corporation