Hi Bogdan,

Thank you for sharing this! I'll need to familiarize myself with this Jepsen thing, but overall it looks interesting.

As it turns out, we already run Galera in multi-writer mode in Fuel, unintentionally. When the active MySQL node goes down, HAProxy starts opening connections to a backup. When the active node comes back up, HAProxy starts opening connections to the original MySQL node again, but OpenStack services may still have connections to the backup open in their connection pools. So now you may have connections to multiple MySQL nodes at the same time, which is exactly what you wanted to avoid by using active/backup in the HAProxy configuration.

^ This actually leads to an interesting issue [1]: DB state committed on one node is not immediately available on another one. Replication lag can be controlled via session variables [2], but that does not always help. E.g. in [1], Nova first goes to Neutron to create a new floating IP and gets a 201 (and Neutron actually *commits* the DB transaction), then makes another REST API request to get a list of floating IPs by address. The latter can be served by another neutron-server, connected to another Galera node that does not have the latest state applied yet due to 'slave lag', so the list can come back empty. Unfortunately, 'wsrep_sync_wait' can't help here, as these are two different REST API requests, potentially served by two different neutron-server instances.

Basically, you'd need to *always* wait for the latest state to be applied before executing any queries, which Galera is trying to avoid for performance reasons.

Thanks,
Roman

[1] https://bugs.launchpad.net/fuel/+bug/1529937
[2] http://galeracluster.com/2015/06/achieving-read-after-write-semantics-with-galera/

On Fri, Apr 22, 2016 at 10:42 AM, Bogdan Dobrelya <bdobre...@mirantis.com> wrote:
> [crossposting to openstack-operat...@lists.openstack.org]
>
> Hello.
> I wrote this paper [0] to demonstrate an approach for how we can leverage the
> Jepsen framework in a QA/CI/CD pipeline for OpenStack projects like Oslo
> (DB) or Trove, Tooz DLM, and perhaps any integration projects which
> rely on distributed systems. Although the tests are not all finished yet,
> the results are quite visible, so I'd better share early for review,
> discussion and comments.
>
> I have done similar tests for the RabbitMQ OCF RA clusterers as well,
> although I have yet to write a report.
>
> PS. I'm sorry for so many tags I placed in the topic header, should I have
> used just "all" :) ? Have a nice weekend and take care!
>
> [0] https://goo.gl/VHyIIE
>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
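
To make the stale-read scenario from the reply above a bit more concrete, here is a toy Python sketch. It is only an in-memory model, not code from Fuel, Neutron or Galera: the `Node` and `Cluster` classes, the node names and the floating IP value are all made up for illustration. The `sync_wait` flag is a rough analogue of running `SET SESSION wsrep_sync_wait = 1` on the reading session, under the simplifying assumption that a node can always catch up instantly when asked to.

```python
from collections import deque


class Node:
    """Toy stand-in for one Galera node: applied rows plus a queue of
    write-sets that were received from peers but not applied yet."""

    def __init__(self, name):
        self.name = name
        self.rows = []
        self.pending = deque()

    def apply_pending(self):
        # Apply every queued write-set, i.e. catch up with the cluster.
        while self.pending:
            self.rows.append(self.pending.popleft())


class Cluster:
    """Hypothetical multi-writer cluster with lagging appliers."""

    def __init__(self, names):
        self.nodes = {name: Node(name) for name in names}

    def write(self, node_name, row):
        # The row is committed on the node that took the write; the other
        # nodes only queue the write-set (this models the 'slave lag').
        for name, node in self.nodes.items():
            if name == node_name:
                node.rows.append(row)
            else:
                node.pending.append(row)

    def read(self, node_name, sync_wait=False):
        node = self.nodes[node_name]
        if sync_wait:
            # Rough analogue of wsrep_sync_wait on the *reading* session:
            # wait until the node has applied everything before reading.
            node.apply_pending()
        return list(node.rows)


def demo():
    cluster = Cluster(["node-1", "node-2"])
    # neutron-server A (connected to node-1) commits the new floating IP.
    cluster.write("node-1", "floating-ip-10.0.0.5")
    # neutron-server B (connected to node-2) serves the follow-up list call.
    stale = cluster.read("node-2")
    fresh = cluster.read("node-2", sync_wait=True)
    return stale, fresh


if __name__ == "__main__":
    stale, fresh = demo()
    print("without sync wait:", stale)
    print("with sync wait:", fresh)
```

In this model the second reader only sees the row if it waits on its own session. Since it has no way of knowing that another service just committed something, it would have to wait on every read, which is the performance cost mentioned in the reply.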