Matthew Booth <mbo...@redhat.com> wrote:
> A: start transaction;
> A: insert into foo values(1)
> A: commit;
> B: select * from foo;  <-- May not contain the value we inserted above[3]

I’ve confirmed in my own testing that this is accurate. The wsrep_causal_reads flag does resolve this, and it is settable on a per-session basis. The attached script, adapted from the script given in the blog post, illustrates this.

> Galera exposes a session variable which will fix this: wsrep_sync_wait
> (or wsrep_causal_reads on older mysql). However, this isn't the default.
> It presumably has a performance cost, but I don't know what it is, or
> how it scales with various workloads.

Well, consider that our application does some work under @writer, then later does some work under @reader. @reader has the contract that reads must be synchronous with any writes. Easy enough: @reader ensures that the connection it uses runs "set wsrep_causal_reads=1”. The attached test case confirms this is feasible on a per-session basis (that is, per connection attached to the database), so the setting will not impact the cluster as a whole, and we can forgo it on those @async_reader calls where we don’t need it.

> Because these are semantic issues, they aren't things which can be
> easily guarded with an if statement. We can't say:
>
>     if galera:
>         try:
>             commit
>         except:
>             rewind time
>
> If we are to support this DB at all, we have to structure code in the
> first place to allow for its semantics.

I think the above example is referring to the “deadlock” issue, which we have solved with the “only write to one master” strategy. But overall, as you’re aware, we will no longer have the words “begin” or “commit” in our code; all of that takes place within enginefacade. With this pattern, we permanently end the need for repeated special patterns or boilerplate that occurs per-transaction on a backend-configurable basis. The enginefacade is where any such special patterns can live, and for extended behaviors such as setting up wsrep_causal_reads on @reader nodes, we can implement a rudimentary plugin system, such that a “galera” backend sets up what’s needed (a rough sketch of what such a hook might look like is below).

The attached script does essentially what the one associated with http://www.percona.com/blog/2013/03/03/investigating-replication-latency-in-percona-xtradb-cluster/ does. It’s valid because without wsrep_causal_reads turned on for the connection, I get plenty of reads that lag behind the writes, so I’ve confirmed this is easily reproducible, and with causal reads turned on, the lag vanishes. The script demonstrates that a single application can set “wsrep_causal_reads” on a per-session basis (remember, by “session” we mean “a MySQL session”), where it takes effect for that connection alone, without affecting the performance of other concurrent connections, even in the same application. With the flag turned on, the script never reads a stale row. The script exercises both the causal-reads connection and the non-causal-reads connection in randomly alternating fashion.
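To make the plugin idea concrete, here is a minimal sketch of the kind of hook a hypothetical “galera” backend could install on the engine used by @reader. The enginefacade plugin API doesn’t exist yet, so configure_reader_engine() is a made-up name, but the SQLAlchemy pool event it relies on is real:

    from sqlalchemy import create_engine, event

    def configure_reader_engine(engine):
        # hypothetical hook a "galera" enginefacade backend would
        # install on engines used by @reader; engines used by
        # @async_reader would simply skip it
        @event.listens_for(engine, "connect")
        def _set_causal_reads(dbapi_conn, connection_record):
            # runs once per new DBAPI connection; the setting is
            # per-session, so other connections are unaffected
            cursor = dbapi_conn.cursor()
            cursor.execute("set wsrep_causal_reads=1")
            cursor.close()

    reader_engine = create_engine("mysql://root:root@rhel7-2/test")
    configure_reader_engine(reader_engine)

Since the flag rides along with each pooled connection, @reader would get synchronous reads transparently, with no per-transaction boilerplate in calling code.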
I’m running it against a cluster of two virtual nodes on a laptop, so performance is very slow, but some sample output:

2015-02-04 15:49:27,131 100 runs
2015-02-04 15:49:27,754 w/ non-causal reads, got row 763 val is 9499, retries 0
2015-02-04 15:49:27,760 w/ non-causal reads, got row 763 val is 9499, retries 1
2015-02-04 15:49:27,764 w/ non-causal reads, got row 763 val is 9499, retries 2
2015-02-04 15:49:27,772 w/ non-causal reads, got row 763 val is 9499, retries 3
2015-02-04 15:49:27,777 w/ non-causal reads, got row 763 val is 9499, retries 4
2015-02-04 15:49:30,985 200 runs
2015-02-04 15:49:37,579 300 runs
2015-02-04 15:49:42,396 400 runs
2015-02-04 15:49:48,240 w/ non-causal reads, got row 6544 val is 6766, retries 0
2015-02-04 15:49:48,255 w/ non-causal reads, got row 6544 val is 6766, retries 1
2015-02-04 15:49:48,276 w/ non-causal reads, got row 6544 val is 6766, retries 2
2015-02-04 15:49:49,336 500 runs
2015-02-04 15:49:56,433 600 runs
2015-02-04 15:50:05,801 700 runs
2015-02-04 15:50:08,802 w/ non-causal reads, got row 533 val is 834, retries 0
2015-02-04 15:50:10,849 800 runs
2015-02-04 15:50:14,834 900 runs
2015-02-04 15:50:15,445 w/ non-causal reads, got row 124 val is 3850, retries 0
2015-02-04 15:50:15,448 w/ non-causal reads, got row 124 val is 3850, retries 1
2015-02-04 15:50:18,515 1000 runs
2015-02-04 15:50:22,130 1100 runs
2015-02-04 15:50:26,301 1200 runs
2015-02-04 15:50:28,898 w/ non-causal reads, got row 1493 val is 8358, retries 0
2015-02-04 15:50:29,988 1300 runs
2015-02-04 15:50:33,736 1400 runs
2015-02-04 15:50:34,219 w/ non-causal reads, got row 9661 val is 2877, retries 0
2015-02-04 15:50:38,796 1500 runs
2015-02-04 15:50:42,844 1600 runs
2015-02-04 15:50:46,838 1700 runs
2015-02-04 15:50:51,049 1800 runs
2015-02-04 15:50:55,139 1900 runs
2015-02-04 15:50:59,632 2000 runs
2015-02-04 15:51:04,721 2100 runs
2015-02-04 15:51:10,670 2200 runs
2015-02-04 15:51:15,848 2300 runs
2015-02-04 15:51:20,960 2400 runs
2015-02-04 15:51:25,629 2500 runs
2015-02-04 15:51:30,747 2600 runs
2015-02-04 15:51:36,229 2700 runs
2015-02-04 15:51:39,865 w/ non-causal reads, got row 7378 val is 1571, retries 0
2015-02-04 15:51:39,869 w/ non-causal reads, got row 7378 val is 1571, retries 1
2015-02-04 15:51:39,874 w/ non-causal reads, got row 7378 val is 1571, retries 2
2015-02-04 15:51:39,880 w/ non-causal reads, got row 7378 val is 1571, retries 3
2015-02-04 15:51:39,887 w/ non-causal reads, got row 7378 val is 1571, retries 4
2015-02-04 15:51:39,892 w/ non-causal reads, got row 7378 val is 1571, retries 5
2015-02-04 15:51:40,640 2800 runs
from sqlalchemy import create_engine
import random
import itertools
import logging

logging.basicConfig(format='%(asctime)-15s %(message)s')
log = logging.getLogger(__name__)
log.setLevel(logging.INFO)

# two nodes of the Galera cluster; writes go to node 1,
# reads come from node 2
e1 = create_engine("mysql://root:root@rhel7-1/test")
e2 = create_engine("mysql://root:root@rhel7-2/test")

c1 = e1.connect()
if e1.has_table('sbtest'):
    c1.execute("drop table sbtest")
c1.execute(
    "create table sbtest(id integer primary key, k integer)")
c1.execute("delete from sbtest")
c1.execute(
    "insert into sbtest (id, k) values (%s, %s)",
    [(i, random.randint(1, 1000)) for i in range(1000)]
)

c2_causal_reads = e2.connect()
c2_causal_reads.execute("set wsrep_causal_reads=1")

c2_no_causal_reads = e2.connect()

# assert that the two connections have independent settings
# for causal reads
assert c2_causal_reads.execute(
    "show variables like 'wsrep_causal_reads'").first()[1] == 'ON'
assert c2_no_causal_reads.execute(
    "show variables like 'wsrep_causal_reads'").first()[1] == 'OFF'

for run in itertools.count():
    val = random.randint(1, 10000)
    i = random.randint(1, 999)
    c1.execute("update sbtest set k=%s where id=%s", (val, i))

    # randomly alternate between the two reader connections
    if random.randint(1, 2) == 1:
        # c2_causal_reads should always read the correct value;
        # roll back the implicit transaction first so each read
        # sees current data
        c2_causal_reads.connection.rollback()
        result = c2_causal_reads.execute(
            "select k from sbtest where id=%s", (i,))
        assert result.scalar() == val, \
            "Got wrong value w/ causal reads session"
    else:
        # we expect c2_no_causal_reads to fail sometimes
        for retry in itertools.count():
            c2_no_causal_reads.connection.rollback()
            result = c2_no_causal_reads.execute(
                "select k from sbtest where id=%s", (i,))
            recv = result.scalar()
            if recv == val:
                break
            log.info(
                "w/ non-causal reads, got row %s val is %s, retries %s",
                recv, val, retry)

    if run % 100 == 0:
        log.info("%s runs", run)
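One caveat: on newer Galera (3.6 and above), wsrep_causal_reads is deprecated in favor of wsrep_sync_wait, so a version of this script for those releases would swap in something like (untested here):

    c2_causal_reads.execute("set wsrep_sync_wait=1")

The per-session behavior is the same; only the variable name changes.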