All the status variables look sane. But the binlogs you've uploaded are empty: the hung transaction isn't in any of them. So I'd guess it's in mariadb-bin.000004 (or an earlier file) on the master, and on the slave either it doesn't exist at all or it's in mariadb-bin.000001 and relay-bin.000004 (or an earlier relay-bin.* file).
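If it helps narrow down which file holds the transaction, here is a rough sketch of how the text dumps could be compared. It assumes that each file has been dumped to text with mysqlbinlog and that GTID events show up in that output as "GTID <domain>-<server>-<seq>". The regex and both function names are my own invention for illustration, not part of any MariaDB tool.

```python
import re

# Assumption: a mysqlbinlog text dump shows each MariaDB GTID event as
# "GTID <domain_id>-<server_id>-<seq_no>" (e.g. "GTID 0-1684280839-156").
GTID_RE = re.compile(r"\bGTID (\d+)-(\d+)-(\d+)")


def gtids_in_dump(text):
    """Return the (domain_id, server_id, seq_no) tuples found in a dump, in file order."""
    return [tuple(int(part) for part in m.groups()) for m in GTID_RE.finditer(text)]


def missing_on_slave(master_dump, slave_dump):
    """Return GTIDs present in the master dump but absent from the slave dump."""
    seen = set(gtids_in_dump(slave_dump))
    return [gtid for gtid in gtids_in_dump(master_dump) if gtid not in seen]
```

Feeding it the dump of mariadb-bin.000004 from the master and the dumps of the slave's binlog and relay logs should list any GTID that never reached the slave. An empty result would point at the second case, where the transaction is on the slave but was never acknowledged.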
On Fri, Jul 29, 2016 at 1:29 AM, Joseph Glanville <j...@jpg.id.au> wrote:

> Hi Justin,
>
> Adjusting the timeout doesn't seem to have any effect. Though setting it
> low enough does cause the master to time out waiting for the slave to
> acknowledge the write and falls back to async only to immediately
> re-establish semi-sync replication. It does this every time the master
> begins writing to a new binlog.
>
> Joseph.
> ------------------------------
> *From:* Justin Swanhart <greenl...@gmail.com>
> *Sent:* Friday, 29 July 2016 6:17:28 PM
> *To:* Joseph Glanville
> *Cc:* Pavel Ivanov; maria-discuss@lists.launchpad.net
> *Subject:* Re: [Maria-discuss] Semi-sync replication hangs when changing
> binlog filename.
>
> Hi,
>
> Does the problem appear if you set the timeout value to
> 9223372036854775807?
>
> On Fri, Jul 29, 2016 at 3:24 AM, Joseph Glanville <j...@jpg.id.au> wrote:
>
>> Hi Pavel.
>>
>> To describe the setup a little better the master replicates to a
>> semi-sync slave, which then replicates to an async slave. This is to ensure
>> at any point in time both the master and the semi-sync slave have a
>> complete copy of the data. If the master fails the semi-sync is
>> automatically promoted to master and the async switches to replicating with
>> semi-sync replication. If the semi-sync fails then the async remasters
>> itself to the master and switches to semi-sync.
>>
>> However I don't think the 3rd node has any bearing on the hang, I built a
>> test cluster without it and the hang is still easy to reproduce. I just
>> restore a decent sized dump, in this case a portion of the Wikipedia
>> database and the cluster reliably hangs when the master begins writing to
>> the new binlog.
>>
>> The dump is here if someone wants to use it to reproduce:
>> https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-category.sql.gz
>>
>> I have created a gist with the output of `SHOW STATUS LIKE
>> 'Rpl_semi_sync%s'` on both master and slave of the simplified 2 node setup.
>> I have also included the binlogs of both the master and the slave and the
>> relay log on the slave.
>>
>> https://gist.github.com/josephglanville/70789bc9c3744090a17070652cded68b
>>
>> Let me know if there is any other useful information I can provide.
>>
>> Joseph.
>> ------------------------------
>> *From:* Pavel Ivanov <piva...@google.com>
>> *Sent:* Friday, 29 July 2016 4:31:26 PM
>> *To:* Joseph Glanville
>> *Cc:* Will Fong; maria-discuss@lists.launchpad.net
>> *Subject:* Re: [Maria-discuss] Semi-sync replication hangs when changing
>> binlog filename.
>>
>> This looks pretty weird. If you don't mind more information would be
>> useful to look at: contents of mariadb-bin.000005 on the master, in
>> particular what GTID and binlog position the transaction waiting for
>> semi-sync ack has (confirm that it's 0-1684280839-156 and ends at offset
>> 329); result of "show status like 'rpl_semi_sync_%'" on both master and
>> slave; contents of relay-bin.000005 and binlog on the slave, in particular
>> did it really execute the transaction that is currently hanging on the
>> master? Out of curiosity: it looks like the slave also acts as a master to
>> someone else. Can you also verify that the transaction hanging now on the
>> master made it to that second-level slave?
>>
>> But to be honest, I don't quite understand how what you show us could
>> happen, so I'm just asking to look at the info that I would look at if I
>> were investigating such problem.
>>
>> On Thu, Jul 28, 2016 at 10:52 PM, Joseph Glanville <j...@jpg.id.au> wrote:
>>
>>> Hi Pavel.
>>>
>>> Yes, by “binlog filename changes” I mean the master begins writing to a
>>> new binlog file.
>>>
>>> Output of all the requested commands are in this gist:
>>> https://gist.github.com/josephglanville/7b96c34bb6e79ace33e56627672b98a5
>>>
>>> Joseph Glanville
>>> Sent from Polymail
>>>
>>> On Fri, 29 Jul 2016 at 3:08 PM Pavel Ivanov <piva...@google.com> wrote:
>>>
>>>> By "binlog filename changes" you mean when master starts writing
>>>> binlogs into a new file? Can you clarify how the replication stalls? What
>>>> "show processlist" shows at that time on master and on slave? What does
>>>> "show slave status" show on the slave?
>>>>
>>>> On Thu, Jul 28, 2016 at 10:03 PM, Will Fong wrote:
>>>>> Hi Joseph,
>>>>>
>>>>> On Fri, Jul 29, 2016 at 10:11 AM, Joseph Glanville wrote:
>>>>>> However whenever the binlog filename changes the replication stalls
>>>>>> indefinitely.
>>>>>
>>>>> Interesting! I may have reproduced this, but it was only a quick test.
>>>>> Let me (or someone else) dig into this more.
>>>>>
>>>>> Thanks for reporting this.
>>>>> -will
>>>>>
>>>>> --
>>>>> Will Fong, Senior Support Engineer
>>>>> MariaDB Corporation
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~maria-discuss
>>>>> Post to     : maria-discuss@lists.launchpad.net
>>>>> Unsubscribe : https://launchpad.net/~maria-discuss
>>>>> More help   : https://help.launchpad.net/ListHelp