All the status variables look sane. But the binlogs you uploaded are empty:
the hung transaction isn't in any of them. So I'd guess it is in
mariadb-bin.000004 (or an earlier file) on the master, and either it doesn't
exist on the slave or it's in mariadb-bin.000001 and relay-bin.000004 (or in
an earlier relay-bin.* file).
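
To check which of those files actually carries the hung transaction, something
like the following could help. It is only a rough sketch in Python: it assumes
each log has first been decoded to text with mysqlbinlog and that the decoded
text contains the GTID as a plain domain-server-sequence token. The names
`parse_gtid`, `lines_mentioning` and `locate` are made up for this
illustration, and the GTID 0-1684280839-156 is the one quoted further down the
thread.

```python
import re
from typing import Dict, Iterable, List, NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid(text: str) -> Gtid:
    """Parse a MariaDB GTID written as domain_id-server_id-sequence."""
    parts = text.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a MariaDB GTID: {text!r}")
    return Gtid(*(int(p) for p in parts))


def lines_mentioning(gtid: str, dump_lines: Iterable[str]) -> List[int]:
    """Return 1-based numbers of the decoded-log lines that contain gtid."""
    parse_gtid(gtid)  # reject malformed input early
    # The lookarounds keep 0-1-156 from matching inside 0-1-1560 or 10-1-156.
    pattern = re.compile(r"(?<![\d-])" + re.escape(gtid) + r"(?![\d-])")
    return [n for n, line in enumerate(dump_lines, start=1) if pattern.search(line)]


def locate(gtid: str, dumps: Dict[str, Iterable[str]]) -> Dict[str, List[int]]:
    """Map each log name to the lines mentioning gtid, skipping logs with no hit."""
    hits = {name: lines_mentioning(gtid, lines) for name, lines in dumps.items()}
    return {name: found for name, found in hits.items() if found}
```

Feeding it the decoded text of mariadb-bin.000004 from the master and of
mariadb-bin.000001 and relay-bin.000004 from the slave, keyed by file name,
would show which of them mention the transaction. A file missing from the
result has no line containing that GTID.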

On Fri, Jul 29, 2016 at 1:29 AM, Joseph Glanville <j...@jpg.id.au> wrote:

> Hi Justin,
>
>
> Adjusting the timeout doesn't seem to have any effect. Setting it low
> enough does cause the master to time out waiting for the slave to
> acknowledge the write and fall back to async, only to immediately
> re-establish semi-sync replication. It does this every time the master
> begins writing to a new binlog.
>
>
> Joseph.
> ------------------------------
> *From:* Justin Swanhart <greenl...@gmail.com>
> *Sent:* Friday, 29 July 2016 6:17:28 PM
> *To:* Joseph Glanville
> *Cc:* Pavel Ivanov; maria-discuss@lists.launchpad.net
>
> *Subject:* Re: [Maria-discuss] Semi-sync replication hangs when changing
> binlog filename.
>
> Hi,
>
> Does the problem appear if you set the timeout value to
> 9223372036854775807?
>
>
> On Fri, Jul 29, 2016 at 3:24 AM, Joseph Glanville <j...@jpg.id.au> wrote:
>
>> Hi Pavel.
>>
>>
>> To describe the setup a little better: the master replicates to a
>> semi-sync slave, which then replicates to an async slave. This ensures that
>> at any point in time both the master and the semi-sync slave have a
>> complete copy of the data. If the master fails, the semi-sync slave is
>> automatically promoted to master and the async slave switches to
>> replicating with semi-sync replication. If the semi-sync slave fails, the
>> async slave re-points itself to the master and switches to semi-sync.
>>
>>
>> However, I don't think the third node has any bearing on the hang: I built
>> a test cluster without it and the hang is still easy to reproduce. I just
>> restore a decent-sized dump, in this case a portion of the Wikipedia
>> database, and the cluster reliably hangs when the master begins writing to
>> the new binlog.
>>
>> The dump is here if someone wants to use it to reproduce:
>> https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-category.sql.gz
>>
>>
>> I have created a gist with the output of `SHOW STATUS LIKE
>> 'Rpl_semi_sync%'` on both master and slave of the simplified 2-node setup.
>> I have also included the binlogs of both the master and the slave and the
>> relay log on the slave.
>>
>> https://gist.github.com/josephglanville/70789bc9c3744090a17070652cded68b
>>
>>
>> Let me know if there is any other useful information I can provide.
>>
>>
>> Joseph.
>> ------------------------------
>> *From:* Pavel Ivanov <piva...@google.com>
>> *Sent:* Friday, 29 July 2016 4:31:26 PM
>> *To:* Joseph Glanville
>> *Cc:* Will Fong; maria-discuss@lists.launchpad.net
>> *Subject:* Re: [Maria-discuss] Semi-sync replication hangs when changing
>> binlog filename.
>>
>> This looks pretty weird. If you don't mind, more information would be
>> useful to look at: the contents of mariadb-bin.000005 on the master, in
>> particular what GTID and binlog position the transaction waiting for the
>> semi-sync ack has (confirm that it's 0-1684280839-156 and ends at offset
>> 329); the result of "show status like 'rpl_semi_sync_%'" on both master and
>> slave; the contents of relay-bin.000005 and the binlog on the slave, in
>> particular whether it really executed the transaction that is currently
>> hanging on the master. Out of curiosity: it looks like the slave also acts
>> as a master to someone else. Can you also verify that the transaction
>> hanging now on the master made it to that second-level slave?
>>
>> But to be honest, I don't quite understand how what you show us could
>> happen, so I'm just asking to look at the info that I would look at if I
>> were investigating such a problem.
>>
>> On Thu, Jul 28, 2016 at 10:52 PM, Joseph Glanville <j...@jpg.id.au> wrote:
>>
>>> Hi Pavel.
>>>
>>> Yes, by “binlog filename changes” I mean the master begins writing to a
>>> new binlog file.
>>>
>>> Output of all the requested commands is in this gist:
>>> https://gist.github.com/josephglanville/7b96c34bb6e79ace33e56627672b98a5
>>>
>>> Joseph Glanville
>>>
>>>
>>> On Fri, 29 Jul 2016 at 3:08 PM Pavel Ivanov <piva...@google.com> wrote:
>>>
>>>> By "binlog filename changes" do you mean when the master starts writing
>>>> binlogs into a new file? Can you clarify how the replication stalls? What
>>>> does "show processlist" show at that time on the master and on the slave?
>>>> What does "show slave status" show on the slave?
>>>>
>>>> On Thu, Jul 28, 2016 at 10:03 PM, Will Fong wrote:
>>>> > Hi Joseph,
>>>> >
>>>> > On Fri, Jul 29, 2016 at 10:11 AM, Joseph Glanville wrote:
>>>> >> However whenever the binlog filename changes the replication stalls
>>>> >> indefinitely.
>>>> >
>>>> > Interesting! I may have reproduced this, but it was only a quick test.
>>>> > Let me (or someone else) dig into this more.
>>>> >
>>>> > Thanks for reporting this.
>>>> > -will
>>>> >
>>>> > --
>>>> > Will Fong, Senior Support Engineer
>>>> > MariaDB Corporation
>>>> >
>>>> > _______________________________________________
>>>> > Mailing list: https://launchpad.net/~maria-discuss
>>>> > Post to     : maria-discuss@lists.launchpad.net
>>>> > Unsubscribe : https://launchpad.net/~maria-discuss
>>>> > More help   : https://help.launchpad.net/ListHelp
>>>>
>>>
>>>
>>
>>
>>
>
