May have spoke too soon.. I forgot to set the settings back to AFTER_SYNC.
Will have to test with correct settings again, sorry for the false positive. ________________________________ From: Maria-discuss <maria-discuss-bounces+jpg=jpg.id...@lists.launchpad.net> on behalf of Joseph Glanville <j...@jpg.id.au> Sent: Tuesday, 16 August 2016 11:38:55 AM To: Pavel Ivanov; Kristian Nielsen; Jonas Oreland Cc: maria-discuss@lists.launchpad.net Subject: Re: [Maria-discuss] Semi-sync replication hangs when changing binlog filename. Hi Pavel, Kristian, This fixes the problem for me. How do we go about getting this into a release? Joseph. ________________________________ From: Pavel Ivanov <piva...@google.com> Sent: Tuesday, 16 August 2016 2:28 AM To: Kristian Nielsen; Jonas Oreland Cc: Joseph Glanville; maria-discuss@lists.launchpad.net Subject: Re: [Maria-discuss] Semi-sync replication hangs when changing binlog filename. Yes, Kristian. Thanks! This indeed looks like the correct patch. In fact I see that internally we have this changed as part of the same patch where after_flush was changed (i.e. https://github.com/MariaDB/server/commit/4d8b346e079a27960dbe49e4d0ec4364bed8d30e). [https://avatars1.githubusercontent.com/u/12231504?v=3&s=200]<https://github.com/MariaDB/server/commit/4d8b346e079a27960dbe49e4d0ec4364bed8d30e> MDEV-7257: Dump Thread Enhancements ยท MariaDB/server@4d8b346<https://github.com/MariaDB/server/commit/4d8b346e079a27960dbe49e4d0ec4364bed8d30e> github.com Make the binlog dump threads not need to take LOCK_log while sending binlog events to slave. Instead, a new LOCK_binlog_end_pos is used just to coordinate tracking the current end-of-log. This is ... I'm not sure why Jonas didn't include it in https://jira.mariadb.org/browse/MDEV-7257. On Mon, Aug 15, 2016 at 5:05 AM, Kristian Nielsen <kniel...@knielsen-hq.org> wrote: > Pavel Ivanov <piva...@google.com> writes: > >> binlog ending at position mariadb-bin.000004:2039896, somehow the >> function ReplSemiSyncMaster::commitTrx() gets trx_wait_binlog_name = >> 'mariadb-bin.000005' and trx_wait_binlog_pos = 2039896. I.e. the >> function gets the position of the transaction to wait semi-sync ack >> for correctly, but the file name is already the one that is current >> after rotation. Master starts waiting for that position, but the slave > >> Kristian, do you have any idea what's going on? Is there an >> inappropriate lock release/re-acquire somewhere? > > Hm. Actually, looking into MYSQL_BIN_LOG::trx_group_commit_leader, this > looks suspicious: > > RUN_HOOK(binlog_storage, after_flush, > (current->thd, > current->cache_mngr->last_commit_pos_file, > current->cache_mngr->last_commit_pos_offset, synced, > first, last)) > > But > > RUN_HOOK(binlog_storage, after_sync, > (current->thd, log_file_name, > current->cache_mngr->last_commit_pos_offset, > first, last)) > > I would have expected that `log_file_name' to be also > current->cache_mngr->last_commit_pos_file, like in the first instance. And > in fact, it looks like (with my limited knowledge of semi-sync) that this > suspicious case is exactly the AFTER_SYNC which fails, while AFTER_COMMIT > works... > > So maybe try the below patch? > > Pavel, what do you think, do you agree that this patch should be better? > > - Kristian. > > > diff --git a/sql/log.cc b/sql/log.cc > index 7efec98..b77a6b3 100644 > --- a/sql/log.cc > +++ b/sql/log.cc > @@ -7712,7 +7712,7 @@ > MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader) > last= current->next == NULL; > if (!current->error && > RUN_HOOK(binlog_storage, after_sync, > - (current->thd, log_file_name, > + (current->thd, current->cache_mngr->last_commit_pos_file, > current->cache_mngr->last_commit_pos_offset, > first, last))) > { >
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp