Kristian, hello. Perhaps you could review this worker pool patch though it actually follows up changes made by MDEV-9573, by Monty.
Cheers, Andrei > revision-id: 0f97f6b8398054ccb0507fbacc76c9deeddd47a4 > (mariadb-10.1.35-71-g0f97f6b8398) > parent(s): 1fc5a6f30c3a9c047dcf9a36b00026d98f286f6b > author: Andrei Elkin > committer: Andrei Elkin > timestamp: 2018-10-03 15:42:12 +0300 > message: > > MDEV-17346 parallel slave start and stop races to workers disappeared > > The bug appears as a slave SQL thread hanging in > rpl_parallel_thread_pool::get_thread() while there are no slave worker > threads to awake it. > > The hang could occur at parallel slave worker pool activation by a > "new" started SQL thread when the pool was concurrently deactivated by > being terminated "old" SQL thread. At reading the current pool size > the SQL thread did not employ necessary protection designed by > MDEV-9573. The pool can't be deactivated when there is an active slave > but the "new" slave might set its active status too late while seeing > the pool still non-empty. The end product of four computaional events > > Old_slave:any_slave_sql_running() => 0 > New_slave:slave_running= "active" > New_slave:global_rpl_thread_pool.size > 0 => true > Old_slave:global_rpl_thread_pool.size := 0 > > could led to the observed hang. The new SQL thread proceeds to scheduling > having all workers gone. > > Fixed with making the SQL thread at the pool activation first > to grab the same lock as potential deactivator also does prior > to destroy the pool. > > --- > sql/rpl_parallel.cc | 25 ++++++++++++++++++++++--- > 1 file changed, 22 insertions(+), 3 deletions(-) > > diff --git a/sql/rpl_parallel.cc b/sql/rpl_parallel.cc > index 35cddee6d4d..8fef2d66635 100644 > --- a/sql/rpl_parallel.cc > +++ b/sql/rpl_parallel.cc > @@ -1617,13 +1617,32 @@ int rpl_parallel_resize_pool_if_no_slaves(void) > } > > > +/** > + Pool activation is preceeded by taking a "lock" of pool_mark_busy > + which guarantees the number of running slaves drops to zero atomicly > + with the number of pool workers. > + This resolves race between the function caller thread and one > + that may be attempting to deactivate the pool. > +*/ > int > rpl_parallel_activate_pool(rpl_parallel_thread_pool *pool) > { > + int rc= 0; > + > + if ((rc= pool_mark_busy(pool, current_thd))) > + return rc; // killed > + > if (!pool->count) > - return rpl_parallel_change_thread_count(pool, opt_slave_parallel_threads, > - 0); > - return 0; > + { > + pool_mark_not_busy(pool); > + rc= rpl_parallel_change_thread_count(pool, opt_slave_parallel_threads, > + 0); > + } > + else > + { > + pool_mark_not_busy(pool); > + } > + return rc; > } > > > _______________________________________________ > commits mailing list > comm...@mariadb.org > https://lists.askmonty.org/cgi-bin/mailman/listinfo/commits _______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp