Looking at the code it looks like most places (including THD::awake) assume that thd->mysys_var->current_mutex and thd->mysys_var->current_cond can be changed only when thd->mysys_var->mutex is locked. debug_sync.cc is breaking this assumption. I've found that Item_func_sleep::val_int(), THD::enter_cond() and THD::exit_cond() break this assumption too. I believe all of those places should be fixed to lock thd->mysys_var->mutex before any attempts to read or write to thd->mysys_var->current_mutex and thd->mysys_var->current_cond.
Pavel On Wed, Feb 26, 2014 at 8:29 AM, Kristian Nielsen <kniel...@knielsen-hq.org> wrote: > I got a crash in my parallel replication test case, and this one turned out to > be caused by a fundamental problem in debug_sync. I am not sure how to solve > it, so I wanted to explain the issue in case others have some ideas. > > The crash happens in THD::awake() when a thread is killed: > > mysql_mutex_lock(&mysys_var->mutex); > ... > int ret= mysql_mutex_trylock(mysys_var->current_mutex); > mysql_cond_broadcast(mysys_var->current_cond); > if (!ret) > mysql_mutex_unlock(mysys_var->current_mutex); > > The problem is that mysys_var->current_mutex changed between the trylock() and > the unlock(), so we unlock a different mutex than the one we locked - ouch! > > The mutex changed because of this code in debug_sync.cc: > > if (thd->mysys_var) > { > old_mutex= thd->mysys_var->current_mutex; > old_cond= thd->mysys_var->current_cond; > thd->mysys_var->current_mutex= &debug_sync_global.ds_mutex; > thd->mysys_var->current_cond= &debug_sync_global.ds_cond; > } > > There is no mutex protection here. > > So this means that it is not safe to use debug_sync inside > enter_cond()/exit_cond() if kill will be used on the thread at that point. > Since I have a lot of tests for parallel replication where I test exactly for > correct error handling when killing threads at various points in the parallel > processing, it is not too surprising that I would eventually be hit by this :) > > I am going to avoid using debug_sync inside enter_cond()/exit_cond() for now. > But any ideas for how to solve this properly? This is particularly nasty to be > hit by, as it is not at all obvious that this use of debug_sync should be a > problem. And it is not exactly easy to guess from the failure what the real > problem is - assuming one can even reproduce the failure, the window of > opportunity in the race is after all rather quite small. > > - Kristian. > > _______________________________________________ > Mailing list: https://launchpad.net/~maria-developers > Post to : maria-developers@lists.launchpad.net > Unsubscribe : https://launchpad.net/~maria-developers > More help : https://help.launchpad.net/ListHelp _______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp