Hello,

I'm just forwarding your mail to bacula-devel, where it's more likely to be picked up, looked at, and perhaps integrated into the code base :-)

Cheers, and thanks for not only analyzing the problem, but also providing a possible fix!

Arno

07.01.2010 16:34, Renaud Marquet wrote:
> Hi,
>
> I'm using bacula 3.0.3 and the director's job queue was stuck after
> running the first job. The others were waiting indefinitely for
> execution. If the director was restarted, I could run only one job, and
> so on.
>
> Googling around I found these 2 posts without satisfying answers:
> http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/bacula-25/upgrade-to-3-0-3-job-is-waiting-for-execution-102156/
> http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/bacula-25/job-is-waiting-for-execuition-101508/
>
> I then looked at the code and found there is a deadlock happening in
> message handling.
>
> The problem is located in the close_msg(JCR *) function in message.c.
> When it encounters an error while sending an e-mail, it calls the macro
> Jmsg1 (line 485) to report it. This macro calls dispatch_message, which
> tries to acquire fides_mutex (line 738). Unfortunately, this mutex was
> already acquired in close_msg (line 431), thus resulting in a deadlock
> (as stated in mutex documentation for PTHREAD_MUTEX_INITIALIZER kind).
>
> This problem was affecting me because the mail daemon was not properly
> configured on my server.
>
> It could be interesting to review these parts of the code to avoid such
> a situation.
>
> However I wrote a quick patch for lockmgr.c which simply upgrades
> mutexes to PTHREAD_MUTEX_ERRORCHECK_NP kind and resolves this error.
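
The double lock described above can be reproduced outside Bacula. The following is a minimal sketch, not Bacula code: init_errorcheck_mutex and relock_same_thread are hypothetical helper names, and it uses the POSIX spelling PTHREAD_MUTEX_ERRORCHECK rather than the PTHREAD_MUTEX_ERRORCHECK_NP name used in the patch. It assumes a POSIX threads implementation, where a second lock attempt by the owning thread on an error-checking mutex is specified to fail with EDEADLK instead of blocking.

```c
/* Request POSIX.1-2008 so PTHREAD_MUTEX_ERRORCHECK is declared even
 * under strict -std=c11. */
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: initialise *m as an error-checking mutex.
 * Returns 0 on success, otherwise the pthread error code. */
int init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;

    assert(m != NULL);

    /* The attribute object must be initialised before its type is set. */
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }

    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0) {
        rc = pthread_mutex_init(m, &attr);
    }

    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical stand-in for the close_msg() -> Jmsg1() ->
 * dispatch_message() path: lock the same mutex twice from one thread.
 * Returns the result of the nested lock attempt. */
int relock_same_thread(pthread_mutex_t *m)
{
    int first;
    int second;

    assert(m != NULL);

    first = pthread_mutex_lock(m);      /* outer lock, as in close_msg() */
    if (first != 0) {
        return first;
    }

    second = pthread_mutex_lock(m);     /* nested lock, as in dispatch_message() */
    if (second == 0) {
        /* Nested lock unexpectedly succeeded (e.g. a recursive mutex):
         * release it so the mutex ends up fully unlocked. */
        pthread_mutex_unlock(m);
    }

    pthread_mutex_unlock(m);            /* release the outer lock */
    return second;
}
```

Calling init_errorcheck_mutex on a mutex and then relock_same_thread on it should return EDEADLK. With a default mutex the same sequence may block forever, which matches the stuck job queue described above. Build with the compiler's pthread option (for example -pthread with gcc).
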
>
> Hope this would help someone,
> Renaud
>
> patch :
>
> diff -rupN bacula-3.0.3.vanilla/src/lib/lockmgr.c bacula-3.0.3.patched/src/lib/lockmgr.c
> --- bacula-3.0.3.vanilla/src/lib/lockmgr.c 2009-10-18 11:10:16.000000000 +0200
> +++ bacula-3.0.3.patched/src/lib/lockmgr.c 2009-12-31 18:05:59.000000000 +0100
> @@ -616,6 +616,15 @@ void lmgr_cleanup_main()
>   */
>  int lmgr_mutex_lock(pthread_mutex_t *m, const char *file, int line)
>  {
> +   /* Patch to avoid deadlock if mutex is locked more than once */
> +   /* There's some performance hit which makes it probably not acceptable */
> +   /* for large system usage. */
> +   if(*m == PTHREAD_MUTEX_INITIALIZER) {
> +      pthread_mutexattr_t attr;
> +      pthread_mutexattr_settype( &attr, PTHREAD_MUTEX_ERRORCHECK_NP );
> +      pthread_mutex_init( m, &attr );
> +   }
> +
>     int ret;
>     lmgr_thread_t *self = lmgr_get_thread_info();
>     self->pre_P(m, file, line);
>
> ------------------------------------------------------------------------------
> This SF.Net email is sponsored by the Verizon Developer Community
> Take advantage of Verizon's best-in-class app development support
> A streamlined, 14 day to market process makes app distribution fast and easy
> Join now and get one step closer to millions of Verizon customers
> http://p.sf.net/sfu/verizon-dev2dev
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users

-- 
Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück
www.its-lehmann.de