On Saturday 09 January 2010 at 21:25 +0100, Kern Sibbald wrote:
> Hello,
>
> On Saturday 09 January 2010 20:20:01 Renaud Marquet wrote:
> > Kern,
> >
> > although I searched for a possible workaround, I didn't find the ones
> > you talk about. But your statement is not correct, as pointing to a valid
> > smtp server is not a proper workaround. Actually, if for some reason
> > the *valid* smtp server is down, the problem will occur and I bet users
> > will not figure out the reason.
>
> I never claimed that my suggestion was a "proper" workaround nor that it
> was a fix.  It is a workaround.

Never mind then ;)

>
> If you want, you can backport the fixes (applied 23 October 2009), but since
> we are close to release, and we have a workaround, we are not planning to
> backport them.

No need to backport. This is not a 'blocker' problem; I just mailed here in
case someone else runs into the same problem, because there wasn't any
answer when googling. Bacula now runs perfectly fine on my system, so I can
wait for the upcoming release without any trouble.

> >
> > That's why I came up with this patch. It correctly fixes the problem but
> > I recognize this could affect performance, so it should certainly not be
> > put in the trunk. It will probably even be useless since, as you pointed
> > out, it's already fixed in the development version.
>
> Unfortunately your patch does not fix the problem -- it masks the problem.
> I didn't look at your patch in detail, but I believe that it will make all
> locks recursive, which is not really what we want and may lead to some
> surprises.
>
> Bacula does have recursive locks, but we use them only in situations where
> they need to be used and they are portable.  I am not so much worried about
> the performance consequences of your patch, but your code is Linux only if
> I am not mistaken (i.e. not portable), and as I said, the lock manager is
> not production code.  It is development code and should only be turned on
> by developers for debugging.

As I said in another mail, I didn't do anything to activate this lock
manager, so I guess it's not active. I think the confusion comes from the
fact that mutexes are handled through some functions in lockmgr.c (through
a macro), even, I think, with the lock manager deactivated.

> >
> > That said, I didn't know the lock manager should be turned off in a
> > production environment. Moreover, I'm not sure I understand your point
> > because, although I didn't read all the code, it seems pretty strange to
> > me that a multithreaded application should not use any mutexes in a
> > production environment.
>
> We use mutexes in production as in development.  The lock manager "watches"
> our lock usage and blows up Bacula if it detects a problem (deadlock, out
> of order locks, ...).  It is a debug tool and not meant or sufficiently
> tested for production use.  Use it at your own risk.
>
> That said, you were very clever to figure out the problem.  Not many users
> could do so.

Thank you,
Regards.

>
> Regards,
>
> Kern
>
> >
> > Regards,
> > Renaud
> >
> > On Saturday 09 January 2010 at 00:03 +0100, Kern Sibbald wrote:
> > > Hello Arno and Renaud,
> > >
> > > I can believe that there might be a bug in the lock manager software,
> > > but I am very surprised that it is turned on.  It should only be turned
> > > on for developers, and thus though this patch may be correct (I don't
> > > think so, but Eric can answer more definitively), it should never be
> > > needed in a production system, and won't work in a production system
> > > because of the lock manager being turned off.
> > >
> > > Can you explain why the lock manager code is turned on?
> > >
> > > If this is a problem with a misconfigured mail daemon, then it is very
> > > likely that this problem has already shown up and has a very different
> > > solution.  The problem I just mentioned is fixed in the current
> > > development version, and the workaround for version 3.0.x is to ensure
> > > that either email is turned off or you point to a valid smtp server.
> > >
> > > Regards,
> > >
> > > Kern
> > >
> > > On Friday 08 January 2010 21:32:18 Arno Lehmann wrote:
> > > > Hello,
> > > >
> > > > this is just forwarding your mail to bacula-devel, where it's more
> > > > likely to be picked up, looked at, and perhaps integrated into the
> > > > code base :-)
> > > >
> > > > Cheers, and thanks for not only analyzing the problem, but also
> > > > providing a possible fix!
> > > >
> > > > Arno
> > > >
> > > > 07.01.2010 16:34, Renaud Marquet wrote:
> > > > > Hi,
> > > > >
> > > > > I'm using Bacula 3.0.3 and the director's job queue was stuck after
> > > > > running the first job. The others were waiting indefinitely for
> > > > > execution. If the director was restarted, I could run only one job,
> > > > > and so on.
> > > > >
> > > > > Googling around, I found these 2 posts without satisfying answers:
> > > > > http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/bacula-25/upgrade-to-3-0-3-job-is-waiting-for-execution-102156/
> > > > > http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/bacula-25/job-is-waiting-for-execuition-101508/
> > > > >
> > > > > I then looked at the code and found there is a deadlock happening in
> > > > > message handling.
> > > > >
> > > > > The problem is located in the close_msg(JCR *) function in message.c.
> > > > > When it encounters an error while sending an e-mail, it calls the
> > > > > macro Jmsg1 (line 485) to report it. This macro calls
> > > > > dispatch_message, which tries to acquire fides_mutex (line 738).
> > > > > Unfortunately, this mutex was already acquired in close_msg (line
> > > > > 431), thus resulting in a deadlock (as stated in the mutex
> > > > > documentation for the PTHREAD_MUTEX_INITIALIZER kind).
> > > > >
> > > > > This problem was affecting me because the mail daemon was not
> > > > > properly configured on my server.
> > > > >
> > > > > It could be interesting to review these parts of the code to avoid
> > > > > such a situation.
> > > > >
> > > > > However, I wrote a quick patch for lockmgr.c which simply upgrades
> > > > > mutexes to the PTHREAD_MUTEX_ERRORCHECK_NP kind and resolves this
> > > > > error.
> > > > >
> > > > > Hope this helps someone,
> > > > > Renaud
> > > > >
> > > > > patch :
> > > > >
> > > > > diff -rupN bacula-3.0.3.vanilla/src/lib/lockmgr.c bacula-3.0.3.patched/src/lib/lockmgr.c
> > > > > --- bacula-3.0.3.vanilla/src/lib/lockmgr.c 2009-10-18 11:10:16.000000000 +0200
> > > > > +++ bacula-3.0.3.patched/src/lib/lockmgr.c 2009-12-31 18:05:59.000000000 +0100
> > > > > @@ -616,6 +616,15 @@ void lmgr_cleanup_main()
> > > > >   */
> > > > >  int lmgr_mutex_lock(pthread_mutex_t *m, const char *file, int line)
> > > > >  {
> > > > > +   /* Patch to avoid deadlock if mutex is locked more than once */
> > > > > +   /* There's some performance hit which makes it probably not acceptable */
> > > > > +   /* for large system usage. */
> > > > > +   if(*m == PTHREAD_MUTEX_INITIALIZER) {
> > > > > +      pthread_mutexattr_t attr;
> > > > > +      pthread_mutexattr_settype( &attr, PTHREAD_MUTEX_ERRORCHECK_NP );
> > > > > +      pthread_mutex_init( m, &attr );
> > > > > +   }
> > > > > +
> > > > >     int ret;
> > > > >     lmgr_thread_t *self = lmgr_get_thread_info();
> > > > >     self->pre_P(m, file, line);
> > > > >
> > > > > -------------------------------------------------------------------------
> > > > > This SF.Net email is sponsored by the Verizon Developer Community
> > > > > Take advantage of Verizon's best-in-class app development support
> > > > > A streamlined, 14 day to market process makes app distribution fast
> > > > > and easy
> > > > > Join now and get one step closer to millions of Verizon customers
> > > > > http://p.sf.net/sfu/verizon-dev2dev
> > > > > _______________________________________________
> > > > > Bacula-users mailing list
> > > > > Bacula-users@lists.sourceforge.net
> > > > > https://lists.sourceforge.net/lists/listinfo/bacula-users
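
For readers landing on this thread, the nested locking pattern from the original report can be reproduced in isolation. What follows is only a minimal sketch, assuming a POSIX threads platform; it is not Bacula code. The names msg_mutex, init_msg_mutex, report_error and close_messages_with_error are hypothetical stand-ins for fides_mutex, dispatch_message and close_msg. It uses the portable PTHREAD_MUTEX_ERRORCHECK type rather than the Linux-specific _NP spelling from the patch, and it initialises the attribute object with pthread_mutexattr_init before setting the type, which POSIX requires.

```c
#define _XOPEN_SOURCE 700   /* expose PTHREAD_MUTEX_ERRORCHECK under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a message mutex such as fides_mutex. */
static pthread_mutex_t msg_mutex;

/* Initialise msg_mutex with the given type, e.g. PTHREAD_MUTEX_ERRORCHECK. */
static int init_msg_mutex(int type)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);   /* attr must be initialised before use */
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, type);
    if (rc == 0)
        rc = pthread_mutex_init(&msg_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical stand-in for dispatch_message(): it takes the mutex itself. */
static int report_error(void)
{
    int rc = pthread_mutex_lock(&msg_mutex);
    if (rc != 0)
        return rc;            /* EDEADLK if this thread already holds an error-checking mutex */
    /* ... the message would be delivered here ... */
    pthread_mutex_unlock(&msg_mutex);
    return 0;
}

/* Hypothetical stand-in for close_msg(): reports an error while holding the mutex. */
static int close_messages_with_error(void)
{
    int rc = pthread_mutex_lock(&msg_mutex);
    assert(rc == 0);
    rc = report_error();      /* nested acquisition by the same thread */
    pthread_mutex_unlock(&msg_mutex);
    return rc;
}
```

After init_msg_mutex(PTHREAD_MUTEX_ERRORCHECK), a call to close_messages_with_error() returns EDEADLK instead of hanging: the inner lock attempt fails and, in this sketch, the message is simply not delivered. That seems consistent with Kern's remark that the change masks the problem rather than fixing it. With PTHREAD_MUTEX_NORMAL the same call would block forever, and with the default mutex type POSIX leaves relocking by the owning thread undefined.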