Thanks, problem solved.
2010/1/12 Kern Sibbald <k...@sibbald.com>

> On Tuesday 12 January 2010 12:26:37 Carlo Filippetto wrote:
> > Has anyone found any other definitive solution?
> > For the moment I have had to disable the mail.
>
> The failure does not occur if you have a valid SMTP server defined.
>
> > Thanks
> >
> > 2010/1/9 Renaud Marquet <rmarq...@gmail.com>
> >
> > > On Saturday 09 January 2010 at 21:25 +0100, Kern Sibbald wrote:
> > > > Hello,
> > > >
> > > > On Saturday 09 January 2010 20:20:01 Renaud Marquet wrote:
> > > > > Kern,
> > > > >
> > > > > although I searched for a possible workaround, I didn't find the ones
> > > > > you talk about. But your statement is not correct, as pointing to a valid
> > > > > SMTP server is not a proper workaround. Actually, if for some reason
> > > > > the *valid* SMTP server is down, the problem will occur and I bet
> > > > > users will not figure out the reason.
> > > >
> > > > I never claimed that my suggestion was a "proper" workaround nor that it
> > > > was a fix. It is a workaround.
> > >
> > > Never mind then ;)
> > >
> > > > If you want, you can backport the fixes (applied 23 October 2009), but
> > > > since we are close to release, and we have a workaround, we are not
> > > > planning to backport them.
> > >
> > > No need to backport. This is not a 'blocker' problem; I just mailed here
> > > in case someone else runs into the same problem, because there wasn't any
> > > answer when googling. Bacula now runs perfectly fine on my system, so I
> > > can wait for the upcoming release without any trouble.
> > >
> > > > > That's why I came up with this patch. It correctly fixes the problem,
> > > > > but I recognize this could affect performance, so it should certainly
> > > > > not be put in the trunk. It will probably even be useless since, as you
> > > > > pointed out, it's already fixed in the development version.
> > > >
> > > > Unfortunately your patch does not fix the problem; it masks the problem.
> > > > I didn't look at your patch in detail, but I believe that it will make
> > > > all locks recursive, which is not really what we want and may lead to
> > > > some surprises.
> > > >
> > > > Bacula does have recursive locks, but we use them only in situations
> > > > where they need to be used and they are portable. I am not so much
> > > > worried about the performance consequences of your patch, but your code
> > > > is Linux only if I am not mistaken (i.e. not portable), and as I said,
> > > > the lock manager is not production code. It is development code and
> > > > should only be turned on by developers for debugging.
> > >
> > > As I said in another mail, I didn't do anything to activate this lock
> > > manager, so I guess it's not activated. I think the confusion comes from
> > > the fact that mutexes are handled through some functions in lockmgr.c
> > > (through a macro), I think even with the lock manager deactivated.
> > >
> > > > > That said, I didn't know the lock manager should be turned off in a
> > > > > production environment. Moreover, I'm not sure I understand your point
> > > > > because, although I didn't read all the code, it seems pretty strange
> > > > > to me that a multithreaded application should not use any mutexes in a
> > > > > production environment.
> > > >
> > > > We use mutexes in production as in development. The lock manager
> > > > "watches" our lock usage and blows up Bacula if it detects a problem
> > > > (deadlock, out of order locks, ...). It is a debug tool and not meant or
> > > > sufficiently tested for production use. Use it at your own risk.
> > > >
> > > > That said, you were very clever to figure out the problem. Not many
> > > > users could do so.
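For readers following the recursive-lock remark above, here is a minimal sketch of what a recursive mutex does, assuming a POSIX threads platform. The helper name recursive_relock_depth is hypothetical and is not part of Bacula; the sketch only shows that the owning thread may lock a PTHREAD_MUTEX_RECURSIVE mutex again and must then unlock it the same number of times.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper (not Bacula code): lock a recursive mutex twice from
 * the same thread and return how many of the two lock calls succeeded. */
int recursive_relock_depth(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int depth = 0;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    assert(rc == 0);
    rc = pthread_mutex_init(&m, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    if (pthread_mutex_lock(&m) == 0)   /* first lock by this thread */
        depth++;
    if (pthread_mutex_lock(&m) == 0)   /* same thread again: allowed */
        depth++;

    /* A recursive mutex must be unlocked once per successful lock. */
    for (int i = 0; i < depth; i++)
        pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return depth;
}
```

Because the second lock silently succeeds, code that was never written to be re-entered can run with its data half updated, which may be the kind of surprise meant above.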
> > >
> > > Thank you,
> > > Regards.
> > >
> > > > Regards,
> > > >
> > > > Kern
> > > >
> > > > > Regards,
> > > > > Renaud
> > > > >
> > > > > On Saturday 09 January 2010 at 00:03 +0100, Kern Sibbald wrote:
> > > > > > Hello Arno and Renaud,
> > > > > >
> > > > > > I can believe that there might be a bug in the lock manager
> > > > > > software, but I am very surprised that it is turned on. It should
> > > > > > only be turned on for developers, and thus though this patch may be
> > > > > > correct (I don't think so, but Eric can answer more definitively),
> > > > > > it should never be needed in a production system, and won't work in
> > > > > > a production system because of the lock manager being turned off.
> > > > > >
> > > > > > Can you explain why the lock manager code is turned on?
> > > > > >
> > > > > > If this is a problem with a misconfigured mail daemon, then it is
> > > > > > very likely that this problem has already shown up and has a very
> > > > > > different solution. The problem I just mentioned is fixed in the
> > > > > > current development version, and the workaround for version 3.0.x
> > > > > > is to ensure that either email is turned off or you point to a
> > > > > > valid SMTP server.
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Kern
> > > > > >
> > > > > > On Friday 08 January 2010 21:32:18 Arno Lehmann wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > this is just forwarding your mail to bacula-devel, where it's
> > > > > > > more likely to be picked up, looked at, and perhaps integrated
> > > > > > > into the code base :-)
> > > > > > >
> > > > > > > Cheers, and thanks for not only analyzing the problem, but also
> > > > > > > providing a possible fix!
> > > > > > >
> > > > > > > Arno
> > > > > > >
> > > > > > > 07.01.2010 16:34, Renaud Marquet wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I'm using Bacula 3.0.3 and the director's job queue was stuck
> > > > > > > > after running the first job. The others were waiting
> > > > > > > > indefinitely for execution. If the director was restarted, I
> > > > > > > > could run only one job, and so on.
> > > > > > > >
> > > > > > > > Googling around, I found these 2 posts without satisfying
> > > > > > > > answers:
> > > > > > > > http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/bacula-25/upgrade-to-3-0-3-job-is-waiting-for-execution-102156/
> > > > > > > > http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/bacula-25/job-is-waiting-for-execuition-101508/
> > > > > > > >
> > > > > > > > I then looked at the code and found there is a deadlock
> > > > > > > > happening in message handling.
> > > > > > > >
> > > > > > > > The problem is located in the close_msg(JCR *) function in
> > > > > > > > message.c. When it encounters an error while sending an e-mail,
> > > > > > > > it calls the macro Jmsg1 (line 485) to report it. This macro
> > > > > > > > calls dispatch_message, which tries to acquire fides_mutex
> > > > > > > > (line 738). Unfortunately, this mutex was already acquired in
> > > > > > > > close_msg (line 431), thus resulting in a deadlock (as stated
> > > > > > > > in the mutex documentation for the PTHREAD_MUTEX_INITIALIZER
> > > > > > > > kind).
> > > > > > > >
> > > > > > > > This problem was affecting me because the mail daemon was not
> > > > > > > > properly configured on my server.
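The self-deadlock described above can be illustrated with a small sketch, assuming a POSIX threads platform. It is not Bacula code, and relock_errorcheck_mutex is a hypothetical name. It uses the portable PTHREAD_MUTEX_ERRORCHECK type so that the second lock attempt by the owning thread returns EDEADLK; with a default mutex initialized by PTHREAD_MUTEX_INITIALIZER, the same second call would typically block forever, which matches the stuck job queue.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper (not Bacula code): lock an error-checking mutex twice
 * from the same thread and return the result of the second attempt. */
int relock_errorcheck_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;
    int second;

    /* The attribute object must be initialized before its type is set. */
    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&m, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&m);       /* first lock: succeeds */
    assert(rc == 0);
    second = pthread_mutex_lock(&m);   /* relock by owner: error, no hang */

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return second;
}
```

The error-checking type turns the hang into a return code, but the caller still has to notice and handle that code; the nested locking itself remains.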
> > > > > > > >
> > > > > > > > It could be interesting to review these parts of the code to
> > > > > > > > avoid such a situation.
> > > > > > > >
> > > > > > > > However, I wrote a quick patch for lockmgr.c which simply
> > > > > > > > upgrades mutexes to the PTHREAD_MUTEX_ERRORCHECK_NP kind and
> > > > > > > > resolves this error.
> > > > > > > >
> > > > > > > > Hope this helps someone,
> > > > > > > > Renaud
> > > > > > > >
> > > > > > > > patch :
> > > > > > > >
> > > > > > > > diff -rupN bacula-3.0.3.vanilla/src/lib/lockmgr.c bacula-3.0.3.patched/src/lib/lockmgr.c
> > > > > > > > --- bacula-3.0.3.vanilla/src/lib/lockmgr.c 2009-10-18 11:10:16.000000000 +0200
> > > > > > > > +++ bacula-3.0.3.patched/src/lib/lockmgr.c 2009-12-31 18:05:59.000000000 +0100
> > > > > > > > @@ -616,6 +616,15 @@ void lmgr_cleanup_main()
> > > > > > > >   */
> > > > > > > >  int lmgr_mutex_lock(pthread_mutex_t *m, const char *file, int line)
> > > > > > > >  {
> > > > > > > > +   /* Patch to avoid deadlock if mutex is locked more than once */
> > > > > > > > +   /* There's some performance hit which makes it probably not acceptable */
> > > > > > > > +   /* for large system usage. */
> > > > > > > > +   if(*m == PTHREAD_MUTEX_INITIALIZER) {
> > > > > > > > +      pthread_mutexattr_t attr;
> > > > > > > > +      pthread_mutexattr_settype( &attr, PTHREAD_MUTEX_ERRORCHECK_NP );
> > > > > > > > +      pthread_mutex_init( m, &attr );
> > > > > > > > +   }
> > > > > > > > +
> > > > > > > >     int ret;
> > > > > > > >     lmgr_thread_t *self = lmgr_get_thread_info();
> > > > > > > >     self->pre_P(m, file, line);
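One general way to avoid this kind of nested locking, without changing the mutex type, is to record the failure while the lock is held and report it only after the lock is released. The sketch below illustrates that pattern under stated assumptions; it is not the fix applied in the Bacula development version, whose details are not given in this thread. The names queue_mutex, report_error and close_and_report are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* All names here are hypothetical; this is not Bacula's message.c. */
static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static int reported = 0;               /* number of errors reported so far */

/* Reporting takes queue_mutex itself, so it must never be called while the
 * caller already holds that mutex. */
static void report_error(const char *msg)
{
    int rc = pthread_mutex_lock(&queue_mutex);
    assert(rc == 0);
    if (msg != NULL)
        reported++;
    rc = pthread_mutex_unlock(&queue_mutex);
    assert(rc == 0);
}

/* Remember the failure under the lock, release the lock, then report. */
int close_and_report(int send_failed)
{
    const char *pending = NULL;
    int rc = pthread_mutex_lock(&queue_mutex);
    assert(rc == 0);
    if (send_failed)
        pending = "mail could not be sent";   /* defer, do not report yet */
    rc = pthread_mutex_unlock(&queue_mutex);
    assert(rc == 0);

    if (pending != NULL)
        report_error(pending);         /* safe: the lock is no longer held */
    return reported;
}
```

With this ordering a plain, non-recursive mutex is sufficient, because no code path tries to take the lock while already holding it.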
_______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users