On Friday 14 April 2006 11:24, Cedric Tefft wrote:
> Hi all -
>
> For the last year or two (through three or four upgrades), Bacula has
> been segfaulting on me at irregular intervals, averaging approximately
> once a month.  Recently, however, the problem has been occurring more
> frequently, so this week I buckled down and tried to ferret out the
> problem.  It has been maddeningly difficult to reproduce, but I now
> have a config that will consistently cause a segfault on my system.  It
> appears the problem is somehow related to Python, but I'm at a loss as
> to exactly what's wrong.  The Python script may execute flawlessly a
> dozen times and then on the next job, Bacula segfaults (apparently in
> pythonlib).  The odd thing is that, as far as I can tell, the Python
> script is making exactly the same branching decisions and executing
> exactly the same pieces of code when it segfaults as it did when it ran
> through just fine.  The other puzzling thing is that I can prevent the
> segfault by changing virtually anything about the director's config --
> things that, as far as I know, should have no effect on the Python
> script one way or another. I suspect several factors are interacting in
> JUST the right way to cause the segfault, but I'll be darned if I can
> untangle the mess.  Maybe one of you fine folks will have some insight.
> Here's what I've got:
>
> You can see from the director config that I've got the same three jobs
> scheduled to run three times in a row.  You can see from the console log
> that all three jobs run successfully twice in a row.  Then, on the third
> run, it appears the first job segfaults in the middle of the Python
> script.  However, you can see from the director's debugging output that
> the segfault occurs just after the job record for the third job (the
> catalog backup) is created even though it doesn't show up in the console
> log.  My suspicion is that the segfault has something to do with the
> timing of the initialization of the third job, but I could be way off
> the mark there.
>
> Anyway, just to confuse the issue, in the course of finding a config
> that would consistently segfault, I found that any one of these changes
> will prevent it:
>
> * Disable any one of the three jobs (by commenting out the Schedule line)
> * Change the Level defined in the schedule from Incremental to Full
> * In the CatalogTest job, change the RunBefore script from
> make_catalog_backup to make_catalog_backup_fake (attached)
> * Create a separate schedule for the catalog backup job and have it
> start one minute later than the other two
>
> Any ideas what's wrong?  What I could test?
>

I've taken a close look at the Python code now, and I am quite sure that the 
problem occurs when your Python method calls back into Bacula. This creates a 
recursive call, and the Bacula code explicitly releases the Python global 
lock to prevent a deadlock (the Python global lock is not recursive).  This 
allows a second Bacula thread to execute, which most likely then changes the 
state of the Python interpreter and when the first thread returns, Python 
blows up.
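
To make the sequence above concrete, here is a minimal sketch in plain Python. 
It is not Bacula's actual C code: the lock, the state dictionary, the job 
names, and every function name are hypothetical stand-ins. It only models a 
non-recursive lock being released during a callback so that a second thread 
can change shared state before the first thread resumes.

```python
import threading


def simulate(release_lock_in_callback):
    """Return True if the first job's state survives its callback.

    Hypothetical model only: 'lock' stands in for the non-recursive
    Python global lock and 'state' for interpreter state.
    """
    lock = threading.Lock()            # non-recursive, like the global lock
    state = {"current_job": None}      # stand-in for interpreter state
    may_run = threading.Event()        # lets the second job proceed
    finished = threading.Event()       # second job has changed the state

    def second_job():
        # A second thread that wants the interpreter as soon as it can get it.
        may_run.wait()
        with lock:
            state["current_job"] = "job-2"
        finished.set()

    def callback_into_bacula():
        # The Python method calls back into the host program. Taking the
        # lock again here would deadlock, so the host releases it instead.
        if release_lock_in_callback:
            lock.release()
            may_run.set()              # the second thread can now run
            finished.wait()
            lock.acquire()             # resume the first job

    worker = threading.Thread(target=second_job)
    worker.start()

    with lock:
        state["current_job"] = "job-1"
        callback_into_bacula()
        intact = state["current_job"] == "job-1"

    may_run.set()                      # let the second job run if it has not
    worker.join()
    return intact
```

In this sketch, simulate(True) reports that the first job's state was changed 
behind its back, while simulate(False) reports it intact. In the real case the 
altered state belongs to the Python interpreter itself, so the symptom is a 
segfault rather than a tidy return value.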

I've worked up a patch to correct this problem, and I will send it to you in a 
separate email, because I am not sure that the whole list would like to 
receive it.  This patch is for version 1.38.8.  I don't think that it will 
apply to the version 1.38.7 that you are running.  You can try it, but if I 
remember right, I added a few additional Bacula variables ...   If it does 
not apply, I recommend that you upgrade to 1.38.8, then apply it.  The 
upgrade from 1.38.7 to 1.38.8 is rather simple, and you will only need to 
upgrade the DIR and the SD.  If you have any clients that use Python, you 
will need to upgrade them as well to take advantage of the fix, but I suspect 
that the problem you are seeing occurs for you only in the DIR.

Please let me know if this patch corrects the problem, at which point, I will 
"officially" release it.

By the way, I would be interested in hearing what you are doing with 
Bacula+Python, and most likely some of the list members would too.

-- 
Best regards,

Kern

  (">
  /\
  V_V


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
