On 07/28/2010 11:56 AM, Kern Sibbald wrote:
> On Wednesday 28 July 2010 19:44:49 Stephen Thompson wrote:
>> After running for 3 months without this problem, it happened again last
>> night.  We are running 5.0.2 at this point.
>
> I believe that the email problem is fixed in 5.0.3, which we will release
> sometime this month and which is in the SourceForge git repo under
> Branch-5.0.
>
> Kern


We have been running 5.0.3 successfully since August (again about 3 
months) and just had this problem occur again last night.  Same symptoms:

1) SD crashes
2) DIR sends out continuous stream of emails (apparent infinite loop)

    emails say:
    "client-fd JobId 100001: Fatal error: backup.c:1048 Network send 
error to SD. ERR=Broken pipe"

So, I reckon the problem was not fixed in 5.0.3.
I'll post a traceback to bugs.bacula.org.

thanks,
Stephen



>
>>
>> Stephen
>>
>> On 04/15/2010 10:25 AM, Stephen Thompson wrote:
>>> Hello,
>>>
>>> I have just now experienced a possible new bug with bacula 5.0.1.
>>>
>>> The symptoms are this:
>>>
>>> bacula-sd crashes
>>> bacula-dir continues to run
>>> bacula-dir then spews out identical "Intervention needed" emails until
>>> manually restarted
>>>
>>> The first time, this happened over a weekend, and upon returning I found
>>> my inbox had about 120,000 bacula emails, all the SAME and of this type:
>>>
>>> "15-Apr 10:02 client-fd JobId 100001: Fatal error: backup.c:1048 Network
>>> send error to SD. ERR=Broken pipe"
>>>
>>> It happened again just now (second time since upgrading from 3.0.3 to
>>> 5.0.1) and I managed to stop the director with only a few thousand
>>> emails going out.
>>>
>>> So there are really 2 issues here:
>>>
>>> 1)
>>> Why does the director apparently get stuck in an infinite loop of
>>> sending the same email message?  Is this a known bug?
>>>
>>> 2)
>>> Regarding the SD, I received one alert of this type, the rest like the
>>> above:
>>>
>>>     "15-Apr 10:02 server-sd: ERROR in lock.c:268 Failed ASSERT:
>>> dev->blocked()"
>>>
>>> A traceback like:
>>> --
>>> ptrace: Operation not permitted.
>>> /var/bacula/work/29091: No such file or directory.
>>> $1 = 0
>>> /opt/bacula-5.0.1/scripts/btraceback.gdb:2: Error in sourced command
>>> file: No symbol "exename" in current context.
>>> --
>>>
>>> And a bactrace like:
>>> --
>>> Attempt to dump current JCRs
>>> JCR=0x19a24888 JobId=100000 name=client_1.2010-04-14_18.02.33_41
>>> JobStatus=l use_count=1
>>>            JobType=B JobLevel=F
>>>            sched_time=14-Apr-2010 21:35 start_time=14-Apr-2010 21:35
>>>            end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
>>>            db=(nil) db_batch=(nil) batch_started=0
>>> JCR=0x1981b248 JobId=100001 name=client_10.2010-04-14_20.00.15_04
>>> JobStatus=R use_count=1
>>>            JobType=B JobLevel=I
>>>            sched_time=15-Apr-2010 09:15 start_time=15-Apr-2010 09:15
>>>            end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
>>>            db=(nil) db_batch=(nil) batch_started=0
>>> Attempt to dump plugins. Hook count=0
>>> --
>>>
>>> Both clients and server seem healthy, except for the SD crash.
>>> Any ideas?
>>>
>>>
>>> thanks!
>>> Stephen
>>>
>>>
>>> -------------------------------------------------------------------------
>>> Further info:
>>>
>>> My catalog...
>>>
>>>        mysql-5.0.77 (64-bit) MyISAM
>>>        210 GB in size
>>>        1,412,297,215 records in File table
>>>        note: database built with bacula 2.x scripts,
>>>        upgraded with 3.x scripts, then again with 5.x scripts
>>>        (i.e. nothing customized along the way)
>>>
>>> My OS & hardware for the bacula DIR+SD server...
>>>
>>>        CentOS 5.4 (fully patched)
>>>        8 GB RAM
>>>        2 GB swap
>>>        1 TB EXT3 filesystem on external fiber RAID5 array
>>>        (dedicated to database, incl. temp files)
>>>        2 dual-core [AMD Opteron(tm) Processor 2220] CPUs
>>>        StorageTek SL500 Library with 2 LTO3 Drives
>>>
>>>
>>> _______________________________________________
>>> Bacula-devel mailing list
>>> Bacula-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/bacula-devel
>


-- 
Stephen Thompson               Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)           University of California, Berkeley
510.643.5811 (fax)             Berkeley, CA 94720-4760
