Sorry, reposting this topic with the appropriate version number in the subject line.

On 12/14/2010 09:21 AM, Stephen Thompson wrote:
> On 07/28/2010 11:56 AM, Kern Sibbald wrote:
>> On Wednesday 28 July 2010 19:44:49 Stephen Thompson wrote:
>>> After running for 3 months without this problem, it happened again last
>>> night.  We are running 5.0.2 at this point.
>>
>> I believe that the email problem is fixed in 5.0.3 which we will release
>> sometime this month, and which is in the git Source Forge repo, under
>> Branch-5.0.
>>
>> Kern
>
>
> We have been running 5.0.3 successfully since August (again about 3
> months) and just had this problem occur again last night.  Same symptoms:
>
> 1) SD crashes
> 2) DIR sends out continuous stream of emails (apparent infinite loop)
>
>      emails say:
>      "client-fd JobId 100001: Fatal error: backup.c:1048 Network send
> error to SD. ERR=Broken pipe"
>
> So, I reckon the problem was not fixed in 5.0.3.
> I'll post traceback to bugs.bacula.org.
>
> thanks,
> Stephen
>
>
>
>>
>>>
>>> Stephen
>>>
>>> On 04/15/2010 10:25 AM, Stephen Thompson wrote:
>>>> Hello,
>>>>
>>>> I have just now experienced a possible new bug with bacula 5.0.1.
>>>>
>>>> The symptoms are this:
>>>>
>>>> bacula-sd crashes
>>>> bacula-dir continues to run
>>>> bacula-dir then spews out identical "Intervention needed" emails until
>>>> manually restarted
>>>>
>>>> The first time, this happened over a weekend, and upon returning I found
>>>> my inbox had about 120,000 bacula emails, all the SAME and of this type:
>>>>
>>>> "15-Apr 10:02 client-fd JobId 100001: Fatal error: backup.c:1048 Network
>>>> send error to SD. ERR=Broken pipe"
>>>>
>>>> It happened again just now (second time since upgrading from 3.0.3 to
>>>> 5.0.1) and I managed to stop the director with only a few thousand
>>>> emails going out.
>>>>
>>>> So there are really 2 issues here:
>>>>
>>>> 1)
>>>> Why does the director apparently get stuck in an infinite loop of
>>>> sending the same email message?  Is this a known bug?
>>>>
>>>> 2)
>>>> Regarding the SD, I received one alert of this type; the rest were like
>>>> the above:
>>>>
>>>>      "15-Apr 10:02 server-sd: ERROR in lock.c:268 Failed ASSERT:
>>>> dev->blocked()"
>>>>
>>>> A traceback like:
>>>> --
>>>> ptrace: Operation not permitted.
>>>> /var/bacula/work/29091: No such file or directory.
>>>> $1 = 0
>>>> /opt/bacula-5.0.1/scripts/btraceback.gdb:2: Error in sourced command
>>>> file: No symbol "exename" in current context.
>>>> --
>>>>
>>>> And a bactrace like:
>>>> --
>>>> Attempt to dump current JCRs
>>>> JCR=0x19a24888 JobId=100000 name=client_1.2010-04-14_18.02.33_41
>>>> JobStatus=l use_count=1
>>>>             JobType=B JobLevel=F
>>>>             sched_time=14-Apr-2010 21:35 start_time=14-Apr-2010 21:35
>>>>             end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
>>>>             db=(nil) db_batch=(nil) batch_started=0
>>>> JCR=0x1981b248 JobId=100001 name=client_10.2010-04-14_20.00.15_04
>>>> JobStatus=R
>>>>             use_count=1
>>>>             JobType=B JobLevel=I
>>>>             sched_time=15-Apr-2010 09:15 start_time=15-Apr-2010 09:15
>>>>             end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
>>>>             db=(nil) db_batch=(nil) batch_started=0
>>>> Attempt to dump plugins. Hook count=0
>>>> --
>>>>
>>>> Both clients and server seem healthy, except for the SD crash.
>>>> Any ideas?
>>>>
>>>>
>>>> thanks!
>>>> Stephen
>>>>
>>>>
>>>> -------------------------------------------------------------------------
>>>> Further info:
>>>>
>>>> My catalog...
>>>>
>>>>         mysql-5.0.77 (64bit) MyISAM
>>>>         210 GB in size
>>>>         1,412,297,215 records in File table
>>>>         note: database built with bacula 2.x scripts,
>>>>         upgraded with 3.x scripts, then again with 5.x scripts
>>>>         (i.e. nothing customized along the way)
>>>>
>>>> My OS & hardware for bacula DIR+SD server...
>>>>
>>>>         CentOS 5.4 (fully patched)
>>>>         8 GB RAM
>>>>         2 GB Swap
>>>>         1 TB EXT3 filesystem on external fiber RAID5 array
>>>>         (dedicated to database, incl. temp files)
>>>>         2 dual-core [AMD Opteron(tm) Processor 2220] CPUs
>>>>         StorageTek SL500 Library with 2 LTO3 Drives
>>>>
>>
>
>
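
A side note on the bactrace quoted above: the end_time and wait_time values
of 31-Dec-1969 16:00 are what a zero Unix timestamp looks like when printed
in US Pacific time, so those fields were most likely never set rather than
holding real times. A minimal sketch to illustrate, assuming the server
clock is on Pacific time (UTC-8 in winter); the function name
format_like_dump is my own and is not part of Bacula:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server runs on US Pacific time. In December 1969 that is
# Pacific Standard Time, a fixed offset of UTC-8.
PST = timezone(timedelta(hours=-8), "PST")


def format_like_dump(epoch_seconds: int) -> str:
    """Render a Unix timestamp in a dd-Mon-YYYY HH:MM style like the dump."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=PST)
    return moment.strftime("%d-%b-%Y %H:%M")


# A zero (unset) timestamp, as seen from Pacific time.
print(format_like_dump(0))
```

With the default C locale this prints 31-Dec-1969 16:00, the same value
shown for end_time and wait_time in both JCR entries.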


-- 
Stephen Thompson               Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)           University of California, Berkeley
510.643.5811 (fax)             Berkeley, CA 94720-4760

_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel
