On 07/28/2010 11:56 AM, Kern Sibbald wrote:
> On Wednesday 28 July 2010 19:44:49 Stephen Thompson wrote:
>> After running for 3 months without this problem, it happened again last
>> night. We are running 5.0.2 at this point.
>
> I believe that the email problem is fixed in 5.0.3 which we will release
> sometime this month, and which is in the git Source Forge repo, under
> Branch-5.0.
>
> Kern

We have been running 5.0.3 successfully since August (again about 3 months) and just had this problem occur again last night.

Same symptoms:
1) SD crashes
2) DIR sends out continuous stream of emails (apparent infinite loop)

emails say:
"client-fd JobId 100001: Fatal error: backup.c:1048 Network send error to SD. ERR=Broken pipe"

So, I reckon the problem was not fixed in 5.0.3.
I'll post traceback to bugs.bacula.org.

thanks,
Stephen

>
>>
>> Stephen
>>
>> On 04/15/2010 10:25 AM, Stephen Thompson wrote:
>>> Hello,
>>>
>>> I have just now experienced a possible new bug with bacula 5.0.1.
>>>
>>> The symptoms are this:
>>>
>>> bacula-sd crashes
>>> bacula-dir continues to run
>>> bacula-dir then spews out identical "Intervention needed" emails until
>>> manually restarted
>>>
>>> The first time this happened over a weekend and upon returning I found
>>> my inbox has about 120,000 bacula emails, all the SAME and of this type:
>>>
>>> "15-Apr 10:02 client-fd JobId 100001: Fatal error: backup.c:1048 Network
>>> send error to SD. ERR=Broken pipe"
>>>
>>> It happened again just now (second time since upgrading from 3.0.3 to
>>> 5.0.1) and I managed to stop the director with only a few thousand
>>> emails going out.
>>>
>>> So there are really 2 issues here:
>>>
>>> 1)
>>> Why does the director apparently get stuck in an infinite loop of
>>> sending the same email message? Is this a known bug?
>>>
>>> 2)
>>> Regarding the SD, I received one alert of this type, the rest like the
>>> above:
>>>
>>> "15-Apr 10:02 server-sd: ERROR in lock.c:268 Failed ASSERT:
>>> dev->blocked()"
>>>
>>> A traceback like:
>>> --
>>> ptrace: Operation not permitted.
>>> /var/bacula/work/29091: No such file or directory.
>>> $1 = 0
>>> /opt/bacula-5.0.1/scripts/btraceback.gdb:2: Error in sourced command
>>> file: No symbol "exename" in current context.
>>> --
>>>
>>> And a bactrace like:
>>> --
>>> Attempt to dump current JCRs
>>> JCR=0x19a24888 JobId=100000 name=client_1.2010-04-14_18.02.33_41
>>> JobStatus=l use_count=1
>>> JobType=B JobLevel=F
>>> sched_time=14-Apr-2010 21:35 start_time=14-Apr-2010 21:35
>>> end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
>>> db=(nil) db_batch=(nil) batch_started=0
>>> JCR=0x1981b248 JobId=100001 name=client_10.2010-04-14_20.00.15_04
>>> JobStatus=R
>>> use_count=1
>>> JobType=B JobLevel=I
>>> sched_time=15-Apr-2010 09:15 start_time=15-Apr-2010 09:15
>>> end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
>>> db=(nil) db_batch=(nil) batch_started=0
>>> Attempt to dump plugins. Hook count=0
>>> --
>>>
>>> Both clients and server seem healthy, except for the SD crash.
>>> Any ideas?
>>>
>>>
>>> thanks!
>>> Stephen
>>>
>>>
>>> -------------------------------------------------------------------------
>>> ------------ Further info:
>>>
>>> My catalog...
>>>
>>> mysql-5.0.77 (64bit) MyISAM
>>> 210Gb in size
>>> 1,412,297,215 records in File table
>>> note: database built with bacula 2x scripts,
>>> upgraded with 3x scripts, then again with 5x scripts
>>> (i.e. nothing customized along the way)
>>>
>>> My OS& hardware for bacula DIR+SD server...
>>>
>>> Centos 5.4 (fully patched)
>>> 8Gb RAM
>>> 2Gb Swap
>>> 1Tb EXT3 filesystem on external fiber RAID5 array
>>> (dedicated to database, incl. temp files)
>>> 2 dual-core [AMD Opteron(tm) Processor 2220] CPUs
>>> StorageTek SL500 Library with 2 LTO3 Drives
>>>
>>> _______________________________________________
>>> Bacula-devel mailing list
>>> Bacula-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/bacula-devel
>

-- 
Stephen Thompson               Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)           University of California, Berkeley
510.643.5811 (fax)             Berkeley, CA 94720-4760