Additionally, seems like the SD was possibly reading a new freshly-labeled
tape when it crashed...

Last items in bacula log besides alerts already mentioned:

15-Apr 09:31 server-sd JobId 100000: Writing spooled data to Volume. Despooling 35,000,185,219 bytes ...
15-Apr 09:51 server-sd JobId 100000: End of Volume "FB0568" at 888:1414 on device "SL500-Drive-1" (/dev/nst0). Write of 262144 bytes got -1.
15-Apr 09:51 server-sd JobId 100000: Re-read of last block succeeded.
15-Apr 09:51 server-sd JobId 100000: End of medium on Volume "FB0568" Bytes=887,261,470,720 Blocks=3,384,635 at 15-Apr-2010 09:51.
15-Apr 09:51 server-sd JobId 100000: 3307 Issuing autochanger "unload slot 38, drive 1" command.
15-Apr 09:52 server-sd JobId 100000: 3301 Issuing autochanger "loaded? drive 1" command.
15-Apr 09:52 server-sd JobId 100000: 3302 Autochanger "loaded? drive 1", result: nothing loaded.
15-Apr 09:52 server-sd JobId 100000: 3304 Issuing autochanger "load slot 39, drive 1" command.
15-Apr 09:52 server-sd JobId 100000: 3305 Autochanger "load slot 39, drive 1", status is OK.
15-Apr 09:52 server-sd JobId 100000: Volume "FB0569" previously written, moving to end of data.

Nothing but thousands of 'repetitive' alerts after that...

thanks again,
Stephen


On 04/15/2010 10:25 AM, Stephen Thompson wrote:
>
> Hello,
>
> I have just now experienced a possible new bug with bacula 5.0.1.
>
> The symptoms are this:
>
> bacula-sd crashes
> bacula-dir continues to run
> bacula-dir then spews out identical "Intervention needed" emails until
> manually restarted
>
> The first time this happened over a weekend and upon returning I found
> my inbox has about 120,000 bacula emails, all the SAME and of this type:
>
> "15-Apr 10:02 client-fd JobId 100001: Fatal error: backup.c:1048 Network
> send error to SD. ERR=Broken pipe"
>
> It happened again just now (second time since upgrading from 3.0.3 to
> 5.0.1) and I managed to stop the director with only a few thousand
> emails going out.
>
> So there are really 2 issues here:
>
> 1)
> Why does the director apparently get stuck in an infinite loop of
> sending the same email message? Is this a known bug?
>
> 2)
> Regarding the SD, I received one alert of this type, the rest like the
> above:
>
> "15-Apr 10:02 server-sd: ERROR in lock.c:268 Failed ASSERT:
> dev->blocked()"
>
> A traceback like:
> --
> ptrace: Operation not permitted.
> /var/bacula/work/29091: No such file or directory.
> $1 = 0
> /opt/bacula-5.0.1/scripts/btraceback.gdb:2: Error in sourced command file:
> No symbol "exename" in current context.
> --
>
> And a bactrace like:
> --
> Attempt to dump current JCRs
> JCR=0x19a24888 JobId=100000 name=client_1.2010-04-14_18.02.33_41 JobStatus=l
> use_count=1
> JobType=B JobLevel=F
> sched_time=14-Apr-2010 21:35 start_time=14-Apr-2010 21:35
> end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
> db=(nil) db_batch=(nil) batch_started=0
> JCR=0x1981b248 JobId=100001 name=client_10.2010-04-14_20.00.15_04
> JobStatus=R
> use_count=1
> JobType=B JobLevel=I
> sched_time=15-Apr-2010 09:15 start_time=15-Apr-2010 09:15
> end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
> db=(nil) db_batch=(nil) batch_started=0
> Attempt to dump plugins. Hook count=0
> --
>
> Both clients and server seem healthy, except for the SD crash.
> Any ideas?
>
>
> thanks!
> Stephen
>
>
> -------------------------------------------------------------------------------------
> Further info:
>
> My catalog...
>
> mysql-5.0.77 (64bit) MyISAM
> 210Gb in size
> 1,412,297,215 records in File table
> note: database built with bacula 2x scripts,
> upgraded with 3x scripts, then again with 5x scripts
> (i.e. nothing customized along the way)
>
> My OS & hardware for bacula DIR+SD server...
>
> Centos 5.4 (fully patched)
> 8Gb RAM
> 2Gb Swap
> 1Tb EXT3 filesystem on external fiber RAID5 array
> (dedicated to database, incl.
> temp files)
> 2 dual-core [AMD Opteron(tm) Processor 2220] CPUs
> StorageTek SL500 Library with 2 LTO3 Drives
>
>
> ------------------------------------------------------------------------------
> Download Intel® Parallel Studio Eval
> Try the new software tools for yourself. Speed compiling, find bugs
> proactively, and fine-tune applications for parallel performance.
> See why Intel Parallel Studio got high marks during beta.
> http://p.sf.net/sfu/intel-sw-dev
> _______________________________________________
> Bacula-devel mailing list
> bacula-de...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-devel

-- 
Stephen Thompson               Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)           University of California, Berkeley
510.643.5811 (fax)             Berkeley, CA 94720-4760

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
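
For anyone facing the same flood of identical alerts, here is a minimal sketch (not part of Bacula, and not a fix for the loop itself) that collapses a log or a mail dump down to its distinct messages, so that the one-off entries such as the lock.c ASSERT stand out. It assumes every line starts with the "DD-Mon HH:MM " timestamp seen in the excerpts above; the function name `summarize` and the regular expression are illustrative assumptions.

```python
import re
from collections import Counter

# Assumption: each log line begins with a "DD-Mon HH:MM " timestamp,
# as in the excerpts quoted in this message (e.g. "15-Apr 10:02 ").
TIMESTAMP = re.compile(r"^\d{2}-[A-Z][a-z]{2} \d{2}:\d{2} ")


def summarize(lines):
    """Count messages that are identical once the timestamp is stripped.

    Returns a list of (message, count) pairs, most frequent first.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        counts[TIMESTAMP.sub("", line)] += 1
    return counts.most_common()
```

Passing an open log file (the path depends on the installation) to `summarize` and printing the pairs gives one line per distinct message with its repeat count. Messages wrapped across several lines would need to be rejoined first.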