On Thursday 15 April 2010 19:36:51 Stephen Thompson wrote:
> Additionally, seems like the SD was possibly reading a new
> freshly-labeled tape when it crashed... Last items in bacula log
> besides alerts already mentioned:
In Bacula, "alerts" refers to tape drive information stored concerning tape
problems, so I am assuming you mean messages.

>
>
> 15-Apr 09:31 server-sd JobId 100000: Writing spooled data to Volume.
> Despooling 35,000,185,219 bytes ...
> 15-Apr 09:51 server-sd JobId 100000: End of Volume "FB0568" at 888:1414
> on device "SL500-Drive-1" (/dev/nst0). Write of 262144 bytes got -1.
> 15-Apr 09:51 server-sd JobId 100000: Re-read of last block succeeded.
> 15-Apr 09:51 server-sd JobId 100000: End of medium on Volume "FB0568"
> Bytes=887,261,470,720 Blocks=3,384,635 at 15-Apr-2010 09:51.
> 15-Apr 09:51 server-sd JobId 100000: 3307 Issuing autochanger "unload
> slot 38, drive 1" command.
> 15-Apr 09:52 server-sd JobId 100000: 3301 Issuing autochanger "loaded?
> drive 1" command.
> 15-Apr 09:52 server-sd JobId 100000: 3302 Autochanger "loaded? drive 1",
> result: nothing loaded.
> 15-Apr 09:52 server-sd JobId 100000: 3304 Issuing autochanger "load slot
> 39, drive 1" command.
> 15-Apr 09:52 server-sd JobId 100000: 3305 Autochanger "load slot 39,
> drive 1", status is OK.
> 15-Apr 09:52 server-sd JobId 100000: Volume "FB0569" previously written,
> moving to end of data.
>
> Nothing but thousands of 'repetitive' alerts after that...

What exactly is repeated?

There was a Bacula bug #1480 in message delivery that may be the same one
you are experiencing. It was triggered by a misconfigured SMTP server or by
a reference in Bacula to a non-existent SMTP server, and the simple solution
is to make sure Bacula points to a valid, functional SMTP server. This
problem was not particular to version 5.0.1, but I think it was fixed after
the release of 5.0.1. Please see the bugs database for more details.

Kern

>
> thanks again,
> Stephen
>
> On 04/15/2010 10:25 AM, Stephen Thompson wrote:
> > Hello,
> >
> > I have just now experienced a possible new bug with bacula 5.0.1.
> >
> > The symptoms are this:
> >
> > bacula-sd crashes
> > bacula-dir continues to run
> > bacula-dir then spews out identical "Intervention needed" emails until
> > manually restarted
> >
> > The first time this happened over a weekend and upon returning I found
> > my inbox has about 120,000 bacula emails, all the SAME and of this type:
> >
> > "15-Apr 10:02 client-fd JobId 100001: Fatal error: backup.c:1048 Network
> > send error to SD. ERR=Broken pipe"
> >
> > It happened again just now (second time since upgrading from 3.0.3 to
> > 5.0.1) and I managed to stop the director with only a few thousand
> > emails going out.
> >
> > So there are really 2 issues here:
> >
> > 1)
> > Why does the director apparently get stuck in an infinite loop of
> > sending the same email message? Is this a known bug?
> >
> > 2)
> > Regarding the SD, I received one alert of this type, the rest like the
> > above:
> >
> > "15-Apr 10:02 server-sd: ERROR in lock.c:268 Failed ASSERT:
> > dev->blocked()"
> >
> > A traceback like:
> > --
> > ptrace: Operation not permitted.
> > /var/bacula/work/29091: No such file or directory.
> > $1 = 0
> > /opt/bacula-5.0.1/scripts/btraceback.gdb:2: Error in sourced command
> > file: No symbol "exename" in current context.
> > --
> >
> > And a bactrace like:
> > --
> > Attempt to dump current JCRs
> > JCR=0x19a24888 JobId=100000 name=client_1.2010-04-14_18.02.33_41
> > JobStatus=l use_count=1
> > JobType=B JobLevel=F
> > sched_time=14-Apr-2010 21:35 start_time=14-Apr-2010 21:35
> > end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
> > db=(nil) db_batch=(nil) batch_started=0
> > JCR=0x1981b248 JobId=100001 name=client_10.2010-04-14_20.00.15_04
> > JobStatus=R
> > use_count=1
> > JobType=B JobLevel=I
> > sched_time=15-Apr-2010 09:15 start_time=15-Apr-2010 09:15
> > end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
> > db=(nil) db_batch=(nil) batch_started=0
> > Attempt to dump plugins.
> > Hook count=0
> > --
> >
> > Both clients and server seem healthy, except for the SD crash.
> > Any ideas?
> >
> >
> > thanks!
> > Stephen
> >
> >
> > -------------------------------------------------------------------------------------
> > Further info:
> >
> > My catalog...
> >
> > mysql-5.0.77 (64bit) MyISAM
> > 210Gb in size
> > 1,412,297,215 records in File table
> > note: database built with bacula 2x scripts,
> > upgraded with 3x scripts, then again with 5x scripts
> > (i.e. nothing customized along the way)
> >
> > My OS & hardware for bacula DIR+SD server...
> >
> > Centos 5.4 (fully patched)
> > 8Gb RAM
> > 2Gb Swap
> > 1Tb EXT3 filesystem on external fiber RAID5 array
> > (dedicated to database, incl. temp files)
> > 2 dual-core [AMD Opteron(tm) Processor 2220] CPUs
> > StorageTek SL500 Library with 2 LTO3 Drives
> >
> > ------------------------------------------------------------------------------
> > Download Intel® Parallel Studio Eval
> > Try the new software tools for yourself. Speed compiling, find bugs
> > proactively, and fine-tune applications for parallel performance.
> > See why Intel Parallel Studio got high marks during beta.
> > http://p.sf.net/sfu/intel-sw-dev
> > _______________________________________________
> > Bacula-devel mailing list
> > bacula-de...@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/bacula-devel

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
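
A note on the suggestion above to make sure Bacula points to a valid, functional SMTP server: the following is only a minimal sketch of one way to check that a mail host answers SMTP at all. It is not part of Bacula. The function name `smtp_server_responds` is hypothetical, and which host and port to test is an assumption that depends on how mail delivery is configured for your Director. A passing check shows only that something speaks SMTP on that port, not that Bacula's mail settings are correct.

```python
import smtplib


def smtp_server_responds(host: str, port: int = 25, timeout: float = 5.0) -> bool:
    """Return True if an SMTP server at host:port accepts a connection
    and answers a NOOP command with a 2xx reply code.

    Hypothetical helper for illustration; not a Bacula API.
    """
    try:
        # The constructor connects and waits for the 220 greeting.
        # local_hostname is fixed so no DNS lookup of our own name is needed.
        with smtplib.SMTP(host, port, local_hostname="localhost",
                          timeout=timeout) as server:
            code, _message = server.noop()
            return 200 <= code < 300
    except (smtplib.SMTPException, OSError):
        # Connection refused, timeout, unresolvable host, bad greeting, etc.
        return False
```

For example, `smtp_server_responds("localhost")` could be run on the Director host, replacing "localhost" with whatever mail host your configuration actually references.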