On Thursday 15 April 2010 22:16:46 Stephen Thompson wrote:
> Hello,
>
> Thanks for the response.
>
> No, it's nothing to do with mail configuration; 100% sure of that.
> (I know people say that all the time, but, seriously, it's the director).
>
> And by alerts, I do mean "Messages" in the bacula vernacular.
>
> The first time this crash happened, we received 120,000 Messages in the
> form of emails to our administrative account. The messages were
> identical both to each other and to the content of the $JOB.mail file in
> our bacula working directory (which is never removed automatically after
> one of these crashes - perhaps that causes the endless cycle). The same
> Message also appears to be written to our bacula log file each time an
> email is generated (or vice versa).
>
> It seems to me like it's possible for the director to get stuck in a
> loop and send the contents of that mail file again and again,
> infinitely. Both times we've had the SD crash (both have happened since
> upgrading to 5.0.1), the only thing that stopped the Message generation
> was stopping the director itself.
>
> Of course, that's the annoying symptom. The more serious problem is
> the crash of our SD. Any pointers to getting "ptrace" working with the
> automatic scripts?

1. Make sure the binaries are compiled with the -g option, and
2. Run the Director as root, or
3. Reacquire root permission in the traceback script, or
4. Run the Director under the debugger manually.

Test by sending a SIGILL or SIGSEGV to the Director.

Kern

> thanks!
> Stephen
>
> On 04/15/2010 12:40 PM, Kern Sibbald wrote:
> > On Thursday 15 April 2010 19:36:51 Stephen Thompson wrote:
> >> Additionally, seems like the SD was possibly reading a new
> >> freshly-labeled tape when it crashed... Last items in bacula log
> >> besides alerts already mentioned:
> >
> > In Bacula "alerts" refer to tape drive information stored concerning tape
> > problems, so I am assuming you mean messages.
> >
> >> 15-Apr 09:31 server-sd JobId 100000: Writing spooled data to Volume.
> >> Despooling 35,000,185,219 bytes ...
> >> 15-Apr 09:51 server-sd JobId 100000: End of Volume "FB0568" at 888:1414
> >> on device "SL500-Drive-1" (/dev/nst0). Write of 262144 bytes got -1.
> >> 15-Apr 09:51 server-sd JobId 100000: Re-read of last block succeeded.
> >> 15-Apr 09:51 server-sd JobId 100000: End of medium on Volume "FB0568"
> >> Bytes=887,261,470,720 Blocks=3,384,635 at 15-Apr-2010 09:51.
> >> 15-Apr 09:51 server-sd JobId 100000: 3307 Issuing autochanger "unload
> >> slot 38, drive 1" command.
> >> 15-Apr 09:52 server-sd JobId 100000: 3301 Issuing autochanger "loaded?
> >> drive 1" command.
> >> 15-Apr 09:52 server-sd JobId 100000: 3302 Autochanger "loaded? drive 1",
> >> result: nothing loaded.
> >> 15-Apr 09:52 server-sd JobId 100000: 3304 Issuing autochanger "load slot
> >> 39, drive 1" command.
> >> 15-Apr 09:52 server-sd JobId 100000: 3305 Autochanger "load slot 39,
> >> drive 1", status is OK.
> >> 15-Apr 09:52 server-sd JobId 100000: Volume "FB0569" previously written,
> >> moving to end of data.
> >>
> >> Nothing but thousands of 'repetitive' alerts after that...
> >
> > What exactly is repeated?
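Kern's question of what exactly is repeated can be answered mechanically by counting identical lines in the Bacula log file. The sketch below is a hypothetical helper, not part of Bacula, and the names `top_repeats` and `report` are illustrative. It assumes each message occupies a single line of a plain-text log, which does not hold for multi-line Messages such as job reports, so treat its output as a first pass only.

```python
"""Count repeated lines in a Bacula log to see which message is flooding it.

Hypothetical helper, not part of Bacula. Assumes one message per line.
Usage: python3 top_repeats.py LOGFILE [LOGFILE ...]
"""
import sys
from collections import Counter


def top_repeats(lines, limit=5):
    """Return up to `limit` (line, count) pairs, most frequent first.

    Blank lines are ignored; trailing CR/LF characters are stripped before counting.
    """
    counts = Counter(line.rstrip("\r\n") for line in lines if line.strip())
    return counts.most_common(limit)


def report(path, limit=5):
    """Print the most repeated lines of one log file with their counts."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        for text, count in top_repeats(fh, limit):
            print(f"{count:>8}  {text}")


if __name__ == "__main__":
    # With no arguments, nothing is printed.
    for log_path in sys.argv[1:]:
        report(log_path)
```

A single line with a count in the tens of thousands would point at a delivery loop rather than at many distinct failures.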
> >
> > There was a Bacula bug #1480 in message delivery that may be the same
> > that you are experiencing, it was triggered by a misconfigured SMTP
> > server or by a reference in Bacula to a non-existent SMTP server - and
> > the simple solution is to make sure Bacula points to a valid functional
> > SMTP server. This problem was not particular to version 5.0.1, but I
> > think it was fixed after the release of 5.0.1. Please see the bugs
> > database for more details.
> >
> > Kern
> >
> >> thanks again,
> >> Stephen
> >>
> >> On 04/15/2010 10:25 AM, Stephen Thompson wrote:
> >>> Hello,
> >>>
> >>> I have just now experienced a possible new bug with bacula 5.0.1.
> >>>
> >>> The symptoms are this:
> >>>
> >>> bacula-sd crashes
> >>> bacula-dir continues to run
> >>> bacula-dir then spews out identical "Intervention needed" emails until
> >>> manually restarted
> >>>
> >>> The first time this happened over a weekend and upon returning I found
> >>> my inbox has about 120,000 bacula emails, all the SAME and of this
> >>> type:
> >>>
> >>> "15-Apr 10:02 client-fd JobId 100001: Fatal error: backup.c:1048
> >>> Network send error to SD. ERR=Broken pipe"
> >>>
> >>> It happened again just now (second time since upgrading from 3.0.3 to
> >>> 5.0.1) and I managed to stop the director with only a few thousand
> >>> emails going out.
> >>>
> >>> So there are really 2 issues here:
> >>>
> >>> 1)
> >>> Why does the director apparently get stuck in an infinite loop of
> >>> sending the same email message? Is this a known bug?
> >>>
> >>> 2)
> >>> Regarding the SD, I received one alert of this type, the rest like the
> >>> above:
> >>>
> >>> "15-Apr 10:02 server-sd: ERROR in lock.c:268 Failed ASSERT:
> >>> dev->blocked()"
> >>>
> >>> A traceback like:
> >>> --
> >>> ptrace: Operation not permitted.
> >>> /var/bacula/work/29091: No such file or directory.
> >>> $1 = 0
> >>> /opt/bacula-5.0.1/scripts/btraceback.gdb:2: Error in sourced command
> >>> file: No symbol "exename" in current context.
> >>> --
> >>>
> >>> And a bactrace like:
> >>> --
> >>> Attempt to dump current JCRs
> >>> JCR=0x19a24888 JobId=100000 name=client_1.2010-04-14_18.02.33_41
> >>> JobStatus=l use_count=1
> >>> JobType=B JobLevel=F
> >>> sched_time=14-Apr-2010 21:35 start_time=14-Apr-2010 21:35
> >>> end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
> >>> db=(nil) db_batch=(nil) batch_started=0
> >>> JCR=0x1981b248 JobId=100001 name=client_10.2010-04-14_20.00.15_04
> >>> JobStatus=R use_count=1
> >>> JobType=B JobLevel=I
> >>> sched_time=15-Apr-2010 09:15 start_time=15-Apr-2010 09:15
> >>> end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
> >>> db=(nil) db_batch=(nil) batch_started=0
> >>> Attempt to dump plugins. Hook count=0
> >>> --
> >>>
> >>> Both clients and server seem healthy, except for the SD crash.
> >>> Any ideas?
> >>>
> >>> thanks!
> >>> Stephen
> >>>
> >>> ------------------------------------------------------------------------------
> >>> Further info:
> >>>
> >>> My catalog...
> >>>
> >>> mysql-5.0.77 (64bit) MyISAM
> >>> 210Gb in size
> >>> 1,412,297,215 records in File table
> >>> note: database built with bacula 2x scripts,
> >>> upgraded with 3x scripts, then again with 5x scripts
> >>> (i.e. nothing customized along the way)
> >>>
> >>> My OS & hardware for bacula DIR+SD server...
> >>>
> >>> CentOS 5.4 (fully patched)
> >>> 8Gb RAM
> >>> 2Gb Swap
> >>> 1Tb EXT3 filesystem on external fiber RAID5 array
> >>> (dedicated to database, incl. temp files)
> >>> 2 dual-core [AMD Opteron(tm) Processor 2220] CPUs
> >>> StorageTek SL500 Library with 2 LTO3 Drives
> >>>
> >>> _______________________________________________
> >>> Bacula-devel mailing list
> >>> bacula-de...@lists.sourceforge.net
> >>> https://lists.sourceforge.net/lists/listinfo/bacula-devel

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
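Kern's checklist at the top of this thread starts with making sure the binaries carry debug symbols, and the traceback above ("No symbol ... in current context") is consistent with that step being worth checking. A hedged pre-flight sketch, assuming Linux ELF binaries: it reports whether a file contains a DWARF `.debug_info` section (or the older compressed `.zdebug_info`), which `-g` normally produces and `strip` removes. It is not part of Bacula. It does not look for separately installed debuginfo files and ignores ELF extended section numbering. The names `section_names` and `has_debug_info` are hypothetical.

```python
"""Report whether ELF binaries contain DWARF debug info (i.e. were built with -g).

Hypothetical pre-flight helper, not part of Bacula. Linux/ELF only; it does
not look for separate debuginfo files and ignores extended section numbering.
Usage: python3 check_debug.py BINARY [BINARY ...]
"""
import struct
import sys

DEBUG_SECTIONS = {".debug_info", ".zdebug_info"}


def section_names(data):
    """Return the section names of an in-memory ELF image, or [] if not ELF."""
    if len(data) < 52 or data[:4] != b"\x7fELF":
        return []
    is_64 = data[4] == 2                      # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if data[5] == 1 else ">"        # EI_DATA: 1 = little-endian
    if is_64:
        ehdr_fmt, shdr_fmt = "HHIQQQIHHHHHH", "IIQQQQIIQQ"
    else:
        ehdr_fmt, shdr_fmt = "HHIIIIIHHHHHH", "IIIIIIIIII"
    try:
        # Fields after e_ident: ..., e_shoff [5], ..., e_shentsize [10],
        # e_shnum [11], e_shstrndx [12].
        ehdr = struct.unpack_from(end + ehdr_fmt, data, 16)
        shoff, shentsize, shnum, shstrndx = ehdr[5], ehdr[10], ehdr[11], ehdr[12]
        if shoff == 0 or shnum == 0 or not 0 < shstrndx < shnum:
            return []
        shdrs = [struct.unpack_from(end + shdr_fmt, data, shoff + i * shentsize)
                 for i in range(shnum)]
        strtab = shdrs[shstrndx][4]           # sh_offset of the section-name table
        names = []
        for shdr in shdrs:
            start = strtab + shdr[0]          # sh_name is an offset into that table
            stop = data.index(b"\x00", start)
            names.append(data[start:stop].decode("ascii", "replace"))
        return names
    except (struct.error, ValueError):
        # Truncated or malformed file: treat as "no sections found".
        return []


def has_debug_info(path):
    """True if the file at `path` is ELF and carries a DWARF debug-info section."""
    with open(path, "rb") as fh:
        return bool(DEBUG_SECTIONS.intersection(section_names(fh.read())))


if __name__ == "__main__":
    for binary in sys.argv[1:]:
        if has_debug_info(binary):
            status = "has debug info"
        else:
            status = "NO debug info (built without -g, or stripped)"
        print(f"{binary}: {status}")
```

Pass it the paths of the installed bacula-dir and bacula-sd binaries. A "NO debug info" result suggests rebuilding with `-g` (and not stripping on install) before relying on the automatic traceback. The privilege steps (2 to 4) still have to be handled separately.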
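Kern's other suggestion, ruling out bug #1480 by confirming that Bacula points at a working SMTP server, can be checked outside Bacula. A minimal sketch using Python's standard `smtplib`: it connects to a host and port, issues an SMTP NOOP, and reports whether the server answered 250. The host and port are assumed to be whatever mail server the Director's Messages resource uses (for example, via its mailcommand). The function name `smtp_responds` is hypothetical. A successful NOOP only shows the server is reachable and speaking SMTP. It does not prove the server will relay Bacula's mail.

```python
"""Check that an SMTP server answers, as a first step in ruling out mail-delivery bugs.

Hypothetical helper, not part of Bacula. A 250 reply to NOOP only shows the
server is reachable and speaks SMTP; it does not prove it will relay mail.
Usage: python3 smtp_check.py HOST [PORT]
"""
import smtplib
import sys


def smtp_responds(host, port=25, timeout=5.0):
    """Return True if host:port sends a 220 greeting and answers NOOP with 250."""
    try:
        # local_hostname is set explicitly only to skip smtplib's FQDN lookup;
        # we never send HELO/EHLO, so the value is not used on the wire.
        with smtplib.SMTP(host, port, local_hostname="localhost",
                          timeout=timeout) as smtp:
            code, _ = smtp.noop()
            return code == 250
    except (OSError, smtplib.SMTPException):
        # Refused connections, timeouts, DNS failures and bad greetings.
        return False


if __name__ == "__main__":
    if len(sys.argv) in (2, 3):
        host = sys.argv[1]
        port = int(sys.argv[2]) if len(sys.argv) == 3 else 25
        verdict = "answers" if smtp_responds(host, port) else "does NOT answer"
        print(f"SMTP server {host}:{port} {verdict}")
```

Stephen reports that mail configuration is not the cause. A passing check here would support that and shift attention back to the Director's message loop.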