Hello, I'm using Bacula-5.0.2 on CentOS-5.5 x86_64.
Every month or so the storage daemon crashes, and I need help debugging the problem. It crashed again this morning. Here is what happened: yesterday I set up a new backup job. Because it was the job's first run, it was upgraded from Incremental to Full and queued until all the other Incremental jobs had finished. All Incremental jobs finished successfully; Bacula then found a suitable tape for the Full backup job and started it. The FD on the client did not authenticate the server and the job failed, but the SD crashed as well.

In the system logs I found only this line:

----
Jul 13 00:00:13 csebackup2 bacula-sd: Bacula interrupted by signal 11: Segmentation violation
----

The traceback file in the working directory contains this:

----
ptrace: No such process.
/data/bacula_working/16717: No such file or directory.
$1 = 0
/opt/bacula-5.0.2/scripts/btraceback.gdb:2: Error in sourced command file:
No symbol "exename" in current context.
----

Here is the log entry for the failed backup job:

----
12-Jul 23:04 csebackup2.ucsd.edu-dir JobId 1010: No prior Full backup Job record found.
12-Jul 23:04 csebackup2.ucsd.edu-dir JobId 1010: No prior or suitable Full backup found in catalog. Doing FULL backup.
12-Jul 23:04 csebackup2.ucsd.edu-dir JobId 1010: Start Backup JobId 1010, Job=lilliput.2010-07-12_23.04.01_23
12-Jul 23:55 csebackup2.ucsd.edu-sd JobId 1010: 3307 Issuing autochanger "unload slot 8, drive 0" command.
12-Jul 23:59 csebackup2.ucsd.edu-dir JobId 1010: Max configured use duration exceeded. Marking Volume "CSE009L4" as Used.
12-Jul 23:59 csebackup2.ucsd.edu-dir JobId 1010: Recycled volume "CSE011L4"
12-Jul 23:59 csebackup2.ucsd.edu-dir JobId 1010: Using Volume "CSE011L4" from 'Scratch' pool.
12-Jul 23:59 csebackup2.ucsd.edu-dir JobId 1010: Using Device "Drive-1"
12-Jul 23:59 csebackup2.ucsd.edu-dir JobId 1010: Fatal error: Unable to authenticate with File daemon at "lilliput.ucsd.edu:9102". Possible causes:
Passwords or names not the same or
Maximum Concurrent Jobs exceeded on the FD or
FD networking messed up (restart daemon).
Please see http://www.bacula.org/en/rel-manual/Bacula_Freque_Asked_Questi.html#SECTION003760000000000000000 for help.
12-Jul 23:59 csebackup2.ucsd.edu-dir JobId 1010: Fatal error: Network error with FD during Backup: ERR=No data available
13-Jul 00:00 csebackup2.ucsd.edu-dir JobId 1010: Fatal error: No Job status returned from FD.
13-Jul 00:00 csebackup2.ucsd.edu-dir JobId 1010: Error: Bacula csebackup2.ucsd.edu-dir 5.0.2 (28Apr10): 13-Jul-2010 00:00:14
  Build OS:               x86_64-unknown-linux-gnu redhat
  JobId:                  1010
  Job:                    lilliput.2010-07-12_23.04.01_23
  Backup Level:           Full (upgraded from Incremental)
  Client:                 "lilliput.ucsd.edu-fd"
  FileSet:                "lilliput-files" 2010-07-12 23:04:01
  Pool:                   "FullTapes" (From Job FullPool override)
  Catalog:                "MainCatalog" (From Client resource)
  Storage:                "Tape" (From Job resource)
  Scheduled time:         12-Jul-2010 23:04:01
  Start time:             12-Jul-2010 23:04:04
  End time:               13-Jul-2010 00:00:14
  Elapsed time:           56 mins 10 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               no
  Volume name(s):
  Volume Session Id:      20
  Volume Session Time:    1278979090
  Last Volume Bytes:      1 (1 B)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***
----

The next Full backup job that started after that just reported that it could not connect to the Storage Daemon:

----
13-Jul 06:29 csebackup2.ucsd.edu-dir JobId 1012: Warning: bsock.c:129 Could not connect to Storage daemon on csebackup2.ucsd.edu:9103. ERR=Connection refused
Retrying ...
13-Jul 06:33 csebackup2.ucsd.edu-dir JobId 1012: Fatal error: bsock.c:135 Unable to connect to Storage daemon on csebackup2.ucsd.edu:9103. ERR=Connection refused
----

Any idea where to look for the problem?
Thanks,
Peter