Hello,

I have built a new Bacula server to back up various Linux servers: 29 servers in 5 subnets have to be backed up. On 26 of them everything works fine.
The other 3 have a strange problem. (Many of the servers have nearly identical hardware and software, and most of them cause no problems, so I don't think it can be the setup ;-) )

The FD hangs on one file (not always the same file, but often), and after some time the job gets canceled, sometimes with 'connection reset by peer', sometimes with 'broken pipe'.

I have tried playing with the heartbeat directive, and I have tried downgrading the clients as well as the Director and the SD, but nothing helps. I have dug through the mailing list and found lots of entries with the same issue, but no solution.

The FD configurations are all generated by a shell script, so the only difference between them is the Name, and they work very well on 26 clients. Here is one example:

FileDaemon {                          # this is me
  Name = "mds-srv1.tec.vcc.de"
  FDport = 9102                       # where we listen for the director
  WorkingDirectory = /etc/bacula/working
  Pid Directory = /var/run
  Maximum Concurrent Jobs = 20
}

Director {
  Name = bacula.tec.vcc.de
  Password = "xxxxxxxx"
}

Director {
  Name = bacula.tec.vcc.de-mon
  Password = "xxxxxxxx"
  Monitor = yes
}

Messages {
  Name = Standard
  director = bacula.tec.vcc.de = all, !skipped, !restored
}

Here is some debug output of the FD:

mds-srv1.tec.vcc.de: backup.c:147 FT_REG saving: /usr/lib/locale/ko_KR.utf8/LC_COLLATE
mds-srv1.tec.vcc.de: backup.c:225 bfiled: sending /usr/lib/locale/ko_KR.utf8/LC_COLLATE to stored
mds-srv1.tec.vcc.de: backup.c:506 Send data to SD len=65536
mds-srv1.tec.vcc.de: backup.c:506 Send data to SD len=65536
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
[the same heartbeat.c:82 line repeats unchanged well over a hundred times]
mds-srv1.tec.vcc.de: heartbeat.c:77 Got BNET_SIG 0 from SD
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=1 stop=1
mds-srv1.tec.vcc.de: backup.c:111 end blast_data ok=0
mds-srv1.tec.vcc.de: job.c:1266 Error in blast_data.
mds-srv1.tec.vcc.de: job.c:1334 End FD msg: 2800 End Job TermCode=102 JobFiles=100347 ReadBytes=4068720695 JobBytes=4069737071 Errors=0
mds-srv1.tec.vcc.de: job.c:208 Quit command loop. Canceled=1
mds-srv1.tec.vcc.de: job.c:289 Calling term_find_files
mds-srv1.tec.vcc.de: job.c:292 Done with term_find_files
mds-srv1.tec.vcc.de: mem_pool.c:363 garbage collect memory pool
mds-srv1.tec.vcc.de: job.c:294 Done with free_jcr

Here is one job email:

16-Feb 18:08 mds-srv1.tec.vcc.de: Linux-SystemFS_mds-srv1.tec.vcc.de.2009-02-16_17.30.06 Fatal error: backup.c:500 Network send error to SD. ERR=Die Wartezeit für die Verbindung ist abgelaufen (connection timeout)
16-Feb 18:12 bacula.tec.vcc.de JobId 116: Job Linux-SystemFS_mds-srv1.tec.vcc.de.2009-02-16_17.30.06 marked to be canceled.
16-Feb 18:12 bacula.tec.vcc.de JobId 116: Fatal error: append.c:259 Network error on data channel. ERR=Die Verbindung wurde vom Kommunikationspartner zurückgesetzt (connection reset by peer)
16-Feb 18:12 bacula.tec.vcc.de JobId 116: Job write elapsed time = 00:41:57, Transfer rate = 1.599 M bytes/second
16-Feb 18:12 bacula.tec.vcc.de JobId 116: Fatal error: append.c:304 Fatal append error on device "FileStorage" (/Backup2Disk): ERR=
16-Feb 18:12 bacula.tec.vcc.de JobId 116: Error: bsock.c:444 Read error from client:192.168.100.51:36643: ERR=Die Verbindung wurde vom Kommunikationspartner zurückgesetzt (connection reset by peer)
16-Feb 18:12 bacula.tec.vcc.de JobId 116: Error: Bacula bacula.tec.vcc.de 2.2.8 (26Jan08): 16-Feb-2009 18:12:24
  Build OS:               i686-pc-linux-gnu redhat
  JobId:                  116
  Job:                    Linux-SystemFS_mds-srv1.tec.vcc.de.2009-02-16_17.30.06
  Backup Level:           Full (upgraded from Incremental)
  Client:                 "mds-srv1.tec.vcc.de" x86_64-unknown-linux-gnu,suse,10
  FileSet:                "Linux-SystemFS" 2009-02-15 11:40:14
  Pool:                   "ServerBackup" (From Job resource)
  Storage:                "File" (From Job resource)
  Scheduled time:         16-Feb-2009 17:30:22
  Start time:             16-Feb-2009 17:30:26
  End time:               16-Feb-2009 18:12:24
  Elapsed time:           41 mins 58 secs
  Priority:               10
  FD Files Written:       100,347
  SD Files Written:       100,334
  FD Bytes Written:       4,069,737,071 (4.069 GB)
  SD Bytes Written:       4,024,995,720 (4.024 GB)
  Rate:                   1616.3 KB/s
  Software Compression:   None
  VSS:                    no
  Storage Encryption:     no
  Volume name(s):         B2D-File-0017
  Volume Session Id:      3
  Volume Session Time:    1234801299
  Last Volume Bytes:      21,671,875,000 (21.67 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Canceled
  Termination:            *** Backup Error ***

Because of the German localization of this server, I have added English translations of the error messages in parentheses.

Any hints? Or which information is needed to debug this better?

-- 
Wolfgang Jaede
Doormannsweg 43
20259 Hamburg
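
P.S. As a quick cross-check of the numbers in the job report above, here is a small Python sketch (not part of Bacula). It only redoes the arithmetic on figures copied from the report; the constant and function names are made up for illustration, and it assumes decimal units (1 KB = 1000 bytes, 1 MB = 10^6 bytes), which seems to be what the report uses.

```python
# Figures copied from the job report above; nothing here is measured anew.
FD_FILES = 100_347               # "FD Files Written"
SD_FILES = 100_334               # "SD Files Written"
FD_BYTES = 4_069_737_071         # "FD Bytes Written"
SD_BYTES = 4_024_995_720         # "SD Bytes Written"
WRITE_ELAPSED_S = 41 * 60 + 57   # "Job write elapsed time = 00:41:57"
JOB_ELAPSED_S = 41 * 60 + 58     # "Elapsed time: 41 mins 58 secs"


def summarize():
    """Return the FD/SD gap and the two transfer rates implied by the report."""
    return {
        "missing_files": FD_FILES - SD_FILES,
        "missing_bytes": FD_BYTES - SD_BYTES,
        # Assumption: decimal units, i.e. 1 MB = 10**6 bytes, 1 KB = 10**3 bytes.
        "sd_rate_mb_s": SD_BYTES / WRITE_ELAPSED_S / 1e6,
        "fd_rate_kb_s": FD_BYTES / JOB_ELAPSED_S / 1e3,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        if isinstance(value, float):
            print(f"{key}: {value:.3f}")
        else:
            print(f"{key}: {value:,}")
```

The two rates come out matching the reported "1.599 M bytes/second" and "1616.3 KB/s", so the report is at least internally consistent. The FD counted 13 files and roughly 44.7 MB more than the SD recorded, which would be consistent with data still in flight when the connection dropped, though it does not say where the connection was lost.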