Hello, > Hi, > > 22.10.2007 21:26,, GDS.Marshall wrote:: >> version 2.2.4 patched from sourceforge >> Linux kernel 2.6.x >> >> I am running 10+ FD's, one SD, and one Director. I am having problems >> with one of my FD's, the others are fine. Not sure if it makes any >> difference, but the FD is on the same machine as the Director. > >> I have no issues with the network, I see no errors on either the >> interface >> of the FD or the SD. All FD's are plugged into the same netgear switch. >> The SD is plugged into a different netgear switch which is then plugged >> into the FD's switch. > > Are the FD and SD running on the same host (your description says that > DIR and problem FD are on the same machine, but not if the DIR and SD > are on that same machine, too)? No, the SD is on its own machine
FD+DIR FD FD | | | GSW---------------.... Gig Switch | FSW---------------.... Fast Switch | SD > >> I run a backup job (or via schedule) and the amount/size/volume of data >> is >> transfered each time, and then everything stops/hangs/does nothing. >> >> ls -l >> /var/data/bacula/spool/backupserver-sd.data.472.fileserver-backup.2007-10-22_18.54.33.DLT-V4.spool >> -rw-r----- 1 root bacula 2193816 Oct 22 18:56 >> >> A short while later, I will get a console message >> 22-Oct 18:56 backupserver-sd: 3301 Issuing autochanger "loaded? drive 0" >> command. >> 22-Oct 18:56 backupserver-sd: 3302 Autochanger "loaded? drive 0", result >> is Slot 3. >> 22-Oct 18:56 backupserver-sd: Volume "CNI906" previously written, moving >> to end of data. >> 22-Oct 18:56 backupserver-sd: Ready to append to end of Volume "CNI906" >> at >> file=1. >> 22-Oct 18:56 backupserver-sd: Spooling data ... >> 22-Oct 18:56 fileserver-fd: fileserver-backup.2007-10-22_18.54.33 Fatal >> error: backup.c:892 Network send error to SD. ERR=Success > > So the connection breaks shortly after data starts being transferred, > right? Correct, 2193816 is always written. > > It's a little bit surprising to see an error text of Success here... I > always thought that sort of things only happened on windows ;-) ROTFL. The FD, Dir, SD are on linux machines, we have not ventured to the Windows FD yet. > > >> I know it says "Network send error", however, I have checked the >> network, >> and can not find a problem with any of the equipment. > > Do you have a firewall running on that host? No firewalls running on any of the bacula hosts, and the switch is not a 3com. > >> I have run the fd and sd with debug options to provide additional >> output, >> I hope this helps. >> >> If any other information would help in diagnosis, please just ask for >> it. >> >> >> /usr/local/sbin/bacula-fd -f -s -d 200 -u root -g bacula -c >> /etc/bacula/bacula-fd.conf >> >> /home/spencer/bacula-sd -f -d 200 -s -u root -g bacula -c >> /etc/bacula/bacula-sd.conf >> >> cat /root/bacula-fd.log >> bacula-fd: filed_conf.c:438 Inserting director res: fileserver-mon >> fileserver-fd: jcr.c:132 read_last_jobs seek to 188 >> fileserver-fd: jcr.c:139 Read num_items=10 >> fileserver-fd: pythonlib.c:113 No script dir. prog=FDStartUp >> fileserver-fd: filed.c:225 filed: listening on port 9102 >> fileserver-fd: bnet_server.c:96 Addresses host[ipv4:0.0.0.0:9102] >> fileserver-fd: bnet.c:666 who=client host=192.168.1.30 port=36387 >> fileserver-fd: jcr.c:602 OnEntry JobStatus=fileserver-fd: jcr.c:622 >> OnExit >> JobStatus=C set=C >> fileserver-fd: find.c:81 init_find_files ff=8094e60 >> fileserver-fd: job.c:233 <dird: Hello Director fileserver-dir calling >> fileserver-fd: job.c:249 Executing Hello command. >> fileserver-fd: job.c:353 Calling Authenticate >> fileserver-fd: cram-md5.c:71 send: auth cram-md5 >> <[EMAIL PROTECTED]> ssl=0 >> fileserver-fd: cram-md5.c:131 cram-get: auth cram-md5 >> <[EMAIL PROTECTED]> ssl=0 >> fileserver-fd: cram-md5.c:150 sending resp to challenge: >> 6U+ZK4lCcB/uXh+k+X/qdB >> fileserver-fd: job.c:357 OK Authenticate >> fileserver-fd: job.c:233 <dird: JobId=0 >> Job=-Console-.2007-10-22_18.53.31 >> SDid=0 SDtime=0 Authorization=dummy >> fileserver-fd: job.c:249 Executing JobId= command. >> fileserver-fd: job.c:451 JobId=0 Auth=dummy >> fileserver-fd: job.c:233 <dird: statusfileserver-fd: job.c:249 Executing >> status command. >> fileserver-fd: runscript.c:102 runscript: running all RUNSCRIPT object >> (ClientAfterJob) JobStatus=C >> fileserver-fd: pythonlib.c:237 No startup module. >> fileserver-fd: job.c:337 Calling term_find_files >> fileserver-fd: job.c:340 Done with term_find_files >> fileserver-fd: mem_pool.c:377 garbage collect memory pool >> fileserver-fd: job.c:342 Done with free_jcr >> fileserver-fd: bnet.c:666 who=client host=192.168.1.30 port=36387 >> fileserver-fd: jcr.c:602 OnEntry JobStatus=fileserver-fd: jcr.c:622 >> OnExit >> JobStatus=C set=C >> fileserver-fd: find.c:81 init_find_files ff=8094e60 >> fileserver-fd: job.c:233 <dird: Hello Director fileserver-dir calling >> fileserver-fd: job.c:249 Executing Hello command. >> fileserver-fd: job.c:353 Calling Authenticate >> fileserver-fd: cram-md5.c:71 send: auth cram-md5 >> <[EMAIL PROTECTED]> ssl=0 >> fileserver-fd: cram-md5.c:131 cram-get: auth cram-md5 >> <[EMAIL PROTECTED]> ssl=0 >> fileserver-fd: cram-md5.c:150 sending resp to challenge: >> /2gP/C+Fx+ZhT98YS4+hzD >> fileserver-fd: job.c:357 OK Authenticate >> fileserver-fd: job.c:233 <dird: JobId=472 >> Job=fileserver-backup.2007-10-22_18.54.33 SDid=1 SDtime=1193079020 >> Authorization=JMKD-GKGP-LMNP-ODNP-EDME-NJBC-LEMK-BPGJ >> fileserver-fd: job.c:249 Executing JobId= command. >> fileserver-fd: job.c:451 JobId=472 >> Auth=JMKD-GKGP-LMNP-ODNP-EDME-NJBC-LEMK-BPGJ >> fileserver-fd: job.c:233 <dird: fileset vss=1 >> fileserver-fd: job.c:249 Executing fileset command. >> fileserver-fd: job.c:688 I >> fileserver-fd: job.c:688 O M0 >> fileserver-fd: job.c:688 N >> fileserver-fd: job.c:688 F /boot >> fileserver-fd: job.c:688 F /etc >> fileserver-fd: job.c:688 F /usr/local >> fileserver-fd: job.c:688 F /var/lib >> fileserver-fd: job.c:688 F /var/data/svn >> fileserver-fd: job.c:688 F /var/data/mysql-bak >> fileserver-fd: job.c:688 F /var/spool/cyrus >> fileserver-fd: job.c:688 F /usr/src >> fileserver-fd: job.c:688 F /var/mail >> fileserver-fd: job.c:688 F /home/zak/mail >> fileserver-fd: job.c:688 F /home/james/mail >> fileserver-fd: job.c:688 F /home/dpar2/mail >> fileserver-fd: job.c:688 N >> fileserver-fd: job.c:688 E >> fileserver-fd: job.c:688 F /tmp >> fileserver-fd: job.c:688 N >> fileserver-fd: job.c:233 <dird: level = incremental mtime_only=0 >> fileserver-fd: job.c:249 Executing level = command. >> fileserver-fd: job.c:1160 level_cmd: level = incremental mtime_only=0 >> fileserver-fd: job.c:233 <dird: level = since_utime 1192958180 >> mtime_only=0 >> fileserver-fd: job.c:249 Executing level = command. >> fileserver-fd: job.c:1160 level_cmd: level = since_utime 1192958180 >> mtime_only=0 >> fileserver-fd: job.c:1194 since_time=1192958180 >> fileserver-fd: job.c:1215 Dirtime=1193079276315533 >> FDtime=1193079276315510 >> fileserver-fd: job.c:1217 rt=56 adj=18446744073709551565 >> fileserver-fd: job.c:1215 Dirtime=1193079276315628 >> FDtime=1193079276315603 >> fileserver-fd: job.c:1217 rt=59 adj=18446744073709551511 >> fileserver-fd: job.c:1215 Dirtime=1193079276315776 >> FDtime=1193079276315747 >> fileserver-fd: job.c:1217 rt=64 adj=18446744073709551450 >> fileserver-fd: job.c:1215 Dirtime=1193079276315872 >> FDtime=1193079276315847 >> fileserver-fd: job.c:1217 rt=58 adj=18446744073709551396 >> fileserver-fd: job.c:1215 Dirtime=1193079276315964 >> FDtime=1193079276315940 >> fileserver-fd: job.c:1217 rt=58 adj=18446744073709551343 >> fileserver-fd: job.c:1215 Dirtime=1193079276316057 >> FDtime=1193079276316032 >> fileserver-fd: job.c:1217 rt=59 adj=18446744073709551289 >> fileserver-fd: job.c:1215 Dirtime=1193079276316149 >> FDtime=1193079276316125 >> fileserver-fd: job.c:1217 rt=58 adj=18446744073709551236 >> fileserver-fd: job.c:1215 Dirtime=1193079276316242 >> FDtime=1193079276316218 >> fileserver-fd: job.c:1217 rt=58 adj=18446744073709551183 >> fileserver-fd: job.c:1221 rt=58 adj=18446744073709551562 >> fileserver-fd: job.c:1236 adj = 0 since_time=1192958180 >> fileserver-fd: job.c:233 <dird: storage >> address=backupserver.cluster.local >> port=9103 ssl=0 >> fileserver-fd: job.c:249 Executing storage command. >> fileserver-fd: job.c:1291 StorageCmd: storage >> address=backupserver.cluster.local port=9103 ssl=0 >> fileserver-fd: job.c:1297 Open storage: backupserver.cluster.local:9103 >> ssl=0 >> fileserver-fd: bsock.c:195 Current host[ipv4:192.168.1.135:9103] All >> host[ipv4:192.168.1.135:9103] >> fileserver-fd: bsock.c:149 who=Storage daemon >> host=backupserver.cluster.local port=9103 >> fileserver-fd: job.c:1309 Connection OK to SD. >> fileserver-fd: cram-md5.c:131 cram-get: auth cram-md5 >> <[EMAIL PROTECTED]> ssl=0 >> fileserver-fd: cram-md5.c:150 sending resp to challenge: >> XB+VsUQJ44JUUT+eURwxEB >> fileserver-fd: cram-md5.c:78 send: auth cram-md5 >> <[EMAIL PROTECTED]> ssl=0 >> fileserver-fd: cram-md5.c:97 Authenticate OK 94ldXX/Ic5FIJUAeTm/qKA >> fileserver-fd: job.c:1318 Authenticated with SD. >> fileserver-fd: job.c:233 <dird: RunBeforeJob >> /etc/bacula/scripts/makemysqlbackup >> fileserver-fd: job.c:249 Executing RunBeforeJob command. >> fileserver-fd: job.c:462 runbefore_cmd: RunBeforeJob >> /etc/bacula/scripts/makemysqlbackup >> fileserver-fd: runscript.c:204 runscript: running a RUNSCRIPT object >> fileserver-fd: util.c:580 edit_job_codes: >> /etc/bacula/scripts/makemysqlbackup >> fileserver-fd: runscript.c:211 runscript: running >> '/etc/bacula/scripts/makemysqlbackup'... >> fileserver-fd: runscript.c:236 runscript OK >> fileserver-fd: job.c:233 <dird: backup >> fileserver-fd: job.c:249 Executing backup command. >> fileserver-fd: jcr.c:602 OnEntry JobStatus=C set=B >> fileserver-fd: jcr.c:622 OnExit JobStatus=B set=B >> fileserver-fd: job.c:1350 begin backup ff=8094e60 >> fileserver-fd: job.c:1358 bfiled>dird: 2000 OK backup >> fileserver-fd: job.c:1364 >stored: append open session >> fileserver-fd: job.c:1369 <stored: 3000 OK open ticket = 1 >> fileserver-fd: job.c:1374 Got Ticket=1 >> fileserver-fd: job.c:1384 >stored: append data 1 >> fileserver-fd: job.c:1389 <stored: append data 1 >> fileserver-fd: job.c:1798 3000 OK data >> fileserver-fd: pythonlib.c:237 No startup module. >> fileserver-fd: job.c:1436 begin blast ff=8094e60 >> fileserver-fd: jcr.c:602 OnEntry JobStatus=B set=R >> fileserver-fd: jcr.c:622 OnExit JobStatus=R set=R >> fileserver-fd: find.c:93 Enter set_find_options() >> fileserver-fd: find.c:96 Leave set_find_options() >> fileserver-fd: find.c:198 F /boot >> fileserver-fd: backup.c:278 FT_DIREND: /boot/lost+found/ >> fileserver-fd: backup.c:332 bfiled: sending /boot/lost+found to stored >> fileserver-fd: backup.c:1137 No strip for /boot/lost+found >> fileserver-fd: backup.c:245 FT_REG saving: /boot/map >> fileserver-fd: backup.c:332 bfiled: sending /boot/map to stored >> fileserver-fd: backup.c:1137 No strip for /boot/map >> fileserver-fd: backup.c:895 Send data to SD len=80384 >> fileserver-fd: backup.c:245 FT_REG saving: /boot/config-2.4.30 >> fileserver-fd: backup.c:332 bfiled: sending /boot/config-2.4.30 to >> stored >> fileserver-fd: backup.c:1137 No strip for /boot/config-2.4.30 >> fileserver-fd: backup.c:895 Send data to SD len=23184 >> fileserver-fd: backup.c:245 FT_REG saving: /boot/config-2.4.32 >> fileserver-fd: backup.c:332 bfiled: sending /boot/config-2.4.32 to >> stored >> fileserver-fd: backup.c:1137 No strip for /boot/config-2.4.32 >> fileserver-fd: backup.c:895 Send data to SD len=23938 >> fileserver-fd: backup.c:245 FT_REG saving: /boot/System.map-2.4.30 >> fileserver-fd: backup.c:332 bfiled: sending /boot/System.map-2.4.30 to >> stored >> fileserver-fd: backup.c:1137 No strip for /boot/System.map-2.4.30 >> fileserver-fd: backup.c:895 Send data to SD len=647691 >> fileserver-fd: backup.c:245 FT_REG saving: /boot/System.map-2.4.32 >> fileserver-fd: backup.c:332 bfiled: sending /boot/System.map-2.4.32 to >> stored >> fileserver-fd: backup.c:1137 No strip for /boot/System.map-2.4.32 >> fileserver-fd: backup.c:895 Send data to SD len=647760 >> fileserver-fd: backup.c:245 FT_REG saving: /boot/System.map-2.6.16.1-2 >> fileserver-fd: backup.c:332 bfiled: sending /boot/System.map-2.6.16.1-2 >> to >> stored >> fileserver-fd: backup.c:1137 No strip for /boot/System.map-2.6.16.1-2 >> fileserver-fd: backup.c:895 Send data to SD len=742070 >> fileserver-fd: backup.c:245 FT_REG saving: /boot/config-2.6.16.1-2 >> fileserver-fd: backup.c:332 bfiled: sending /boot/config-2.6.16.1-2 to >> stored >> fileserver-fd: backup.c:1137 No strip for /boot/config-2.6.16.1-2 >> fileserver-fd: backup.c:895 Send data to SD len=31119 >> fileserver-fd: backup.c:245 FT_REG saving: /boot/vmlinuz-2.6.16.1-2 >> fileserver-fd: backup.c:332 bfiled: sending /boot/vmlinuz-2.6.16.1-2 to >> stored >> fileserver-fd: backup.c:1137 No strip for /boot/vmlinuz-2.6.16.1-2 >> fileserver-fd: jcr.c:602 OnEntry JobStatus=R set=f >> fileserver-fd: jcr.c:622 OnExit JobStatus=f set=f >> fileserver-fd: jcr.c:602 OnEntry JobStatus=f set=E >> fileserver-fd: jcr.c:622 OnExit JobStatus=f set=E >> fileserver-fd: backup.c:197 end blast_data ok=0 > > Here it's failed, I think. A higher debug level might reveal more, but > this doesn't tell me anything important. I am probably going to get flamed for this, but what value, currently it is set to 200, I do not want to put it too high, and swamp the amount of data I am supplying the mailing list, but neither do I want to waste the mailing lists time by making it too low.... > >> fileserver-fd: jcr.c:602 OnEntry JobStatus=f set=E >> fileserver-fd: jcr.c:622 OnExit JobStatus=f set=E >> fileserver-fd: bnet.c:666 who=client host=192.168.1.30 port=36387 >> fileserver-fd: jcr.c:602 OnEntry JobStatus=fileserver-fd: jcr.c:622 >> OnExit >> JobStatus=C set=C >> fileserver-fd: find.c:81 init_find_files ff=8097838 > ... > >> >> cat /root/bacula-sd.log > ... >> backupserver-sd: jcr.c:602 OnEntry JobStatus=R set=R >> backupserver-sd: jcr.c:622 OnExit JobStatus=R set=R >> backupserver-sd: append.c:96 Begin append device="DLT-V4" (/dev/nst0) >> backupserver-sd: spool.c:110 Turning on data spooling >> backupserver-sd: spool.c:179 Created spool file: >> /var/data/amanda/bacula/spool/backupserver-sd.data.472.fileserver-backup.2007-10-22_18.54.33.DLT-V4.spool >> backupserver-sd: append.c:101 Just after acquire_device_for_append >> backupserver-sd: label.c:698 session_label record=80cc220 >> backupserver-sd: label.c:754 Write sesson_label record JobId=472 >> FI=SOS_LABEL SessId=1 Strm=472 len=179 remainder=0 >> backupserver-sd: label.c:758 Leave write_session_label Block=0d File=1d >> backupserver-sd: bnet.c:666 who=client host=192.168.1.30 port=36643 > > The session has started, it's label is written to tape. > >> backupserver-sd: dircmd.c:171 Conn: Hello Director fileserver-dir >> calling >> backupserver-sd: dircmd.c:181 Got a DIR connection >> backupserver-sd: jcr.c:602 OnEntry JobStatus=backupserver-sd: jcr.c:622 >> OnExit JobStatus=C set=C >> backupserver-sd: cram-md5.c:71 send: auth cram-md5 >> <[EMAIL PROTECTED]> ssl=0 >> backupserver-sd: cram-md5.c:131 cram-get: auth cram-md5 >> <[EMAIL PROTECTED]> ssl=0 >> backupserver-sd: cram-md5.c:150 sending resp to challenge: >> p8/qo/5jVDAXL//8+/4bbD >> backupserver-sd: dircmd.c:202 Message channel init completed. >> backupserver-sd: dircmd.c:209 <dird: status >> backupserver-sd: dircmd.c:223 Do command: status >> backupserver-sd: pythonlib.c:237 No startup module. >> backupserver-sd: mem_pool.c:377 garbage collect memory pool > > No more output related to the actual job, I think. Again, a higher > debug level might reveal something. > > >> backupserver ~ # > > With the information from above, I suspect a network problem. Does the > client run before job you have run for a very long time? In such a > situation, a firewall/router might close the connection between SD and > FD because it seems to be idle. The run before job might take half an hour max. There is no firewall or router in the setup. > > Arno Many thanks Spencer ------------------------------------------------------------------------- This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now >> http://get.splunk.com/ _______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users