Since about my third post on the subject, the "primary test client" has been Fedora Core 4. The server is Fedora Core 5. I decided early on to remove Win32 as a variable. I'm digging into the R8169 adapter on the server to see if there are firmware issues on that card. I found the latest firmware and download utility on their web site, cleverly hidden under DOS.
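Before reflashing, it may be worth checking whether the kernel itself is counting errors on the suspect card. The following is a minimal Python sketch, assuming a Linux server that exposes /proc/net/dev in its usual layout (two header lines, then eight receive and eight transmit counters per interface). The default interface name eth0 is only a placeholder for whatever name the R8169 card has, and the function names are hypothetical; this is a diagnostic aid, not part of Bacula.

```python
# Minimal sketch: read per-interface error and drop counters from Linux /proc/net/dev.
# Assumes the usual layout: two header lines, then "iface: <8 receive> <8 transmit>".
from typing import Dict

# Indexes into the 16 numeric columns that follow "iface:".
COUNTERS = {
    "rx_errs": 2,
    "rx_drop": 3,
    "rx_frame": 5,
    "tx_errs": 10,
    "tx_drop": 11,
    "tx_colls": 13,
    "tx_carrier": 14,
}


def parse_net_dev(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/dev content into {interface: {counter: value}}."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        # Split on the colon first; older kernels may omit the space after it.
        name, _, rest = line.partition(":")
        fields = [int(v) for v in rest.split()]
        stats[name.strip()] = {key: fields[i] for key, i in COUNTERS.items()}
    return stats


def read_interface_errors(iface: str = "eth0") -> Dict[str, int]:
    """Return the error/drop counters for one interface on this host.

    "eth0" is a placeholder default (an assumption); pass the card's real name.
    """
    with open("/proc/net/dev") as f:
        return parse_net_dev(f.read())[iface]
```

For example, call read_interface_errors("eth0") before and after a test backup and compare the two readings. The counters are cumulative, so what matters is how much they change during the failing run; steadily rising error, frame, or carrier counts would point toward the card, cable, or switch port rather than at Bacula.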
Since the weekend is coming up and I can spare a few minutes of downtime, I may attempt to move the tape server to another machine. I'll probably reflash the firmware tomorrow, test once, and then move the tapes to another server if it still fails.

bbaker

>On Friday 15 September 2006 21:36, William Baker wrote:
>
>>I found the documentation on the heartbeat, configured it for the FD and
>>SD for 5 sec, restarted the daemons, and ran the test again. On the
>>primary test machine, the backup is still dying in the same place. I
>>did notice (a little late) that I was probably focusing on the wrong
>>message.
>>
>>The clients and server are separated by a couple of switches, but they
>>are on the same subnets, so routers should not be an issue. Most
>>devices are gigabit on managed switches. Some devices are 100Mb. In
>>particular, the server is gigabit and the primary test client is 100Mb.
>>I plan to trace the route and check the errors on the ports -- starting
>>with the server.
>>
>>For my primary test machine, the point of failure is consistently around
>>5 mins into the backup with 2.460 to 2.464 G transferred.
>
>If it happens that quickly and at 2.xx G, then it is most likely a Windows
>problem (see the Win32 chapter of the manual for a weird case), or a bad
>ethernet card (probably bad firmware).
>
>>bbaker
>>
>>>On Friday 15 September 2006 18:07, William Baker wrote:
>>>
>>>>(Thanks for kindly pointing me in the right direction, Kern.)
>>>>
>>>>I have a little bit more info to add to the mix -- and a little more
>>>>confusion. Unix clients are behaving the same way. So, the only thing
>>>>all these items appear to have in common is the server -- though it
>>>>would seem strange to me to have such a problem in a production server
>>>>that has been in use in other places for months.
>>>>
>>>>So, I upgraded the server to the latest beta. Surprise: the same thing
>>>>still happened -- "packet size too big". Well.
>>>>The server is Fedora Core 4 with up-to-date patches, gcc version 4.0.2.
>>>>I also failed to mention that the server is built from source due to a
>>>>strict MySQL version 4.1.10 requirement. The clients are RPMs and EXEs.
>>>>
>>>>I guess now is the time to dig into the code. At least I have a few
>>>>verbose error messages to point the way.
>>>
>>>The problem you are having doesn't appear to be "packet size too big",
>>>because that was not the first error message, and it is likely spurious
>>>due to the disconnection.
>>>
>>>I suspect that you are seeing network problems -- either a bad switch, a
>>>bad ethernet card, or simply Windows software that doesn't follow Internet
>>>rules and times out the line during large transfers. The manual discusses
>>>several reasons for this, including in some cases a Bacula workaround
>>>called Heartbeat Interval.
>>
>>>>bbaker
>>>>
>>>>>You will probably have better luck getting your question answered on the
>>>>>bacula-users list, which I have copied for you.
>>>>>
>>>>>On Friday 15 September 2006 15:36, William Baker wrote:
>>>>>
>>>>>>I know "packet too long" is in the FAQ. I think this is a new but
>>>>>>related issue. The error is consistent and repeatable.
>>>>>>
>>>>>>The server is a production version of Bacula, 1.38.11, running on Linux
>>>>>>with a MySQL database. Two versions of the Windows client have been
>>>>>>tested: 1.38.10 and 1.39.22. Several configurations of the client have
>>>>>>been tested, both with and without VSS enabled. I have a TODO list that
>>>>>>includes backing up other (non-Windows) clients, but those tests haven't
>>>>>>been done yet. The traces included below are for 1.39.22.
>>>>>>
>>>>>>The client data to back up is approximately 21 GB. For v1.38.10, only
>>>>>>about 2GB were actually backed up.
>>>>>>For 1.39.22, about 20GB were
>>>>>>retrieved from the client before the following message appeared:
>>>>>>
>>>>>>15-Sep 07:36 scott2-sd: mcleod-job.2006-09-15_07.17.39 Fatal error:
>>>>>>append.c:144 Error reading data header from FD. ERR=No data available
>>>>>>15-Sep 07:36 scott2-sd: mcleod-job.2006-09-15_07.17.39 Fatal error:
>>>>>>bnet.c:228 Packet size too big from "client:192.168.4.20:36643.
>>>>>>Terminating connection.
>>>>>>15-Sep 07:36 mcleod-fd: mcleod-job.2006-09-15_07.17.39 Fatal error:
>>>>>>../../filed/backup.c:787 Network send error to SD. ERR=Input/output error
>>>>>>15-Sep 07:36 mcleod-fd: mcleod-job.2006-09-15_07.17.39 Error:
>>>>>>../../lib/bnet.c:393 Write error sending len to Storage
>>>>>>daemon:proe.priefert.com:9103: ERR=Input/output error
>>>>>>15-Sep 07:38 mcleod-fd: VSS Writer (BackupComplete): "System Writer",
>>>>>>State: 0x1 (VSS_WS_STABLE)
>>>>>>15-Sep 07:38 mcleod-fd: VSS Writer (BackupComplete): "MSDEWriter",
>>>>>>State: 0x1 (VSS_WS_STABLE)
>>>>>>15-Sep 07:38 mcleod-fd: VSS Writer (BackupComplete): "IIS Metabase
>>>>>>Writer", State: 0x1 (VSS_WS_STABLE)
>>>>>>15-Sep 07:38 mcleod-fd: VSS Writer (BackupComplete): "Removable Storage
>>>>>>Manager", State: 0x1 (VSS_WS_STABLE)
>>>>>>15-Sep 07:38 mcleod-fd: VSS Writer (BackupComplete): "WMI Writer",
>>>>>>State: 0x1 (VSS_WS_STABLE)
>>>>>>15-Sep 07:38 mcleod-fd: VSS Writer (BackupComplete): "Event Log Writer",
>>>>>>State: 0x1 (VSS_WS_STABLE)
>>>>>>15-Sep 07:38 mcleod-fd: VSS Writer (BackupComplete): "Registry Writer",
>>>>>>State: 0x1 (VSS_WS_STABLE)
>>>>>>15-Sep 07:38 mcleod-fd: VSS Writer (BackupComplete): "COM+ REGDB
>>>>>>Writer", State: 0x1 (VSS_WS_STABLE)
>>>>>>15-Sep 07:38 scott2-dir: mcleod-job.2006-09-15_07.17.39 Error: Bacula
>>>>>>1.38.11 (28Jun06): 15-Sep-2006 07:38:08
>>>>>>
>>>>>>On the client, the last few lines of the bacula.trace file tell a
>>>>>>similar story:
>>>>>>
>>>>>>mcleod-fd: ../compat/compat.cpp:150 Leave cvt_u_to_win32_path
>>>>>>path=\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy10\Program Files\ALK
>>>>>>Technologies\PMW190\Connect\PCMSRV.HLP
>>>>>>mcleod-fd: ../compat/compat.cpp:90 Enter convert_unix_to_win32_path
>>>>>>mcleod-fd: ../compat/compat.cpp:141 path=D:\Program Files\ALK
>>>>>>Technologies\PMW190\Connect\PCMSRV.HLP
>>>>>>mcleod-fd: ../compat/compat.cpp:150 Leave cvt_u_to_win32_path
>>>>>>path=\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy10\Program Files\ALK
>>>>>>Technologies\PMW190\Connect\PCMSRV.HLP
>>>>>>mcleod-fd: ../compat/compat.cpp:1107 readdir_r(b64960, {
>>>>>>d_name="pcmsrv.pdf", d_reclen=10, d_off=66
>>>>>>mcleod-fd: ../compat/compat.cpp:177 Enter wchar_win32_path
>>>>>>mcleod-fd: ../compat/compat.cpp:351 Leave wchar_win32_path=\
>>>>>>mcleod-fd: ../compat/compat.cpp:90 Enter convert_unix_to_win32_path
>>>>>>mcleod-fd: ../compat/compat.cpp:141 path=D:\Program Files\ALK
>>>>>>Technologies\PMW190\Connect\pcmsrv.pdf
>>>>>>mcleod-fd: ../compat/compat.cpp:150 Leave cvt_u_to_win32_path
>>>>>>path=\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy10\Program Files\ALK
>>>>>>Technologies\PMW190\Connect\pcmsrv.pdf
>>>>>>mcleod-fd: ../compat/compat.cpp:90 Enter convert_unix_to_win32_path
>>>>>>mcleod-fd: ../compat/compat.cpp:141 path=D:\Program Files\ALK
>>>>>>Technologies\PMW190\Connect\pcmsrv.pdf
>>>>>>mcleod-fd: ../compat/compat.cpp:150 Leave cvt_u_to_win32_path
>>>>>>path=\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy10\Program Files\ALK
>>>>>>Technologies\PMW190\Connect\pcmsrv.pdf
>>>>>>mcleod-fd: ../../filed/heartbeat.c:77 Got BNET_SIG 0 from SD
>>>>>>mcleod-fd: ../../filed/heartbeat.c:82 wait_intr=1 stop=1
>>>>>>mcleod-fd: ../../filed/backup.c:184 end blast_data ok=0
>>>>>>mcleod-fd: ../../filed/job.c:221 Quit command loop. Canceled=1
>>>>>>mcleod-fd: ../../filed/job.c:303 Calling term_find_files
>>>>>>mcleod-fd: ../../filed/job.c:306 Done with term_find_files
>>>>>>mcleod-fd: ../../filed/job.c:308 Done with free_jcr
>>>>>>
>>>>>>Actually, on the Windows box, I'm trying to back up most of C: and D:.
>>>>>>Here is what cygwin df says about the data:
>>>>>>
>>>>>>C:\WINDOWS\system32>df
>>>>>>Filesystem     1K-blocks     Used Available Use% Mounted on
>>>>>>C:\cygwin\bin   20482843  9327624  11155219  46% /usr/bin
>>>>>>C:\cygwin\lib   20482843  9327624  11155219  46% /usr/lib
>>>>>>C:\cygwin       20482843  9327624  11155219  46% /
>>>>>>c:              20482843  9327624  11155219  46% /cygdrive/c
>>>>>>d:             123170320 12428172 110742148  11% /cygdrive/d
>>>>>>
>>>>>>Meanwhile, the backup statistics give the following:
>>>>>>
>>>>>>JobId:                  8
>>>>>>Job:                    mcleod-job.2006-09-15_07.17.39
>>>>>>Backup Level:           Full
>>>>>>Client:                 "mcleod-fd" Linux,Cross-compile,Win32
>>>>>>FileSet:                "BasicWindowsFileSet" 2006-09-14 21:26:55
>>>>>>Pool:                   "Default"
>>>>>>Storage:                "LTO2"
>>>>>>Scheduled time:         15-Sep-2006 07:17:36
>>>>>>Start time:             15-Sep-2006 07:17:44
>>>>>>End time:               15-Sep-2006 07:38:08
>>>>>>Elapsed time:           20 mins 24 secs
>>>>>>Priority:               1
>>>>>>FD Files Written:       73,064
>>>>>>SD Files Written:       72,744
>>>>>>FD Bytes Written:       20,028,562,878 (20.02 GB)
>>>>>>SD Bytes Written:       20,037,565,140 (20.03 GB)
>>>>>>Rate:                   16363.2 KB/s
>>>>>>Software Compression:   None
>>>>>>Volume name(s):         bacula-1
>>>>>>Volume Session Id:      7
>>>>>>
>>>>>>So the vast majority of the data was sent. I don't yet know enough
>>>>>>about Bacula to know whether the difference between FD Files Written
>>>>>>and SD Files Written is any kind of clue.
>>>>>>
>>>>>>By the way, the beta on Windows looks very promising. I liked what I
>>>>>>saw. I worked a little with BartPE to build a bootable recovery CD, and
>>>>>>I know the issues associated with that. Can you post off-topic in your
>>>>>>own post?
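The figures in the job report can be cross-checked with some simple arithmetic. The Python sketch below only re-does the sums, using numbers copied from the report and the df listing; it does not query Bacula. It suggests that the reported Rate is FD Bytes Written divided by the elapsed 1,224 seconds, with KB meaning 1,000 bytes; that the SD counted 320 fewer files than the FD but about 9 MB more bytes; and that the space used on C: and D: totals about 22.28 GB (20.75 GiB), in line with the "approximately 21 GB" estimate. The arithmetic alone does not say what the file-count difference means.

```python
# Re-do the arithmetic behind the job report, using figures copied from it
# and from the cygwin df listing. Nothing here queries Bacula itself.
from datetime import datetime

start = datetime(2006, 9, 15, 7, 17, 44)   # Start time
end = datetime(2006, 9, 15, 7, 38, 8)      # End time
fd_files, sd_files = 73_064, 72_744        # FD / SD Files Written
fd_bytes, sd_bytes = 20_028_562_878, 20_037_565_140  # FD / SD Bytes Written
c_used_kb, d_used_kb = 9_327_624, 12_428_172         # df "Used", 1K-blocks

elapsed_s = (end - start).total_seconds()   # should match "20 mins 24 secs"
rate_kb_s = fd_bytes / elapsed_s / 1000      # KB taken as 1,000 bytes
file_gap = fd_files - sd_files               # files counted by FD but not SD
byte_gap = sd_bytes - fd_bytes               # SD reports slightly MORE bytes
used_bytes = (c_used_kb + d_used_kb) * 1024  # 1K-blocks -> bytes

print(f"elapsed:            {elapsed_s:.0f} s")
print(f"rate from FD bytes: {rate_kb_s:.1f} KB/s")
print(f"FD - SD files:      {file_gap}")
print(f"SD - FD bytes:      {byte_gap:,}")
print(f"used on C: + D:     {used_bytes / 1e9:.2f} GB ({used_bytes / 2**30:.2f} GiB)")
```

Dividing the SD byte count by the same elapsed time gives about 16370.6 KB/s instead, which is why the FD count looks like the basis for the reported Rate.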
>>>>>>
>>>>>>bbaker
>>>>>
>>>>>-------------------------------------------------------------------------
>>>>>>Using Tomcat but need to do more? Need to support web services, security?
>>>>>>Get stuff done quickly with pre-integrated technology to make your job easier
>>>>>>Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
>>>>>>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
>>>>>>_______________________________________________
>>>>>>Bacula-devel mailing list
>>>>>>[EMAIL PROTECTED]
>>>>>>https://lists.sourceforge.net/lists/listinfo/bacula-devel
>>>>
>>>>_______________________________________________
>>>>Bacula-users mailing list
>>>>Bacula-users@lists.sourceforge.net
>>>>https://lists.sourceforge.net/lists/listinfo/bacula-users
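Kern's reading, that "Packet size too big" is a side effect of the broken connection rather than its cause, is easier to picture with a small model of length-prefixed framing. The Python sketch below assumes a scheme of the general kind Bacula's network layer uses, a 4-byte binary length header followed by the data; the MAX_PACKET limit, the exception, and all function names are hypothetical and are not taken from Bacula's source. It shows that a receiver which loses its place in the stream by just four bytes ends up decoding path text ("Prog") as a length of about 1.35 billion bytes and rejects it as too big, even though the sender never sent an oversized packet.

```python
# Sketch of length-prefixed message framing, showing how a receiver that has
# lost its place in the byte stream can report an absurd "packet size".
import io
import struct
from typing import BinaryIO

MAX_PACKET = 1_000_000  # hypothetical sanity cap, not taken from Bacula's source


class PacketTooBig(Exception):
    """Raised when a length header exceeds MAX_PACKET."""


def send_packet(out: BinaryIO, payload: bytes) -> None:
    """Write a 4-byte big-endian signed length, then the payload."""
    out.write(struct.pack("!i", len(payload)) + payload)


def recv_packet(inp: BinaryIO) -> bytes:
    """Read one framed packet, rejecting lengths that cannot be right."""
    header = inp.read(4)
    if len(header) < 4:
        raise EOFError("connection closed while reading a length header")
    (length,) = struct.unpack("!i", header)
    if length < 0:
        # A real protocol may reserve negative values for control signals;
        # this sketch simply rejects them.
        raise ValueError(f"unexpected signal/length {length}")
    if length > MAX_PACKET:
        raise PacketTooBig(f"Packet size too big: {length}")
    payload = inp.read(length)
    if len(payload) < length:
        raise EOFError("connection closed in the middle of a packet")
    return payload


# Two packets written back to back, as a sender would.
stream = io.BytesIO()
send_packet(stream, b"Program Files")
send_packet(stream, b"pcmsrv.pdf")
raw = stream.getvalue()

# A receiver that stays in step decodes both packets.
in_step = io.BytesIO(raw)
packets = [recv_packet(in_step), recv_packet(in_step)]

# A receiver four bytes out of step reads the text "Prog" as a length header.
desync_error = None
try:
    recv_packet(io.BytesIO(raw[4:]))
except PacketTooBig as exc:
    desync_error = exc
print(packets, desync_error)
```

The design choice worth noting is the sanity cap: without it, the out-of-step receiver would try to allocate and read over a gigabyte, so rejecting the header early is what turns a silent desynchronization into an explicit "too big" error after the real failure has already happened.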