Since about my third post on the subject, the "primary test client" has 
been Fedora Core 4.  The server is Fedora Core 5.  I decided to remove 
the Win32 variable early on.  I'm digging into the R8169 adapter on the 
server to see if there are firmware issues on that card.  I found the 
latest firmware and download utility on their web site cleverly hidden 
under DOS.

Since the weekend is coming up and I can spare a few minutes of down 
time, I may attempt to move the tape server to another machine.  I'll 
probably reflash the firmware tomorrow, test once, then move the tapes 
to another server if it fails.

bbaker

>On Friday 15 September 2006 21:36, William Baker wrote:
>>I found the documentation on the heartbeat, configured it for the FD and 
>>SD for 5 sec, restarted the daemons, and ran the test again.  On the 
>>primary test machine, the backup is still dying in the same place.  I 
>>did notice (a little late) that I was probably focusing on the wrong 
>>message.
>>
>>The clients and server are separated by a couple of switches, but they 
>>are on the same subnet, so routers should not be an issue.  Most 
>>devices are gigabit on managed switches.  Some devices are 100 Mbit.  In 
>>particular, the server is gigabit and the primary test client is 100 Mbit.  
>>I plan to trace the route and check the errors on the ports -- starting 
>>with the server.
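
One way to check for errors on the server side is to read the NIC
counters that Linux exposes under /sys/class/net/<iface>/statistics/.
The sketch below is only an illustration: the interface name "eth0" is
an assumption, and the switch-port counters still have to be read from
the managed switches themselves.

```python
from pathlib import Path

# Counter files the Linux kernel provides per interface in sysfs.
COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped",
            "rx_crc_errors", "collisions")


def read_counters(iface, root="/sys/class/net"):
    """Return {counter: value} for one interface; missing files are skipped."""
    stats = Path(root) / iface / "statistics"
    result = {}
    for name in COUNTERS:
        counter_file = stats / name
        if counter_file.is_file():
            result[name] = int(counter_file.read_text().strip())
    return result


def nonzero(counters):
    """Keep only the counters that have recorded at least one event."""
    return {name: value for name, value in counters.items() if value != 0}


# "eth0" is a placeholder; substitute the interface the backup traffic uses.
print(nonzero(read_counters("eth0")))
```

Running it before and after a failed backup and comparing the two
results would show whether the error counters moved during the job.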
>>
>>For my primary test machine, the point of failure is consistantly around 
>>5 mins into the backup with 2.460 to 2.464 G transferred.
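
As a rough sanity check on those numbers, the average rate up to the
failure can be worked out directly.  This assumes "G" means decimal
gigabytes and that "around 5 mins" is exactly 300 seconds, so treat the
result as approximate.

```python
def average_rate(bytes_sent, seconds):
    """Return (megabytes per second, megabits per second) in decimal units."""
    mbytes_per_s = bytes_sent / seconds / 1e6
    return mbytes_per_s, mbytes_per_s * 8


for gigabytes in (2.460, 2.464):
    mb_s, mbit_s = average_rate(gigabytes * 1e9, 5 * 60)
    print(f"{gigabytes:.3f} GB in 5 min = {mb_s:.1f} MB/s = {mbit_s:.1f} Mbit/s")
```

That works out to roughly 8.2 MB/s, or about 66 Mbit/s, which is a
plausible sustained rate for a 100 Mbit client.  It does not say
anything about why the transfer stops at that point.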
>If it happens that quickly and at 2.xx G, then it is most likely a Windows 
>problem (see the Win32 chapter of the manual for a weird case), or a bad 
>ethernet card (probably bad firmware).
>
>>bbaker
>>
>>>On Friday 15 September 2006 18:07, William Baker wrote:
>>>>(Thanks for kindly pointing me in the right direction, Kern.)
>>>>
>>>>I have a little bit more info to add to the mix -- and a little more 
>>>>confusion.  Unix clients are behaving the same way.  So, the only thing 
>>>>all these items appear to have in common is the server -- though it 
>>>>would seem strange to me to have such a problem in a production server 
>>>>that has been in use in other places for months. 
>>>>
>>>>So, I upgraded the server to the latest beta.  Surprise: same thing 
>>>>still happened -- "packet size too big".  Well.  The server is Fedora 
>>>>Core 4 with up-to-date patches and gcc version 4.0.2.  I also failed to 
>>>>mention the server is built from source due to a strict MySQL version 
>>>>4.1.10 requirement.  The clients are RPMs and EXEs.
>>>>
>>>>I guess now is the time to dig into the code.  At least I have a few 
>>>>verbose error messages to point the way.
>>>>
>>>The problem you are having doesn't appear to be packet size too big because 
>>>that was not the first error message, and is likely spurious due to the 
>>>disconnection.
>>>
>>>I suspect that you are seeing network problems -- either a bad switch, a bad 
>>>ethernet card, or simply Windows software that doesn't follow Internet rules 
>>>and times out the line during large transfers.  The manual discusses several 
>>>reasons for this, including in some cases a Bacula workaround called 
>>>Heartbeat Interval.
>>>
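
The point about the first error message can be checked mechanically.
The sketch below pulls the error lines out of a job log in order and
reports the first one.  The regular expression is an assumption based
only on the messages quoted in this thread (with each wrapped message
rejoined onto one line); it is not a documented Bacula log format.

```python
import re

# Assumed shape: "<dd-Mon hh:mm> <daemon>: <job> <Fatal error|Error>: <text>"
LINE = re.compile(
    r"^(?P<stamp>\d{2}-\w{3} \d{2}:\d{2}) (?P<daemon>[\w.-]+): "
    r"(?P<job>\S+) (?P<level>Fatal error|Error): (?P<text>.*)$"
)


def errors_in_order(messages):
    """Yield (daemon, level, text) for each error message, in log order."""
    for message in messages:
        match = LINE.match(message)
        if match:
            yield match["daemon"], match["level"], match["text"]


# Messages copied from the job output quoted in this thread.
messages = [
    "15-Sep 07:36 scott2-sd: mcleod-job.2006-09-15_07.17.39 Fatal error: "
    "append.c:144 Error reading data header from FD. ERR=No data available",
    "15-Sep 07:36 scott2-sd: mcleod-job.2006-09-15_07.17.39 Fatal error: "
    'bnet.c:228 Packet size too big from "client:192.168.4.20:36643. '
    "Terminating connection.",
    '15-Sep 07:38 mcleod-fd: VSS Writer (BackupComplete): "System Writer", '
    "State: 0x1 (VSS_WS_STABLE)",
]

first = next(errors_in_order(messages))
print("first error:", first)
```

On these messages the first error reported is the append.c:144 read
failure from the SD, with the "Packet size too big" message following
it.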
>>>>bbaker
>>>>
>>>>>You will probably have better luck getting your question answered on the 
>>>>>bacula-users list, which I have copied for you.
>>>>>
>>>>>On Friday 15 September 2006 15:36, William Baker wrote:
>>>>>
>>>>>>I know "packet too long" is in the FAQ.  I think this is a new but 
>>>>>>related issue.  The error is consistent and repeatable.
>>>>>>
>>>>>>The server is a production version bacula 1.38.11 running on Linux with 
>>>>>>MySQL database.  Two versions of the Windows client have been tested: 
>>>>>>1.38.10 and 1.39.22.  Several configurations of the client have been 
>>>>>>tested, both with and without VSS enabled.  I have a TODO list that 
>>>>>>includes backing up other (non-windows) clients, but those tests haven't 
>>>>>>been done yet.  The traces included below are for 1.39.22. 
>>>>>>
>>>>>>The client data to back up is approximately 21 GB.  For v1.38.10, only 
>>>>>>about 2 GB were actually backed up.  For 1.39.22, about 20 GB were 
>>>>>>retrieved from the client before the following messages appeared:
>>>>>>
>>>>>>15-Sep 07:36 scott2-sd: mcleod-job.2006-09-15_07.17.39 Fatal error: 
>>>>>>append.c:144 Error reading data header from FD. ERR=No data available
>>>>>>15-Sep 07:36 scott2-sd: mcleod-job.2006-09-15_07.17.39 Fatal error: 
>>>>>>bnet.c:228 Packet size too big from "client:192.168.4.20:36643. 
>>>>>>Terminating connection.
>>>>>>15-Sep 07:36 mcleod-fd: mcleod-job.2006-09-15_07.17.39 Fatal error: 
>>>>>>../../filed/backup.c:787 Network send error to SD. ERR=Input/output error
>>>>>>15-Sep 07:36 mcleod-fd: mcleod-job.2006-09-15_07.17.39 Error: 
>>>>>>../../lib/bnet.c:393 Write error sending len to Storage 
>>>>>>daemon:proe.priefert.com:9103: ERR=Input/output error
>>>>>>15-Sep 07:38 mcleod-fd: VSS Writer (BackupComplete): "System Writer", 
>>>>>>State: 0x1 (VSS_WS_STABLE)
>>>>>>15-Sep 07:38 mcleod-fd: VSS Writer (BackupComplete): "MSDEWriter", 
>>>>>>State: 0x1 (VSS_WS_STABLE)
>>>>>>15-Sep 07:38 mcleod-fd: VSS Writer (BackupComplete): "IIS Metabase 
>>>>>>Writer", State: 0x1 (VSS_WS_STABLE)
>>>>>>15-Sep 07:38 mcleod-fd: VSS Writer (BackupComplete): "Removable Storage 
>>>>>>Manager", State: 0x1 (VSS_WS_STABLE)
>>>>>>15-Sep 07:38 mcleod-fd: VSS Writer (BackupComplete): "WMI Writer", 
>>>>>>State: 0x1 (VSS_WS_STABLE)
>>>>>>15-Sep 07:38 mcleod-fd: VSS Writer (BackupComplete): "Event Log Writer", 
>>>>>>State: 0x1 (VSS_WS_STABLE)
>>>>>>15-Sep 07:38 mcleod-fd: VSS Writer (BackupComplete): "Registry Writer", 
>>>>>>State: 0x1 (VSS_WS_STABLE)
>>>>>>15-Sep 07:38 mcleod-fd: VSS Writer (BackupComplete): "COM+ REGDB 
>>>>>>Writer", State: 0x1 (VSS_WS_STABLE)
>>>>>>15-Sep 07:38 scott2-dir: mcleod-job.2006-09-15_07.17.39 Error: Bacula 
>>>>>>1.38.11 (28Jun06): 15-Sep-2006 07:38:08
>>>>>>
>>>>>>On the client, the last few lines of the bacula.trace file tell a 
>>>>>>similar story:
>>>>>>
>>>>>>mcleod-fd: ../compat/compat.cpp:150 Leave cvt_u_to_win32_path 
>>>>>>path=\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy10\Program Files\ALK 
>>>>>>Technologies\PMW190\Connect\PCMSRV.HLP
>>>>>>mcleod-fd: ../compat/compat.cpp:90 Enter convert_unix_to_win32_path
>>>>>>mcleod-fd: ../compat/compat.cpp:141 path=D:\Program Files\ALK 
>>>>>>Technologies\PMW190\Connect\PCMSRV.HLP
>>>>>>mcleod-fd: ../compat/compat.cpp:150 Leave cvt_u_to_win32_path 
>>>>>>path=\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy10\Program Files\ALK 
>>>>>>Technologies\PMW190\Connect\PCMSRV.HLP
>>>>>>mcleod-fd: ../compat/compat.cpp:1107 readdir_r(b64960, { 
>>>>>>d_name="pcmsrv.pdf", d_reclen=10, d_off=66
>>>>>>mcleod-fd: ../compat/compat.cpp:177 Enter wchar_win32_path
>>>>>>mcleod-fd: ../compat/compat.cpp:351 Leave wchar_win32_path=\
>>>>>>mcleod-fd: ../compat/compat.cpp:90 Enter convert_unix_to_win32_path
>>>>>>mcleod-fd: ../compat/compat.cpp:141 path=D:\Program Files\ALK 
>>>>>>Technologies\PMW190\Connect\pcmsrv.pdf
>>>>>>mcleod-fd: ../compat/compat.cpp:150 Leave cvt_u_to_win32_path 
>>>>>>path=\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy10\Program Files\ALK 
>>>>>>Technologies\PMW190\Connect\pcmsrv.pdf
>>>>>>mcleod-fd: ../compat/compat.cpp:90 Enter convert_unix_to_win32_path
>>>>>>mcleod-fd: ../compat/compat.cpp:141 path=D:\Program Files\ALK 
>>>>>>Technologies\PMW190\Connect\pcmsrv.pdf
>>>>>>mcleod-fd: ../compat/compat.cpp:150 Leave cvt_u_to_win32_path 
>>>>>>path=\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy10\Program Files\ALK 
>>>>>>Technologies\PMW190\Connect\pcmsrv.pdf
>>>>>>mcleod-fd: ../../filed/heartbeat.c:77 Got BNET_SIG 0 from SD
>>>>>>mcleod-fd: ../../filed/heartbeat.c:82 wait_intr=1 stop=1
>>>>>>mcleod-fd: ../../filed/backup.c:184 end blast_data ok=0
>>>>>>mcleod-fd: ../../filed/job.c:221 Quit command loop. Canceled=1
>>>>>>mcleod-fd: ../../filed/job.c:303 Calling term_find_files
>>>>>>mcleod-fd: ../../filed/job.c:306 Done with term_find_files
>>>>>>mcleod-fd: ../../filed/job.c:308 Done with free_jcr
>>>>>>
>>>>>>Actually, on the Windows box, I'm trying to back up most of C: and D:.  
>>>>>>Here is what cygwin df says about the data:
>>>>>>
>>>>>>C:\WINDOWS\system32>df
>>>>>>Filesystem           1K-blocks      Used Available Use% Mounted on
>>>>>>C:\cygwin\bin         20482843   9327624  11155219  46% /usr/bin
>>>>>>C:\cygwin\lib         20482843   9327624  11155219  46% /usr/lib
>>>>>>C:\cygwin             20482843   9327624  11155219  46% /
>>>>>>c:                    20482843   9327624  11155219  46% /cygdrive/c
>>>>>>d:                   123170320  12428172 110742148  11% /cygdrive/d
>>>>>>
>>>>>>While the backup statistics give the following:
>>>>>>
>>>>>>JobId:                  8
>>>>>>Job:                    mcleod-job.2006-09-15_07.17.39
>>>>>>Backup Level:           Full
>>>>>>Client:                 "mcleod-fd" Linux,Cross-compile,Win32
>>>>>>FileSet:                "BasicWindowsFileSet" 2006-09-14 21:26:55
>>>>>>Pool:                   "Default"
>>>>>>Storage:                "LTO2"
>>>>>>Scheduled time:         15-Sep-2006 07:17:36
>>>>>>Start time:             15-Sep-2006 07:17:44
>>>>>>End time:               15-Sep-2006 07:38:08
>>>>>>Elapsed time:           20 mins 24 secs
>>>>>>Priority:               1
>>>>>>FD Files Written:       73,064
>>>>>>SD Files Written:       72,744
>>>>>>FD Bytes Written:       20,028,562,878 (20.02 GB)
>>>>>>SD Bytes Written:       20,037,565,140 (20.03 GB)
>>>>>>Rate:                   16363.2 KB/s
>>>>>>Software Compression:   None
>>>>>>Volume name(s):         bacula-1
>>>>>>Volume Session Id:      7
>>>>>>
>>>>>>So, the immense majority of the data was sent.  I don't yet know enough 
>>>>>>about bacula to know if the difference between the FD Files Written and 
>>>>>>SD Files Written is any kind of clue.
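
The figures in the job report can be cross-checked with a little
arithmetic.  The sketch below uses only the numbers quoted above and
assumes that "KB" in the Rate line means 1000 bytes; that assumption is
inferred from the numbers, not taken from the Bacula documentation.

```python
# Values copied from the job report above.
elapsed_s = 20 * 60 + 24          # "20 mins 24 secs"
fd_files, sd_files = 73_064, 72_744
fd_bytes, sd_bytes = 20_028_562_878, 20_037_565_140

# Average rate from the FD byte count, in units of 1000 bytes per second.
rate_kb_s = fd_bytes / elapsed_s / 1000

print(f"elapsed seconds:      {elapsed_s}")
print(f"rate from FD bytes:   {rate_kb_s:.1f} KB/s")
print(f"files FD minus SD:    {fd_files - sd_files}")
print(f"bytes SD minus FD:    {sd_bytes - fd_bytes}")
```

The computed rate agrees with the reported 16363.2 KB/s, so the Rate
line appears to be the FD byte count divided by the elapsed time.  The
FD counted 320 more files than the SD, while the SD counted about 9 MB
more bytes than the FD.  The arithmetic does not explain either gap.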
>>>>>>
>>>>>>By the way, the beta on Windows looks very promising.  I liked what I 
>>>>>>saw.  I worked a little with BartPE to build a bootable recovery CD.  I 
>>>>>>know the issues associated with that.  Can you post off-topic in your 
>>>>>>own post? 
>>>>>>
>>>>>>bbaker

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
