On 20/05/2011 10:22, Yann Cézard wrote:
> Hi everyone,
>
> For a few weeks now, I have been facing a really strange problem with
> my win32 bacula-fd.
>
> It seems that the problem started when I upgraded my SD + DIR
> to 5.0.X (I was still using 2.4.4 until then).
> The problem is that almost all my win32-fd Full backups now fail
> (this is only observable on Full backups of several GB) with the
> following errors:
>
> - with a 2.4.4 client:
> 12-May 22:22 msadpau-fd JobId 2538: Generate VSS snapshots. Driver="VSS Win 2003", Drive(s)="E"
> 13-May 00:00 msadpau-fd JobId 2538: Fatal error: ../../filed/backup.c:892 Network send error to SD. ERR=Input/output error
> 13-May 00:02 msadpau-fd JobId 2538: VSS Writer (BackupComplete): "System Writer", State: 0x1 (VSS_WS_STABLE)
> [...]
> 13-May 00:02 msadpau-fd JobId 2538: VSS Writer (BackupComplete): "NTDS", State: 0x1 (VSS_WS_STABLE)
> 13-May 00:02 backuppa-sd JobId 2538: JobId=2538 Job="msad-stockage-pau.2011-05-12_22.00.00_37" marked to be canceled.
> 13-May 00:02 backuppa-sd JobId 2538: Job write elapsed time = 01:39:44, Transfer rate = 6.296 M Bytes/second
> 13-May 00:02 backuppa-sd JobId 2538: Error: bsock.c:518 Read error from client:10.1.2.17:36643: ERR=Connection reset by peer
> 13-May 00:02 backuppa-dir JobId 2538: Error: Bacula backuppa-dir 5.0.2 (28Apr10): 13-May-2011 00:02:15
>   Build OS: x86_64-pc-linux-gnu debian squeeze/sid
>   Backup Level: Full
>   Client: "BLABLABLA" 2.4.4 (28Dec08) Linux,Cross-compile,Win32
>
> - after upgrading the client to 5.0.3 (the error message is more verbose,
>   but the problem is still there):
> 16-May 17:22 msadpau-fd JobId 2552: Generate VSS snapshots. Driver="VSS Win 2003", Drive(s)="E"
> 16-May 20:14 msadpau-fd JobId 2552: Error: /home/kern/bacula/k/bacula/src/lib/bsock.c:393 Write error sending 65562 bytes to Storage daemon:backuppa:9103: ERR=Input/output error
> 16-May 20:14 msadpau-fd JobId 2552: Fatal error: /home/kern/bacula/k/bacula/src/filed/backup.c:1024 Network send error to SD. ERR=Input/output error
> 16-May 20:15 msadpau-fd JobId 2552: Error: /home/kern/bacula/k/bacula/src/lib/bsock.c:339 Socket has errors=1 on call to Storage daemon:backuppa:9103
> 16-May 20:16 msadpau-fd JobId 2552: VSS Writer (BackupComplete): "System Writer", State: 0x1 (VSS_WS_STABLE)
> [...]
> 16-May 20:16 msadpau-fd JobId 2552: VSS Writer (BackupComplete): "WMI Writer", State: 0x1 (VSS_WS_STABLE)
> 16-May 20:16 backuppa-sd JobId 2552: JobId=2552 Job="msad-stockage-pau.2011-05-16_17.22.21_58" marked to be canceled.
> 16-May 20:16 backuppa-sd JobId 2552: Job write elapsed time = 02:53:20, Transfer rate = 6.234 M Bytes/second
> 16-May 20:16 backuppa-sd JobId 2552: Error: bsock.c:518 Read error from client:10.1.2.17:36643: ERR=Connection reset by peer
> 16-May 20:16 backuppa-dir JobId 2552: Error: Bacula backuppa-dir 5.0.2 (28Apr10): 16-May-2011 20:16:02
>   Build OS: x86_64-pc-linux-gnu debian squeeze/sid
>   Backup Level: Full
>   Client: "BLABLABLA" 5.0.3 (04Aug10) Linux,Cross-compile,Win32
>
> - after upgrading the DIR/SD from 5.0.2 to 5.0.3 (Debian squeeze => wheezy):
> 19-mai 10:07 msadpau-fd JobId 2565: Generate VSS snapshots. Driver="VSS Win 2003", Drive(s)="E"
> 19-mai 10:09 msadpau-fd JobId 2565: Error: /home/kern/bacula/k/bacula/src/lib/bsock.c:393 Write error sending 65536 bytes to Storage daemon:backuppa:9103: ERR=Input/output error
> 19-mai 10:09 msadpau-fd JobId 2565: Fatal error: /home/kern/bacula/k/bacula/src/filed/backup.c:1024 Network send error to SD. ERR=Input/output error
> 19-mai 10:11 msadpau-fd JobId 2565: Error: /home/kern/bacula/k/bacula/src/lib/bsock.c:339 Socket has errors=1 on call to Storage daemon:backuppa:9103
> 19-mai 10:11 msadpau-fd JobId 2565: VSS Writer (BackupComplete): "System Writer", State: 0x1 (VSS_WS_STABLE)
> [...]
> 19-mai 10:11 msadpau-fd JobId 2565: VSS Writer (BackupComplete): "WMI Writer", State: 0x1 (VSS_WS_STABLE)
> 19-mai 10:11 backuppa-sd JobId 2565: JobId=2565 Job="msad-stockage-pau.2011-05-19_10.03.34_03" marked to be canceled.
> 19-mai 10:11 backuppa-sd JobId 2565: Error: bsock.c:537 Read error from client:10.1.2.17:36643: ERR=Connexion ré-initialisée par le correspondant
> 19-mai 10:11 backuppa-sd JobId 2565: Job write elapsed time = 00:04:12, Transfer rate = 11.61 M Bytes/second
> 19-mai 10:11 backuppa-dir JobId 2565: Error: Bacula backuppa-dir 5.0.3 (04Aug10): 19-mai-2011 10:11:35
>   Build OS: x86_64-pc-linux-gnu debian wheezy/sid
>   Backup Level: Full
>   Client: "Serveur MSAD Pau" 5.0.3 (04Aug10) Linux,Cross-compile,Win32
>
> After a lot of searching through the list archives and "googling", I found
> various reports of similar problems, but no real solution.
>
> What I tried:
> - turning off the anti-virus software: no change
> - setting Heartbeat Interval in Storage and Client: no change
> - reducing keepalive on the SD: no change
> - changing the Network Buffer Size: no change:
> 19-mai 17:29 msadpau-fd JobId 2577: Error: /home/kern/bacula/k/bacula/src/lib/bsock.c:393 Write error sending 32768 bytes to Storage daemon:backuppa:9103: ERR=Input/output error
>
> 19-mai 17:36 msadpau-fd JobId 2579: Error: /home/kern/bacula/k/bacula/src/lib/bsock.c:393 Write error sending 131072 bytes to Storage daemon:backuppa:9103: ERR=Input/output error
> - I also turned on debugging/tracing on the FD and SD sides, which did
>   not help much.
>
> Here is the FD side:
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send data to SD len=65536
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send data to SD len=65536
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send data to SD len=65536
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send data to SD len=65536
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send data to SD len=65536
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:96-0 wait_intr=0 stop=0
> /// Here, the SD side shows that it is doing a read, and both sides
> /// seem to wait for about 2 minutes
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:96-0 wait_intr=0 stop=0
> /// and then the connection is closed.
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:96-0 wait_intr=0 stop=0
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:91-0 Got BNET_SIG 0 from SD
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:96-0 wait_intr=1 stop=1
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:211-0 end blast_data ok=0
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:1660-0 Error in blast_data.
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:282-0 Quit command loop. Canceled=1
> msadpau-fd: /home/kern/bacula/k/bacula/src/lib/runscript.c:110-0 runscript: running all RUNSCRIPT object (ClientAfterJob) JobStatus=f
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:309-0 End FD msg: 2800 End Job TermCode=102 JobFiles=21762 ReadBytes=2921610796 JobBytes=2921545260 Errors=2 VSS=1 Encrypt=0
>
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:388-0 Calling term_find_files
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:391-0 Done with term_find_files
> msadpau-fd: /home/kern/bacula/k/bacula/src/win32/compat/compat.cpp:210-0 Enter wchar_win32_path
> msadpau-fd: /home/kern/bacula/k/bacula/src/win32/compat/compat.cpp:394-0 Leave wchar_win32_path=\
> msadpau-fd: /home/kern/bacula/k/bacula/src/lib/jcr.c:181-0 write_last_jobs seek to 192
> msadpau-fd: /home/kern/bacula/k/bacula/src/lib/mem_pool.c:369-0 garbage collect memory pool
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:393-0 Done with free_jcr
> msadpau-fd: /home/kern/bacula/k/bacula/src/lib/mem_pool.c:369-0 garbage collect memory pool
>
>
> But now the strangest thing: in order to make the job fail faster
> during my tests, I turned off compression. The result is that the
> job failed "a lot" faster, but not due to compression. In fact, with
> compression, the job sometimes failed at 40GB written, sometimes at 100GB.
> Without compression, it failed at 2GB, always at exactly the same number
> of bytes written.
> So I decided to test the other way and raised the compression level
> (it was GZIP2 at the start, I set it to GZIP4): the job ran well!
>
> So now I am wondering what I can do to find where exactly the problem is.
> Is it on the FD side? On the SD side? It seems to happen when the data is
> sent too fast, or when there is too much load on the client side?
>
> Other facts:
> - the problem is observed on different kinds of hardware
> - the problem is observed with clients on the same network as the
>   backup server, as well as with clients on another network
> - no problem at all on Linux clients with bigger jobs
> - there is no heavy load on the DIR/SD/MySQL side
> - the different servers which have the problem don't show
>   any network connectivity problems
>
> Any clue? Anything that I can do to try to trace the problem?
>
> Regards

Does nobody have a clue?
It's really annoying: a lot of full backups are failing, and it is becoming a real problem.

We tried updating the NIC driver, with no improvement. I also tried disabling TCP offload in the NIC driver, which did not help either (it was even worse, in fact: the job failed faster).

I found various reports of such problems in the list archives, but no solution. Did everybody give up and move to another backup solution for Windows clients?

It seems to be Windows/network related, but Bacula is the only application that has the problem, so it's really weird.

Regards.

-- 
Yann Cézard - infrastructures - administrateur systèmes serveurs
Centre de ressources informatiques - http://cri.univ-pau.fr
Université de Pau et des pays de l'Adour - http://www.univ-pau.fr
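
One way to check whether the link itself is at fault, independently of Bacula, would be to push a comparable volume of data over a plain TCP connection and see whether it also breaks. The following is only a minimal sketch in Python, not anything taken from Bacula: the function names (`receive_all`, `send_bytes`, `loopback_check`) are hypothetical, and the 64 KiB chunk size is an assumption chosen to resemble the write sizes in the FD log.

```python
import socket
import threading

CHUNK = 65536  # assumption: similar to the 65536-byte writes in the FD log


def receive_all(server_sock, result):
    """Accept one connection and count bytes until the peer closes."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result["received"] = total


def send_bytes(host, port, total_bytes):
    """Send total_bytes of zeros; return the count sent before any error."""
    sent = 0
    payload = bytes(CHUNK)
    try:
        with socket.create_connection((host, port)) as sock:
            while sent < total_bytes:
                n = min(CHUNK, total_bytes - sent)
                sock.sendall(payload[:n])
                sent += n
    except OSError as exc:
        # Report roughly where the stream broke, like the FD write error.
        print(f"send failed after {sent} bytes: {exc}")
    return sent


def loopback_check(total_bytes):
    """Run sender and receiver on 127.0.0.1 to sanity-check the script."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(10)           # do not wait forever for a connection
    server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    receiver = threading.Thread(target=receive_all, args=(server, result))
    receiver.start()
    sent = send_bytes("127.0.0.1", port, total_bytes)
    receiver.join()
    server.close()
    return sent, result["received"]
```

For a real test, the receiving part would run on the backup server and `send_bytes` on the Windows client, with several GB as `total_bytes`. If the transfer also stops at a repeatable byte count, that would point at the network stack rather than at Bacula.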
_______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users