On 20/05/2011 10:22, Yann Cézard wrote:
> Hi everyone,
>
> For a few weeks now, I have been facing a really strange problem with
> my win32 bacula-fd.
>
> The problem seems to have started when I upgraded my SD + DIR
> to 5.0.x (I had been running 2.4.4 until then).
> Almost all of my win32-fd Full backups now fail (this is only
> observable on Full backups of several GB) with the following errors:
> - with a 2.4.4 client:
> 12-May 22:22 msadpau-fd JobId 2538: Generate VSS snapshots. Driver="VSS Win 
> 2003", Drive(s)="E"
> 13-May 00:00 msadpau-fd JobId 2538: Fatal error: ../../filed/backup.c:892 
> Network send error to SD. ERR=Input/output error
> 13-May 00:02 msadpau-fd JobId 2538: VSS Writer (BackupComplete): "System 
> Writer", State: 0x1 (VSS_WS_STABLE)
> [...]
> 13-May 00:02 msadpau-fd JobId 2538: VSS Writer (BackupComplete): "NTDS", 
> State: 0x1 (VSS_WS_STABLE)
> 13-May 00:02 backuppa-sd JobId 2538: JobId=2538 
> Job="msad-stockage-pau.2011-05-12_22.00.00_37" marked to be canceled.
> 13-May 00:02 backuppa-sd JobId 2538: Job write elapsed time = 01:39:44, 
> Transfer rate = 6.296 M Bytes/second
> 13-May 00:02 backuppa-sd JobId 2538: Error: bsock.c:518 Read error from 
> client:10.1.2.17:36643: ERR=Connection reset by peer
> 13-May 00:02 backuppa-dir JobId 2538: Error: Bacula backuppa-dir 5.0.2 
> (28Apr10): 13-May-2011 00:02:15
>   Build OS:               x86_64-pc-linux-gnu debian squeeze/sid
>   Backup Level:           Full
>   Client:                 "BLABLABLA" 2.4.4 (28Dec08) 
> Linux,Cross-compile,Win32
> - after upgrading the client to 5.0.3 (the error message is more verbose,
> but the problem is still there):
> 16-May 17:22 msadpau-fd JobId 2552: Generate VSS snapshots. Driver="VSS Win 
> 2003", Drive(s)="E"
> 16-May 20:14 msadpau-fd JobId 2552: Error: 
> /home/kern/bacula/k/bacula/src/lib/bsock.c:393 Write error sending 65562 
> bytes to Storage daemon:backuppa:9103: ERR=Input/output error
> 16-May 20:14 msadpau-fd JobId 2552: Fatal error: 
> /home/kern/bacula/k/bacula/src/filed/backup.c:1024 Network send error to SD. 
> ERR=Input/output error
> 16-May 20:15 msadpau-fd JobId 2552: Error: 
> /home/kern/bacula/k/bacula/src/lib/bsock.c:339 Socket has errors=1 on call to 
> Storage daemon:backuppa:9103
> 16-May 20:16 msadpau-fd JobId 2552: VSS Writer (BackupComplete): "System 
> Writer", State: 0x1 (VSS_WS_STABLE)
> [...]
> 16-May 20:16 msadpau-fd JobId 2552: VSS Writer (BackupComplete): "WMI 
> Writer", State: 0x1 (VSS_WS_STABLE)
> 16-May 20:16 backuppa-sd JobId 2552: JobId=2552 
> Job="msad-stockage-pau.2011-05-16_17.22.21_58" marked to be canceled.
> 16-May 20:16 backuppa-sd JobId 2552: Job write elapsed time = 02:53:20, 
> Transfer rate = 6.234 M Bytes/second
> 16-May 20:16 backuppa-sd JobId 2552: Error: bsock.c:518 Read error from 
> client:10.1.2.17:36643: ERR=Connection reset by peer
> 16-May 20:16 backuppa-dir JobId 2552: Error: Bacula backuppa-dir 5.0.2 
> (28Apr10): 16-May-2011 20:16:02
>   Build OS:               x86_64-pc-linux-gnu debian squeeze/sid
>   Backup Level:           Full
>   Client:                 "BLABLABLA" 5.0.3 (04Aug10) 
> Linux,Cross-compile,Win32
> - after upgrading the DIR/SD from 5.0.2 to 5.0.3 (Debian squeeze =>
> wheezy):
> 19-mai 10:07 msadpau-fd JobId 2565: Generate VSS snapshots. Driver="VSS Win 
> 2003", Drive(s)="E"
> 19-mai 10:09 msadpau-fd JobId 2565: Error: 
> /home/kern/bacula/k/bacula/src/lib/bsock.c:393 Write error sending 65536 
> bytes to Storage daemon:backuppa:9103: ERR=Input/output error
> 19-mai 10:09 msadpau-fd JobId 2565: Fatal error: 
> /home/kern/bacula/k/bacula/src/filed/backup.c:1024 Network send error to SD. 
> ERR=Input/output error
> 19-mai 10:11 msadpau-fd JobId 2565: Error: 
> /home/kern/bacula/k/bacula/src/lib/bsock.c:339 Socket has errors=1 on call to 
> Storage daemon:backuppa:9103
> 19-mai 10:11 msadpau-fd JobId 2565: VSS Writer (BackupComplete): "System 
> Writer", State: 0x1 (VSS_WS_STABLE)
> [...]
> 19-mai 10:11 msadpau-fd JobId 2565: VSS Writer (BackupComplete): "WMI 
> Writer", State: 0x1 (VSS_WS_STABLE)
> 19-mai 10:11 backuppa-sd JobId 2565: JobId=2565 
> Job="msad-stockage-pau.2011-05-19_10.03.34_03" marked to be canceled.
> 19-mai 10:11 backuppa-sd JobId 2565: Error: bsock.c:537 Read error from 
> client:10.1.2.17:36643: ERR=Connexion ré-initialisée par le correspondant
> 19-mai 10:11 backuppa-sd JobId 2565: Job write elapsed time = 00:04:12, 
> Transfer rate = 11.61 M Bytes/second
> 19-mai 10:11 backuppa-dir JobId 2565: Error: Bacula backuppa-dir 5.0.3 
> (04Aug10): 19-mai-2011 10:11:35
>   Build OS:               x86_64-pc-linux-gnu debian wheezy/sid
>   Backup Level:           Full
>   Client:                 "Serveur MSAD Pau" 5.0.3 (04Aug10) 
> Linux,Cross-compile,Win32
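The SD summary lines in the three logs above give a rough idea of how much data each job wrote before the connection dropped: elapsed time multiplied by the reported transfer rate. The following is a minimal Python sketch of that arithmetic, not part of Bacula. The helper name `estimate_bytes` is made up for illustration, and the sketch assumes that "M Bytes/second" means 10^6 bytes per second.

```python
import re

# Assumption: "M Bytes/second" means 10**6 bytes per second and
# "K Bytes/second" means 10**3; adjust if your build reports otherwise.
UNITS = {"": 1, "K": 10**3, "M": 10**6}

LINE_RE = re.compile(
    r"Job write elapsed time = (\d+):(\d+):(\d+), "
    r"Transfer rate = ([\d.]+) ([KM]?) ?Bytes/second"
)

# SD lines from the three logs, re-joined after the mail client wrapped them.
SAMPLES = [
    "Job write elapsed time = 01:39:44, Transfer rate = 6.296 M Bytes/second",
    "Job write elapsed time = 02:53:20, Transfer rate = 6.234 M Bytes/second",
    "Job write elapsed time = 00:04:12, Transfer rate = 11.61 M Bytes/second",
]


def estimate_bytes(line):
    """Return (elapsed_seconds, estimated_bytes) for one SD summary line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not an SD elapsed-time line: %r" % line)
    hours, minutes, seconds = (int(g) for g in m.group(1, 2, 3))
    elapsed = hours * 3600 + minutes * 60 + seconds
    rate = float(m.group(4)) * UNITS[m.group(5)]
    return elapsed, elapsed * rate


if __name__ == "__main__":
    for sample in SAMPLES:
        elapsed, total = estimate_bytes(sample)
        print("%6d s  ~%.2f GB" % (elapsed, total / 1e9))
```

Under that assumption the three jobs wrote roughly 37.7 GB, 64.8 GB and 2.9 GB before failing.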
> After a lot of searching through the list archives and "googling", I found
> several reports of similar problems, but no real solution.
>
> What I tried:
> - turning off the anti-virus software: no change
> - setting Heartbeat Interval in the Storage and Client resources: no change
> - reducing the keepalive on the SD: no change
> - changing the Network Buffer Size: no change:
> 19-mai 17:29 msadpau-fd JobId 2577: Error: 
> /home/kern/bacula/k/bacula/src/lib/bsock.c:393 Write error sending 32768 
> bytes to Storage daemon:backuppa:9103: ERR=Input/output error
>
> 19-mai 17:36 msadpau-fd JobId 2579: Error: 
> /home/kern/bacula/k/bacula/src/lib/bsock.c:393 Write error sending 131072 
> bytes to Storage daemon:backuppa:9103: ERR=Input/output error
> - I also turned on debugging/tracing on the FD and SD sides, which didn't
>   help much. Here is the FD side:
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send
> data to SD len=65536
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send
> data to SD len=65536
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send
> data to SD len=65536
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send
> data to SD len=65536
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send
> data to SD len=65536
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:96-0
> wait_intr=0 stop=0
/// Here, the SD side shows that it is doing a read, and both sides
seem to wait for about 2 minutes
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:96-0
> wait_intr=0 stop=0
/// and then the connection is closed.
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:96-0
> wait_intr=0 stop=0
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:91-0 Got
> BNET_SIG 0 from SD
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:96-0
> wait_intr=1 stop=1
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:211-0 end
> blast_data ok=0
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:1660-0 Error in
> blast_data.
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:282-0 Quit
> command loop. Canceled=1
> msadpau-fd: /home/kern/bacula/k/bacula/src/lib/runscript.c:110-0
> runscript: running all RUNSCRIPT object (ClientAfterJob) JobStatus=f
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:309-0 End FD
> msg: 2800 End Job TermCode=102 JobFiles=21762 ReadBytes=2921610796
> JobBytes=2921545260 Errors=2 VSS=1 Encrypt=0
>
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:388-0 Calling
> term_find_files
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:391-0 Done with
> term_find_files
> msadpau-fd:
> /home/kern/bacula/k/bacula/src/win32/compat/compat.cpp:210-0 Enter
> wchar_win32_path
> msadpau-fd:
> /home/kern/bacula/k/bacula/src/win32/compat/compat.cpp:394-0 Leave
> wchar_win32_path=\
> msadpau-fd: /home/kern/bacula/k/bacula/src/lib/jcr.c:181-0
> write_last_jobs seek to 192
> msadpau-fd: /home/kern/bacula/k/bacula/src/lib/mem_pool.c:369-0
> garbage collect memory pool
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:393-0 Done with
> free_jcr
> msadpau-fd: /home/kern/bacula/k/bacula/src/lib/mem_pool.c:369-0
> garbage collect memory pool
>
>
> But the strangest thing is this: to make the job fail faster
> during my tests, I turned off compression. The job did fail
> "a lot" faster, but not simply because compression was off. In fact,
> with compression, the job sometimes failed at 40 GB written, sometimes
> at 100 GB. Without compression, it failed at 2 GB, always at exactly
> the same number of bytes written.
> So I decided to test the other direction and raised the compression level
> (it was GZIP2 at the start, I set it to GZIP4): the job ran fine!
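To check the "always exactly the same number of bytes" observation across runs, the counters in the FD's `End Job` trace message can be compared directly. This is a minimal Python sketch, assuming the trace line has been re-joined after the mail client wrapped it. `parse_end_job` and `same_failure_point` are hypothetical helper names, not Bacula functions, and the sample string is the message from the FD trace quoted above.

```python
import re

# End Job message from the FD trace above, re-joined onto one line.
SAMPLE = ("2800 End Job TermCode=102 JobFiles=21762 ReadBytes=2921610796 "
          "JobBytes=2921545260 Errors=2 VSS=1 Encrypt=0")


def parse_end_job(message):
    """Return the numeric key=value counters of an 'End Job' message as a dict."""
    if "End Job" not in message:
        raise ValueError("not an End Job message")
    fields = message.split("End Job", 1)[1]
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", fields)}


def same_failure_point(messages):
    """True if every message reports the same JobBytes value."""
    return len({parse_end_job(m)["JobBytes"] for m in messages}) == 1


if __name__ == "__main__":
    counters = parse_end_job(SAMPLE)
    # Compare against the classic 32-bit size limits, in case one of them matters.
    for name, limit in (("2 GiB", 2**31), ("4 GiB", 2**32)):
        print("%s: JobBytes - limit = %d" % (name, counters["JobBytes"] - limit))
```

In this sample, JobBytes lies between 2 GiB and 4 GiB, so this particular run did not stop exactly on either boundary. Feeding the messages from several failed uncompressed runs to `same_failure_point` would confirm whether the stopping point really is identical.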
>
> So now I am wondering what I can do to find out where exactly the
> problem is.
> Is it on the FD side? On the SD side? It seems to happen when the data
> is sent too fast, or is it too much load on the client side?
>
> Other facts:
> - the problem is observed on different kinds of hardware
> - the problem is observed with clients on the same network as
>   the backup server, as well as with clients on another network
> - no problem at all on Linux clients with bigger jobs
> - there is no heavy load on the DIR/SD/MySQL side
> - the servers that have the problem don't show
>   any network connectivity problems
>
> Any clue? Is there anything I can do to try to trace the problem?
>
> Regards
Nobody has a clue?

It's really annoying: a lot of full backups are failing, and it is becoming
a real problem.
We tried updating the NIC driver: no improvement. I tried disabling TCP
offload in the NIC driver: no improvement (even worse, in fact: the job
failed faster).

I found several reports of such problems in the list archives, but no
solution.
Did everybody give up and move to another backup solution for Windows clients?
It seems to be Windows/network related, but Bacula is the only application
that has the problem, so it's really weird.
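
One way to test whether the problem really is specific to Bacula would be to push a comparable volume of data from the Windows client to the backup server over a plain TCP socket, using the same 65536-byte writes that the FD trace shows. The following Python sketch is only an illustration under those assumptions. `sink`, `blast` and `loopback_check` are hypothetical names, any free port can be used, and the byte count reported on failure is approximate because `sendall` may have partially sent the last block.

```python
import socket
import threading

BLOCK = 65536  # same write size as in the FD trace (len=65536)


def sink(server_sock, result):
    """Accept one connection and count bytes until the peer closes."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(BLOCK)
            if not chunk:
                break
            total += len(chunk)
    result["received"] = total


def blast(host, port, total_bytes, block=BLOCK):
    """Send total_bytes of zeros; return the bytes sent before any error."""
    sent = 0
    payload = bytes(block)
    with socket.create_connection((host, port)) as s:
        try:
            while sent < total_bytes:
                n = min(block, total_bytes - sent)
                s.sendall(payload[:n])
                sent += n
        except OSError as exc:
            # Approximate: the failing block may have been partly sent.
            print("send failed after about %d bytes: %s" % (sent, exc))
    return sent


def loopback_check(total_bytes=4 * BLOCK + 123):
    """Run sink and blast against each other on 127.0.0.1 as a sanity check."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    result = {}
    worker = threading.Thread(target=sink, args=(server, result))
    worker.start()
    sent = blast("127.0.0.1", server.getsockname()[1], total_bytes)
    worker.join(timeout=10)
    server.close()
    return sent, result.get("received")


if __name__ == "__main__":
    print(loopback_check())
```

For a real test, `sink` would run on the backup server and `blast` on the Windows client with several GB as `total_bytes`. If the plain transfer also breaks at a similar point, the cause is more likely below Bacula (driver, TCP stack, network equipment) than in the FD or SD.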

Regards.

-- 
Yann Cézard  -  infrastructures - administrateur systèmes serveurs
Centre de ressources informatiques    -     http://cri.univ-pau.fr
Université de Pau et des pays de l'Adour -  http://www.univ-pau.fr

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
