From: Josh Fisher [mailto:jfis...@pvct.com] 
Sent: Friday, October 5, 2012 18:50
To: bacula-users@lists.sourceforge.net
Subject: Re: [Bacula-users] Network error with FD during Backup: ERR=Connection reset by peer

 

On 10/4/2012 4:05 AM, DAHLBOKUM Markus (FPT INDUSTRIAL) wrote:

        Hi Tom,

         

        Thank you for your answer.

         

        >The heartbeats are only set up when a job with a client is initiated.
        >So, there should be no activity when no job is running. When you
        >initiate a job with the client, the director sets up a connection with
        >the client telling the client what storage daemon to use. The client
        >then initiates a connection back to that storage daemon. If you have
        >the heartbeat settings in place as you do, then you should see heartbeat
        >packets sent from the client back to the director in order to keep that
        >connection alive while the data is being sent back to the storage
        >daemon. In addition, you may see heartbeat packets sent from the
        >storage daemon to the client. I'd have to re-look at the code but I
        >believe this is used in the scenario where the storage daemon is waiting
        >for a volume to write the data to (i.e. operator intervention). If the
        >heartbeat setting is on then the storage daemon will send heartbeats
        >back to the client in order to keep the connection alive while it waits.

         

        Yesterday I waited until the job had finished the first tape and was waiting for me to insert the next one.

        I opened Wireshark to see whether there was a heartbeat during the wait, and there was none. During the job the heartbeat was active.

        From what you wrote, the heartbeat should be active while waiting for a tape. Could you confirm that (by having a look at the code)?

        As one side of the backup is a VMware server, I took a closer look at the configuration of this environment.

        As far as I know, Michael's environment (he started this thread) also includes VMware, so this might be interesting for him.

        My job is cancelled exactly 15 minutes after it starts waiting for a new tape. In the VMware settings there is an idle timeout set to 900 seconds (i.e. 15 minutes).

        The timeout doesn't exactly fit that kind of connection, but you never know.

        I have now disabled this timeout and restarted my backup. In 7 hours I will see the result.

        But even if this setting caused the trouble, I would have thought the heartbeat should prevent it (an idle connection timeout).

        Again, it would be good to know whether the heartbeat should be active while waiting for a tape.
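
A check like the Wireshark one described above can be made repeatable by looking for the longest silent stretch on the connection and comparing it with the idle timeout. This is only a minimal sketch, assuming the packet timestamps (in seconds) have already been exported from the capture. The function names and any sample values are hypothetical; only the 900-second timeout comes from the message above.

```python
from typing import Iterable, Optional

# Idle timeout reported for the VMware setting above (15 minutes).
IDLE_TIMEOUT_SECONDS = 900


def longest_idle_gap(timestamps: Iterable[float]) -> Optional[float]:
    """Return the largest gap between consecutive packets.

    Returns None when fewer than two timestamps are given.
    """
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return None
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def exceeds_idle_timeout(timestamps: Iterable[float],
                         timeout: float = IDLE_TIMEOUT_SECONDS) -> bool:
    """True if the connection was silent for at least `timeout` seconds."""
    gap = longest_idle_gap(timestamps)
    return gap is not None and gap >= timeout
```

If the longest gap seen while the job waits for a tape reaches the timeout, no heartbeat (or other traffic) kept the connection busy during that period.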


It could be the client OS timing out too. For example, network activity
in a Windows daemon does not necessarily keep Windows from going into
suspend. This is because otherwise the myriad daemons checking for
updates, etc. could potentially keep the machine from ever suspending.
So Windows has an API function, SetThreadExecutionState(), that a daemon
can use to prevent or allow suspend on an as-needed basis. Recent
versions of Windows are more aggressive about power management and
default to allowing suspend even when there is network activity. I'm not
sure what happens in VMware when a client OS suspends a virtual NIC, but
my guess is that the VMware timeout might be ignored if the client OS
"powers off" the interface on its own.
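
For illustration, here is roughly how a process could call SetThreadExecutionState() to hold off suspend while it is busy. This is only a sketch in Python via ctypes, not what the Bacula file daemon actually does. The helper name set_keep_awake is hypothetical; the flag values are the ones defined in the Windows SDK headers.

```python
import ctypes
import sys

# Flag values from the Windows SDK (winbase.h).
ES_CONTINUOUS = 0x80000000
ES_SYSTEM_REQUIRED = 0x00000001


def set_keep_awake(enable: bool) -> bool:
    """Ask Windows to keep the system awake, or allow suspend again.

    Hypothetical helper. Returns True if the API call succeeded and
    False if it failed or if we are not running on Windows.
    """
    if sys.platform != "win32":
        return False
    # ES_CONTINUOUS makes the setting stick until the next call;
    # ES_SYSTEM_REQUIRED is what actually blocks suspend.
    flags = ES_CONTINUOUS | (ES_SYSTEM_REQUIRED if enable else 0)
    kernel32 = ctypes.windll.kernel32
    kernel32.SetThreadExecutionState.restype = ctypes.c_uint
    kernel32.SetThreadExecutionState.argtypes = [ctypes.c_uint]
    previous = kernel32.SetThreadExecutionState(flags)
    return previous != 0  # the API returns 0 (NULL) on failure
```

A daemon would call set_keep_awake(True) when a job starts and set_keep_awake(False) when it ends, so the machine can suspend normally between jobs.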

In the fully virtualized environment, I disabled the power-saving option
for the (virtual) E1000 NIC. I also disabled IPv6 on this particular
NIC. Now I have to wait for the next error.
 

 
NovaNet GmbH
Kupferstr. 65
44532 Lunen
Phone: 02306/202100
Fax: 02306/202109
Web: www.novanetgmbh.de
Registered office: Lunen
Amtsgericht Dortmund (district court), HRB 17273
VAT ID DE 124793480, Tax No. 316/5759/0318
Managing Director: Dipl. Informatikerin (FH) Desiree Wunsche


