Aaah, I've finally figured it out.  The very common Linksys WRT54G v5 router IS 
dropping inactive sockets after exactly 10 minutes. 

I verified this through a process of elimination.  Any time the Linksys router 
was in the path, I'd get a socket drop at 10 minutes (over both wireless and an 
ethernet cable).  But when I bypassed the Linksys router and kept everything 
else the same, it all worked.

I think that makes any Bacula job longer than 10 minutes impossible through this 
Linksys router.  Looks like I'm out of luck.  I have updated to the newest 
firmware, and the Linksys config offers no way to change the timeout value.  
I suppose I could buy a new router, or set up a new offsite backup storage 
daemon.   Unless anyone else has any brilliant ideas  :)

Kern, if you are reading this, what are the chances that a heartbeat could be 
implemented between the director and the storage daemon?
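
For illustration only, here is a minimal sketch of what an application-level 
heartbeat on an otherwise idle control socket could look like.  It is generic 
Python, not Bacula code: the HEARTBEAT marker, the start_heartbeat helper, and 
the interval are all assumptions made up for the example, and the receiving 
side would have to recognize and discard the marker.

```python
import socket
import threading

# Hypothetical marker for this sketch; not Bacula's actual wire format.
HEARTBEAT = b"HB\n"


def start_heartbeat(sock, interval, stop):
    """Send HEARTBEAT on sock every `interval` seconds until `stop` is set."""

    def run():
        # Event.wait returns False on timeout and True once stop is set,
        # so the loop sends one heartbeat per interval and exits on stop.
        while not stop.wait(interval):
            try:
                sock.sendall(HEARTBEAT)
            except OSError:
                break  # peer is gone; nothing left to keep alive

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

With an interval well under the router's 10 minutes (30 seconds, say), the 
connection would never look inactive for long enough to be dropped.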

Brad Peterson
[EMAIL PROTECTED]

  
----- Original Message ----
From: Brad Peterson <[EMAIL PROTECTED]>
To: bacula-users@lists.sourceforge.net
Sent: Tuesday, January 30, 2007 3:00:03 PM
Subject: [Bacula-users] Director losing socket with SD

First, my request:  Is there anything in Bacula I can do to keep the socket 
between the director and the storage daemon alive?

Now, my explanation why I need this.  As I'm trying to narrow down why my 
lengthy backups to an offsite storage daemon don't work, I sat and watched the 
debug output for the director, the storage daemon, and the file daemon.  Almost 
exactly 10 minutes in, the director's debug output said this:

msgchan.c:333 === End msg_thread. use=2

After a lot of research, it appears what's going on is this:

1) The director starts the job.
2) A socket is opened between the director and the storage daemon.
3) A socket is opened between the file daemon and the storage daemon.
4) The file data transfers just fine over the file daemon/storage daemon 
socket.  
5) At almost exactly 10 minutes in, I get the above debug message which means 
the socket between the director and storage daemon has been closed.  netstat 
confirmed this.
6) The file data continues to transfer just fine to the storage daemon.
7) When the file daemon is done, it tells the director that it is finished.
8) The storage daemon tries to tell the director it received the data 
perfectly, but cannot, because it cannot communicate with the director anymore 
(which makes sense, because the socket died).
9) *I think* the director waits a bit for the storage daemon, or it just 
knows it can't receive info from the storage daemon.  In any event, the 
director quickly marks the job as having an error because it never heard from 
the storage daemon as to its final result.

So, what can I do in Bacula to keep the socket between the director and the 
storage daemon alive?
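
One generic option at the operating-system level is TCP keepalive, where the 
kernel sends small probes on an idle connection.  The sketch below is plain 
Python rather than anything from Bacula; the enable_keepalive helper and its 
default values are assumptions for illustration.  On Linux the default idle 
time before the first probe is 7200 seconds, far longer than 10 minutes, so it 
has to be lowered per socket.  Whether a given router counts keepalive probes 
as activity is also an assumption that would need testing.

```python
import socket


def enable_keepalive(sock, idle=300, interval=60, count=5):
    """Turn on TCP keepalive and, where the platform allows, tune its timing.

    idle:     seconds of silence before the first probe
    interval: seconds between probes
    count:    unanswered probes before the connection is declared dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options are not available on every platform, so each one
    # is applied only if this Python build exposes it.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

An idle value of 300 seconds keeps the first probe comfortably inside the 
10-minute window described above.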

I've already set a heartbeat of 30 seconds, but according to the manual, the 
heartbeats help the file daemon talk to the director, the file daemon talk to 
the storage daemon, and the storage daemon talk to the file daemon.  But in my 
situation, I'm losing a socket between the director and the storage daemon, and 
the heartbeat doesn't help out with that.

I'm also starting to think a Linksys router may be the reason why it loses the 
inactive socket after almost exactly 10 minutes, as other Linksys users have 
found this happens to them.  I'll be able to test this out tonight by bypassing 
the router completely. 

Anyways, in the meantime, anybody know how I can keep that socket alive?

Brad Peterson
[EMAIL PROTECTED]







 
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users





 