Aaah, I've finally figured it out. The very common Linksys WRT54G v5 router IS dropping inactive sockets after exactly 10 minutes.
I verified this through a process of elimination. Any time the Linksys router was in the path, I'd get a socket drop at 10 minutes (over wireless or an Ethernet cable alike). But when I bypassed the Linksys router and kept everything else the same, it all worked.

I think that makes any Bacula job longer than 10 minutes impossible using this Linksys router. Looks like I'm out of luck. I have updated to the newest firmware, and the Linksys config doesn't offer any way to modify the timeout value.

I suppose I could buy a new router, or set up a new offsite backup storage daemon. Unless anyone else has any brilliant ideas :)

Kern, if you are reading this, what are the chances that a heartbeat could be implemented between the director and the storage daemon?

Brad Peterson
[EMAIL PROTECTED]

----- Original Message ----
From: Brad Peterson <[EMAIL PROTECTED]>
To: bacula-users@lists.sourceforge.net
Sent: Tuesday, January 30, 2007 3:00:03 PM
Subject: [Bacula-users] Director losing socket with SD

First, my request: Is there anything in Bacula I can do to keep the socket between the director and the storage daemon alive?

Now, my explanation of why I need this. As I'm trying to narrow down why my lengthy backups to an offsite storage daemon don't work, I sat and watched the debug output for the director, the storage daemon, and the file daemon. Almost exactly 10 minutes in, the director's debug output said this:

msgchan.c:333 === End msg_thread. use=2

After a lot of research, it appears what's going on is this:

1) The director starts the job.
2) A socket is opened between the director and the storage daemon.
3) A socket is opened between the file daemon and the storage daemon.
4) The file data transfers just fine over the file daemon/storage daemon socket.
5) At almost exactly 10 minutes in, I get the above debug message, which means the socket between the director and storage daemon has been closed. netstat confirmed this.
6) The file data continues to transfer just fine to the storage daemon.
7) When the file daemon is done, it tells the director that it is finished.
8) The storage daemon tries to tell the director it received the data perfectly, but cannot, because it cannot communicate with the director anymore (which makes sense, because the socket died).
9) *I think* the director waits a bit for the storage daemon, or it just knows it can't receive info from the storage daemon. In any event, the director quickly marks the job as having an error because it never heard from the storage daemon as to its final result.

So, what can I do in Bacula to keep the socket between the director and the storage daemon alive? I've already set a heartbeat of 30 seconds, but according to the manual, the heartbeats help the file daemon talk to the director, the file daemon talk to the storage daemon, and the storage daemon talk to the file daemon. In my situation, I'm losing the socket between the director and the storage daemon, and the heartbeat doesn't help with that.

I'm also starting to think a Linksys router may be the reason why it loses the inactive socket after almost exactly 10 minutes, as other Linksys users have found this happens to them. I'll be able to test this out tonight by bypassing the router completely.

Anyways, in the meantime, does anybody know how I can keep that socket alive?

Brad Peterson
[EMAIL PROTECTED]
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
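
For anyone who lands on this thread with the same idle-timeout symptom: one generic way to stop a quiet TCP connection from looking inactive to a NAT router is OS-level TCP keepalive, with the idle time set well below the router's timeout (10 minutes here). The sketch below is a minimal Python illustration of that general technique, not Bacula code. The function name `enable_keepalive` and the 120/30/4 values are made up for the example, the per-socket tuning options are platform-specific, and whether keepalive probes actually reset this particular router's idle timer is untested.

```python
import socket


def enable_keepalive(sock, idle_s=120, interval_s=30, probes=4):
    """Ask the OS to send TCP keepalive probes on an otherwise idle socket.

    idle_s:     seconds of silence before the first probe (hypothetical value,
                chosen to be far below a 10-minute router timeout)
    interval_s: seconds between unanswered probes
    probes:     unanswered probes before the OS reports the peer as dead
    """
    # Turn keepalive on for this socket; this option is widely available.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The tuning options below are not defined on every platform, so each
    # one is applied only if this Python build exposes it.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", probes),
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    # Usage: configure the socket before (or after) connecting it.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Keepalive probes carry no application data, so this only helps if the device in the middle counts them as activity. An application-level heartbeat, like the one requested from Kern above, does not depend on that.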