This is not a bug, but rather an insanity check. If you want idle jobs to remain in the system longer, take a look at src/lib/watchdog.c. Somewhere in that file there should be a setting for the timeout, which you can increase as you wish.
On Monday 05 March 2007 20:35, Alan Davis wrote:
> I was running a very large archival backup and about 20 hours into the
> backup I ran out of tapes that had the recycle flag set. I updated the
> flags and purged the first tape. The system then loaded the next tape
> and continued the backup. The SD (or FD), however, never signaled the
> DIR that the job had resumed and it stayed in "waiting for appendable
> Volume" (JS_WaitMedia) for 518415 secs (6 days) and then the DIR killed
> the job with the messages:
>
> 04-Mar 17:17 gannon-dir: LiveArchiveJob.2007-02-26_17.16.43 Error:
> Watchdog sending kill after 518415 secs to thread stalled reading File
> daemon.
> 04-Mar 17:17 gannon-dir: LiveArchiveJob.2007-02-26_17.16.43 Fatal error:
> Network error with FD during Backup: ERR=Interrupted system call
> 04-Mar 17:17 gannon-dir: LiveArchiveJob.2007-02-26_17.16.43 Fatal error:
> No Job status returned from FD.
>
> The SD, FD and DIR are all running on the same node so network problems
> between them did not cause the timeout.
>
> The wait status seems to come from the SD and is reported by the DIR,
> but the kill message from the DIR indicates that not being able to
> communicate with the FD was the reason it killed the job.
>
> I've looked at some of the code and the best candidate that I've found
> so far for where a problem might cause this is in
> filed/heartbeat.c:sd_heartbeat_thread or somewhere in the acquire/mount
> code that a message isn't being sent back to the DIR.
>
> Due to the long runtime of the backup it's not practical for me to try
> to duplicate the problem exactly. I will try to create a reproducer with
> a smaller backup set once I have the archive backup completed.
>
> Any insight on the possible cause(s) would be greatly appreciated.
>
>
> ----
> Alan Davis
> Senior Architect
> Ruckus Network, Inc.
> 703.464.6578 (o)
> 410.365.7175 (m)
> [EMAIL PROTECTED]
> alancdavis AIM
>
> -------------------------------------------------------------------------
> Take Surveys. Earn Cash. Influence the Future of IT
> Join SourceForge.net's Techsay panel and you'll get the chance to share
> your opinions on IT & business topics through brief surveys-and earn cash
> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users