Hi Kern,

the bacula-server runs independently of NFS mounts, sorry to say that.
The bsr files are copied to NFS by a cron job, and all the bacula files
and mysql are on local disks. The machine whose job caused the hangs
is actually independent of NFS, too - at least I do not read or write
anything from or to NFS on any of the computers involved in the backup
process. Well, there are - or there could be - mounts on the server and
the clients (the machines are running autofs, so you never know whether
there are mounts or not), but bacula is independent of these in the
sense that I don't read from or write to NFS. Or do you mean the locks
could be caused by NFS mounts on the machine that are not directly
related to bacula? Hmm, this could be... but actually there's no sign
of stale NFS handles in the logs.
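
A minimal sketch of how one could check this despite autofs: list
whatever NFS mounts are active at the moment and probe each one with a
timeout. The function names are made up for illustration, and it assumes
a Linux /proc/mounts and an available "stat" command; it is not part of
bacula.

```python
import subprocess


def nfs_mounts(mounts_text):
    """Return the mount points of type nfs/nfs4 found in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fstype, options, dump, pass.
        # (Spaces in paths appear escaped as \040 and are not decoded here.)
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.append(fields[1])
    return points


def responds(path, timeout=5):
    """Probe a path in a child process so that a hung server cannot block the caller."""
    proc = subprocess.Popen(["stat", path],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # The child may be stuck inside the kernel; signal it but do not wait for it.
        proc.kill()
        return False


def report(mounts_path="/proc/mounts"):
    """Print one status line per currently mounted NFS filesystem."""
    with open(mounts_path) as f:
        for mount_point in nfs_mounts(f.read()):
            status = "ok" if responds(mount_point) else "NOT RESPONDING"
            print(mount_point, status)
```

Calling report() on the server and on the client would show whether any
mount is hanging at that moment. Mounts that autofs has not triggered are
not listed, so an empty result does not prove that NFS is uninvolved.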

Last night all jobs ran fine. I excluded bali-root which caused the lock
twice. Could it be a hard disk failure on the client?
Or bad memory in the client? Sometimes I have the feeling that this
computer "bali" is a little bit weird, since sometimes cron jobs segfault
without an apparent reason. Maybe bacula is affected, too... (could be
bad memory).

If I wanted to use the debugger with debug symbols, I'd have to recompile
bacula since the debian packages provide only the stripped binaries. This
will take a little time, since we're running in production... So I
really hope that the error comes from the client.
I'll keep track of things and I'll try to get a debug version into
production in case the director locks up again. I'll keep the list
up-to-date...
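
For the record, a rough sketch of how the traceback could be collected
non-interactively once a build with debug symbols is in place. It
attaches gdb to an already running director instead of starting the
director under the debugger as Kern suggested, which is a variation and
not what he described. The helper names are hypothetical, and the PID
has to be supplied by whoever runs it.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb call that attaches to pid, prints all thread backtraces and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtrace(pid, timeout=60):
    """Run gdb against the given process and return its standard output as text."""
    result = subprocess.run(gdb_backtrace_command(pid),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True,
                            timeout=timeout)
    return result.stdout
```

dump_backtrace(pid) usually has to run as root or as the user owning the
director process. With stripped binaries the output contains mostly bare
addresses, so this is only useful together with the debug build.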

Thanks for your help (so far)!
Regards
Volker


On Sun, 17 Jul 2005, Kern Sibbald wrote:

> Hello Volker,
> 
> About the only thing I can think of is that you have a stale or bad NFS 
> connection and you are trying to write the bootstrap file to another machine 
> with the bad NFS link -- or perhaps the other machine is just down.  In that 
> case, Bacula will hang forever.  Don't blame me -- I don't know why NFS files 
> block forever when there is no one on the other end.
> 
> If that is not the case, about the only solution is for you to run the 
> director manually under the debugger. When it locks up, press Ctrl-C, then 
> proceed with getting a traceback using the instructions in the Kaboom chapter 
> (I mainly need the output from "thread apply all bt").  Make sure you have 
> debug symbols turned on (i.e. compiled with -g and not stripped; note that 
> FreeBSD has a habit of stripping everything it installs).
> 
> On Saturday 16 July 2005 12:02, Volker Sauer wrote:
> > On Fri, 15 Jul 2005, Volker Sauer wrote:
> > > On Fri, 15 Jul 2005, Arno Lehmann wrote:
> > > > >I'll upgrade to 1.36.3 and see what happens. Maybe "Fix deadlock in
> > > > >multiple simultaneous jobs." (from ReleaseNotes) could be the right
> > > > > > one. I already set up this site with 1.36.3 FileFormat because I knew
> > > > > it's going to be required!
> > > >
> > > > I had the same problem of a locking DIR, which worked ok after a
> > > > restart, and I could never find a reason (partly because I never
> > > > investigated with gdb, but that's beyond my skills and as long as I
> > > > could restart my backups rather easily that was ok).
> > > > With 1.36.3 this problem vanished.
> > > > Until yesterday.
> > >
> > > > Yes, the same with me. I upgraded to 1.36.3 and the problem occurred
> > > > again, yesterday.
> > > > Now I have set "trace on" and "setdebug 100" for dir and sd and I'm waiting
> > > > for the problem to occur again!
> >
> > Last night, the director locked up again. (See traces attached).
> > The job "paris-home.archived" was finished. The jobs "paris-home.guest"
> > and "paris-home.staff.1" are stuck in the holding-disk, because the
> > director locked up as the job "bali-rootfs" started - nothing was
> > spooled from bali-rootfs, the director seemed to be stuck immediately.
> > Btw: the director has Maximum Concurrent Jobs = 6, and the clients are
> > usually set to Maximum Concurrent Jobs = 1, except the host paris, where
> > it is 2. The storage daemon is set to Maximum Concurrent Jobs = 20.
> >
> > An interesting thing is: again it's the job bali-rootfs that causes the
> > director to lock up. I'll exclude this job for a few days and see if the
> > director still locks up. Plus, I'll set the debuglevel to 200.
> >
> > I've attached backup-dir.conmsg and bacula.trace (level 100). I don't see
> > anything unusual in bacula.trace.
> >
> > Btw: the first part of bacula.trace are the jobs of the night before
> > last night. They finished without problems. The trace of last night
> > seems to start around line 518. At the end of the file in line 722 I
> > tried to connect with bconsole. The connect timed out with no entry in
> > the logfile.
> >
> > I cleared the kernel ring buffer yesterday, so in case any hardware or
> > bus problems occur, there should be errors. There's only:
> >
> > ---------------
> > nfs warning: mount version older than kernel
> > nfs warning: mount version older than kernel
> > APIC error on CPU1: 02(02)
> > APIC error on CPU0: 02(02)
> > nfs warning: mount version older than kernel
> > nfs warning: mount version older than kernel
> > nfs warning: mount version older than kernel
> > nfs warning: mount version older than kernel
> > nfs warning: mount version older than kernel
> > nfs warning: mount version older than kernel
> > nfs warning: mount version older than kernel
> > nfs warning: mount version older than kernel
> > nfs warning: mount version older than kernel
> > nfs warning: mount version older than kernel
> > nfs warning: mount version older than kernel
> > nfs warning: mount version older than kernel
> > ----------------
> >
> > That's all.
> >
> > I hope you can see something in the logs, that I missed!
> 
> -- 
> Best regards,
> 
> Kern
> 
>   (">
>   /\
>   V_V
> 
> 

-- 
  Volker Sauer  *  Alexanderstrasse 39/217  *  64283 Darmstadt
  Telefon: 06151-154260  *  Mobil: 0179-6901475 * ICQ#98164307
  mailto:[EMAIL PROTECTED]  *  http://www.volker-sauer.de
  PGPKey-Fingerprint: DB2611C7B12E0B2739992E4F7E354E4D5DD5D0E0
