Martin Simmons wrote:

> On Fri, 07 Dec 2007 10:02:58 +0000, Damian Brasher said:
> > 
> > Martin Simmons wrote:
> > 
>   
>> > > Also, you could attach gdb to each daemon and run the gdb command
>> > >
>> > > thread apply all bt
>> > >
>>     
> > 
> > 11th Nov 07-----------------------------------------------------------
> > Have attached /sbin/bacula-dir, /sbin/bacula-fd and /sbin/bacula-sd to
> > gdb, run the commands,
> > and will now wait until the error condition repeats.
> > 
> > Will post the output of [(gdb)info file] and [(gdb)thread apply all bt]
> > as soon as I have the error condition,
> > as well as the dir/fd and sd status.
> > ----------------------------------------------------------------------
> > 
> > The error has occurred again. I decided to start bacula with the init 
> > scripts and attached gdb to the running process.
> > 
> > All the details describing this problem are at the beginning of the thread.
> > 
> > When the job halted, bconsole printed about 200 lines, all roughly like 
> > the ones below:
> > 
> > ...Orphaned buffer:  backup-dir      8 bytes buf=9e1f010 allocated at workq.c:167
> > Orphaned buffer:  backup-dir     16 bytes buf=9e1eee0 allocated at jcr.c:247
> > Orphaned buffer:  backup-dir    528 bytes buf=9e1f038 allocated at jcr.c:255
> > Orphaned buffer:  backup-dir    528 bytes buf=9e23ab8 allocated at job.c:953
> > Orphaned buffer:  backup-dir    528 bytes buf=9e23ce8 allocated at job.c:1130
> > Orphaned buffer:  backup-dir      6 bytes buf=9e23f50 allocated at ua_server.c:105
> > Orphaned buffer:  backup-dir    316 bytes buf=9e23f78 allocated at ua_server.c:192
> > Orphaned buffer:  backup-dir    804 bytes buf=9e24338 allocated at bsock.c:429
> > Orphaned buffer:  backup-dir    707 bytes buf=9e24c40 allocated at mem_pool.c:198
> > Orphaned buffer:  backup-dir    707 bytes buf=9e24680 allocated at mem_pool.c:198
> > Orphaned buffer:  backup-dir     24 bytes buf=9e1f268 allocated at job.c:1153
> > Orphaned buffer:  backup-dir     40 bytes buf=9e1f2a0 allocated at alist.c:53...
>   

>That is very unexpected.  The only time I've seen 'Orphaned buffer' messages 
>is after killing the Director.  Could that have happened?  Were there any 
>other messages in the log when the job halted?

The install was compiled with --smartalloc. No, the director was not killed. 
There was nothing else in the logs.
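
With about 200 of these lines, it may be easier to compare runs by grouping them by allocation site. The following is only a rough sketch: the function name `summarize_orphans` is made up for illustration, and the pattern assumes every line has the same layout as the ones quoted above.

```python
import re
from collections import Counter

# Matches lines such as:
#   Orphaned buffer:  backup-dir    528 bytes buf=9e1f038 allocated at jcr.c:255
ORPHAN_RE = re.compile(
    r"Orphaned buffer:\s+(?P<daemon>\S+)\s+(?P<size>\d+) bytes\s+"
    r"buf=(?P<addr>[0-9a-fA-F]+)\s+allocated at\s+(?P<site>\S+:\d+)"
)

def summarize_orphans(text):
    """Return {allocation site: (buffer count, total bytes)}."""
    # Rejoin lines that a mail client wrapped after "allocated at",
    # dropping any quote markers at the start of the continuation line.
    text = re.sub(r"allocated at\s*\n[>\s]*", "allocated at ", text)
    counts, sizes = Counter(), Counter()
    for m in ORPHAN_RE.finditer(text):
        counts[m["site"]] += 1
        sizes[m["site"]] += int(m["size"])
    return {site: (counts[site], sizes[site]) for site in counts}
```

Feeding it the saved bconsole output should show which source locations account for most of the buffers, which might help narrow down where the job was when it stopped.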

> > 
> > From the command "status all", the only unusual output is:
> > 
> > Running Jobs:
> > JobId 166 Job holly.2007-12-06_23.25.09 is running.
> >      Backup Job started: 07-Dec-07 01:41
> >      Files=50,030 Bytes=12,066,088,479 Bytes/sec=407,211
> >      Files Examined=66,825
> >      Processing file: /etc/httpd/conf/httpd.conf
> >      SDReadSeqNo=6 fd=7
> > Director connected at: 07-Dec-07 09:55
> > 
> > The sd status is:
> > 
> > backup-sd Version: 2.2.5 (09 October 2007) i686-pc-linux-gnu redhat Enterprise release
> > Daemon started 06-Dec-07 11:42, 4 Jobs run since started.
> >   Heap: heap=217,088 smbytes=160,745 max_bytes=161,943 bufs=124 max_bufs=133
> > Sizes: boffset_t=8 size_t=4 int32_t=4 int64_t=8
> > 
> > Running Jobs:
> > Writing: Full Backup job holly JobId=166 Volume="Thursday1"
> >      pool="Thursday" device="LTO-2" (/dev/nst0)
> >      spooling=0 despooling=0 despool_wait=0
> >      Files=50,030 Bytes=12,073,480,104 Bytes/sec=404,607
> >      FDReadSeqNo=646,525 in_msg=552497 out_msg=6 fd=8
>   

>What did the client status show?

I need to capture the error again.

>From the SD backtraces, it looks like the SD is waiting for the FD to confirm 
>that the job has finished.

ok

>Was the gdb attached to the FD while the job was running or did you attach it 
>after it started hanging?  If the latter, are you 100% sure that the bacula-fd 
>process was not restarted somehow?

I am 100% sure the process was not restarted. I attached gdb after the error.
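
For the next hang, the backtraces could be captured without an interactive gdb session. This is a minimal sketch, assuming gdb is on the PATH and the caller already knows the daemon's PID; the helper names `gdb_backtrace_cmd` and `collect_backtrace` are hypothetical.

```python
import subprocess

def gdb_backtrace_cmd(pid):
    """Build a gdb command line that prints all thread backtraces and exits."""
    # -p attaches to a running process, -batch exits after the commands,
    # -ex runs the given gdb command.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]

def collect_backtrace(pid, out_path):
    """Run gdb against a live process and save everything it prints."""
    result = subprocess.run(gdb_backtrace_cmd(pid), capture_output=True, text=True)
    with open(out_path, "w") as fh:
        fh.write(result.stdout)
        fh.write(result.stderr)
    return result.returncode
```

Running `collect_backtrace` once per daemon (dir, fd, sd) right after the hang would give one file per process to attach to a bug report.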

>Do netstat or lsof show any socket connections between the SD and the FD when 
>the job has reached this hanging point?

Will wait for next hang.
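
A rough sketch of the netstat check, for when the hang recurs. It assumes the Storage daemon listens on 9103, which is Bacula's default port but is not confirmed anywhere in this thread, and that `netstat -tn` prints the usual six-column rows. The function names are made up for illustration.

```python
import subprocess

SD_PORT = 9103  # Bacula's default SD port; an assumption, check bacula-sd.conf

def connections_on_port(netstat_output, port=SD_PORT):
    """Return (local, remote, state) for TCP rows that involve the given port."""
    suffix = ":" + str(port)
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # "netstat -tn" rows: proto recv-q send-q local foreign state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if local.endswith(suffix) or remote.endswith(suffix):
            found.append((local, remote, state))
    return found

def sd_connections():
    """Run netstat on this host and keep only rows that touch the SD port."""
    out = subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout
    return connections_on_port(out)
```

An empty result on the FD host while the SD still reports the job as running would suggest the FD side of the connection has already gone away.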

>I think you might have to run the SD (and possibly the FD) at debug level 200 
>to collect info about what happens at the end of the job.

OK, I upgraded to 2.2.6 yesterday as I really need to be up and running. 
I will send in another bug report with the extra information you have 
requested if the system hangs again.
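
One way to raise the debug level on running daemons is the console's setdebug command. The sketch below only builds the command text; the helper name is hypothetical, and the storage and client names must be the resource names from the Director configuration, which are not given in this thread.

```python
def setdebug_commands(level, storage=None, client=None):
    """Build bconsole 'setdebug' lines for the SD and/or FD."""
    cmds = []
    if storage:
        cmds.append("setdebug level=%d storage=%s" % (level, storage))
    if client:
        cmds.append("setdebug level=%d client=%s" % (level, client))
    return cmds

def as_console_input(cmds):
    """Join commands into text that can be piped to bconsole on stdin."""
    return "\n".join(cmds + ["quit"]) + "\n"
```

The resulting text can be piped into bconsole. Alternatively, the daemons can be started with `-d 200` on their command line.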

thanks so far,

Damian


-- 
Damian Brasher
Systems Admin/Prog
OMII-UK ECS
Southampton University


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
