Martin Simmons wrote:
> >>>>> On Fri, 07 Dec 2007 10:02:58 +0000, Damian Brasher said:
> >
> > Martin Simmons wrote:
> > > Also, you could attach gdb to each daemon and run the gdb command
> > >
> > >   thread apply all bt
> >
> > 11th Nov 07-----------------------------------------------------------
> > Have attached /sbin/bacula-dir, /sbin/bacula-fd and /sbin/bacula-sd to
> > gdb, run the commands
> > and will now wait until the error condition repeats.
> >
> > Will post the output of [(gdb)info file] and [(gdb)thread apply all bt]
> > as soon as I have the error condition
> > as well as the dir/fd and sd status.
> > ----------------------------------------------------------------------
> >
> > The error has occurred again. I decided to start bacula with the init
> > scripts and attached gdb to the running process.
> >
> > All the details describing this problem are at the beginning of the thread.
> >
> > As the job halted this was the output from the bconsole, about 200 lines
> > of roughly the same as below:-
> >
> > ...Orphaned buffer: backup-dir 8 bytes buf=9e1f010 allocated at workq.c:167
> > Orphaned buffer: backup-dir 16 bytes buf=9e1eee0 allocated at jcr.c:247
> > Orphaned buffer: backup-dir 528 bytes buf=9e1f038 allocated at jcr.c:255
> > Orphaned buffer: backup-dir 528 bytes buf=9e23ab8 allocated at job.c:953
> > Orphaned buffer: backup-dir 528 bytes buf=9e23ce8 allocated at job.c:1130
> > Orphaned buffer: backup-dir 6 bytes buf=9e23f50 allocated at ua_server.c:105
> > Orphaned buffer: backup-dir 316 bytes buf=9e23f78 allocated at ua_server.c:192
> > Orphaned buffer: backup-dir 804 bytes buf=9e24338 allocated at bsock.c:429
> > Orphaned buffer: backup-dir 707 bytes buf=9e24c40 allocated at mem_pool.c:198
> > Orphaned buffer: backup-dir 707 bytes buf=9e24680 allocated at mem_pool.c:198
> > Orphaned buffer: backup-dir 24 bytes buf=9e1f268 allocated at job.c:1153
> > Orphaned buffer: backup-dir 40 bytes buf=9e1f2a0 allocated at alist.c:53...
>
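With about 200 of these lines, it may be easier to compare runs by tallying the orphaned buffers per allocation site. Below is a minimal Python sketch, assuming the bconsole output has been saved as text. The function name `tally_orphans` and the regular expression are hypothetical, and the line format is inferred only from the lines quoted above, not from Bacula documentation.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Orphaned buffer: backup-dir 528 bytes buf=9e1f038 allocated at jcr.c:255
# The pattern is an assumption based on the quoted console output. It also
# tolerates a quoted line wrap ("allocated at" / "> > file.c:NNN").
ORPHAN_RE = re.compile(
    r"Orphaned buffer:\s+(\S+)\s+(\d+) bytes\s+buf=([0-9a-fA-F]+)\s+"
    r"allocated at[\s>]+(\S+?):(\d+)"
)


def tally_orphans(text):
    """Return {(source_file, line_no): [buffer_count, total_bytes]}."""
    totals = defaultdict(lambda: [0, 0])
    for match in ORPHAN_RE.finditer(text):
        _daemon, size, _addr, src, line_no = match.groups()
        entry = totals[(src, int(line_no))]
        entry[0] += 1          # number of orphaned buffers from this site
        entry[1] += int(size)  # bytes orphaned from this site
    return dict(totals)


# A few lines taken from the output above, including two wrapped ones.
sample = """\
> > Orphaned buffer: backup-dir 528 bytes buf=9e1f038 allocated at jcr.c:255
> > Orphaned buffer: backup-dir 707 bytes buf=9e24c40 allocated at
> > mem_pool.c:198
> > Orphaned buffer: backup-dir 707 bytes buf=9e24680 allocated at
> > mem_pool.c:198
"""

for (src, line_no), (count, size) in sorted(tally_orphans(sample).items()):
    print(f"{src}:{line_no}  buffers={count}  bytes={size}")
```

Grouping by file and line rather than by buffer address keeps the summary stable between runs, since the addresses will differ each time the Director starts.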
> That is very unexpected.  The only time I've seen 'Orphaned buffer' messages
> is after killing the Director.  Could that have happened?  Were there any
> other messages in the log when the job halted?

The install was compiled with --smartalloc. No, the director was not
killed. There was nothing else in the logs.

> > From command: status all the only unusual output is:
> >
> > Running Jobs:
> > JobId 166 Job holly.2007-12-06_23.25.09 is running.
> >     Backup Job started: 07-Dec-07 01:41
> >     Files=50,030 Bytes=12,066,088,479 Bytes/sec=407,211
> >     Files Examined=66,825
> >     Processing file: /etc/httpd/conf/httpd.conf
> >     SDReadSeqNo=6 fd=7
> > Director connected at: 07-Dec-07 09:55
> >
> > The sd status is:
> >
> > backup-sd Version: 2.2.5 (09 October 2007) i686-pc-linux-gnu redhat
> > Enterprise release
> > Daemon started 06-Dec-07 11:42, 4 Jobs run since started.
> >  Heap: heap=217,088 smbytes=160,745 max_bytes=161,943 bufs=124 max_bufs=133
> >  Sizes: boffset_t=8 size_t=4 int32_t=4 int64_t=8
> >
> > Running Jobs:
> > Writing: Full Backup job holly JobId=166 Volume="Thursday1"
> >     pool="Thursday" device="LTO-2" (/dev/nst0)
> >     spooling=0 despooling=0 despool_wait=0
> >     Files=50,030 Bytes=12,073,480,104 Bytes/sec=404,607
> >     FDReadSeqNo=646,525 in_msg=552497 out_msg=6 fd=8
>
> What did the client status show?

I need to capture the error again.

> From the SD backtraces, it looks like the SD is waiting for the FD to confirm
> that the job has finished.

ok

> Was the gdb attached to the FD while the job was running or did you attach it
> after it started hanging?  If the latter, are you 100% sure that the bacula-fd
> process was not restarted somehow?

I am 100% sure the process was not restarted. I attached gdb after the
error.

> Do netstat or lsof show any socket connections between the SD and the FD when
> the job has reached this hanging point?

Will wait for next hang.

> I think you might have to run the SD (and possibly the FD) at debug level 200
> to collect info about what happens at the end of the job.

OK, I upgraded to 2.2.6 yesterday as I really need to be up and running.
I will send in another bug report with the extra information you have
requested if the system hangs again.

Thanks so far,

Damian

-- 
Damian Brasher
Systems Admin/Prog
OMII-UK ECS Southampton University

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users