In the message dated: Tue, 09 Oct 2007 15:55:58 +0200,
The pithy ruminations from Kern Sibbald on
<Re: [Bacula-devel] bacula-sd 2.2.4 goes kaboom! (segfault on despooling data)> were:
=> Hello,

Since 2.2.5 has been released, I've built that version, and I'm seeing the
same behavior.

=>
=> Most likely you forgot to put in an Autochanger resource for controlling your

Nope. The same configuration works fine with the autochanger under 1.38.11.
Here are excerpts from the config files:

-------------- bacula-dir.conf ----------------------
# Definition of file storage device
Storage {
  Name = pv132t
# Do not use "localhost" here
  Address = parthenon                # N.B. Use a fully qualified name here
  SDPort = 9103
  Password = "IeRUDo7djGxxxxxxxxxxxxxxxxxxxxxxxxxlM1xUkbrEJnq4K"
  Device = pv132t
  Media Type = LTO2
  Autochanger = yes
  Maximum Concurrent Jobs = 25
}
-----------------------------------------------------

--------------- bacula-sd.conf ----------------------
#
# An autochanger device with two drives
#
Autochanger {
  Name = pv132t
  Device = Drive-0
  Device = Drive-1
  Changer Command = "/usr/local/bacula-2.2.5/bin/mtx-changer %c %o %S %a %d"
  Changer Device = /dev/changer
}

Device {
  Name = Drive-0
  # LabelMedia = yes;                 # lets Bacula label unlabeled media
  Drive Index = 0
  Media Type = LTO2
  Archive Device = /dev/tape0
  AutomaticMount = yes;               # when device opened, read it
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  AutoChanger = yes
  # Enable the Alert command only if you have the mtx package loaded
  Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'"
  Maximum Spool Size = 60G
  # For network client
  Maximum Network Buffer Size = 65536
  Spool Directory = /san3/var/spool/bacula
  Autoselect = yes
}

Device {
  Name = Drive-1
  # LabelMedia = yes;                 # lets Bacula label unlabeled media
  Drive Index = 1
  Media Type = LTO2
  Archive Device = /dev/tape1
  AutomaticMount = yes;               # when device opened, read it
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  AutoChanger = yes
  # Enable the Alert command only if you have the mtx package loaded
  Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'"
  Maximum Spool Size = 60G
  # For network client
  Maximum Network Buffer Size = 65536
  Spool Directory = /san2/var/spool/bacula
  Autoselect = yes
}
-----------------------------------------------------

=> autochanger. Barring that, you will need to run the SD under the debugger as
=> defined in the Kaboom chapter of the manual and obtain a good traceback.

Done. Here's a script(1) of the gdb output. Please let me know if there's any
more information that I can provide, or any alternative way of testing this.

--------------------------------------------------------------------
Script started on Thu 11 Oct 2007 05:25:41 PM EDT
[EMAIL PROTECTED] bacula-2.2.5]$ ./test_bacula.rc start
Starting the Storage daemon
Starting the File daemon
Starting the Director daemon
[EMAIL PROTECTED] bacula-2.2.5]$ ps -ef | grep bacula
root     18416 18385  0 17:25 pts/24   00:00:00 script bacula-debugging
root     18417 18416  0 17:25 pts/24   00:00:00 script bacula-debugging
root     18449     1  0 17:25 ?        00:00:00 /usr/local/bacula-2.2.5/sbin/bacula-fd -u root -g root -v -c /usr/local/bacula-2.2.5/etc/bacula-fd.conf
root     18451 18449  0 17:25 ?        00:00:00 /usr/local/bacula-2.2.5/sbin/bacula-fd -u root -g root -v -c /usr/local/bacula-2.2.5/etc/bacula-fd.conf
root     18452 18451  0 17:25 ?        00:00:00 /usr/local/bacula-2.2.5/sbin/bacula-fd -u root -g root -v -c /usr/local/bacula-2.2.5/etc/bacula-fd.conf
root     18454     1  0 17:25 ?        00:00:00 [bacula-dir]
root     18456 18454  0 17:25 ?        00:00:00 [bacula-dir]
root     18457 18456  0 17:25 ?        00:00:00 [bacula-dir]
root     18458 18456  0 17:25 ?        00:00:00 [bacula-dir]
root     18460 18418  0 17:25 pts/2    00:00:00 grep bacula
[EMAIL PROTECTED] bacula-2.2.5]$ gdb sbin/bacula-sd
GNU gdb Red Hat Linux (5.3.90-0.20030710.41rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) run -s -f -c etc/bacula-sd.conf
Starting program: /usr/local/bacula-2.2.5/sbin/bacula-sd -s -f -c etc/bacula-sd.conf
[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 18507)]
[New Thread 32769 (LWP 18537)]
[New Thread 16386 (LWP 18538)]
[New Thread 32771 (LWP 18540)]
[New Thread 49156 (LWP 18832)]
[New Thread 65541 (LWP 18841)]
[New Thread 81926 (LWP 18854)]
[New Thread 98311 (LWP 18863)]
[New Thread 114696 (LWP 18865)]
[New Thread 131081 (LWP 18874)]
[New Thread 147466 (LWP 18930)]
[New Thread 163851 (LWP 18933)]
[New Thread 180236 (LWP 18961)]
[New Thread 196621 (LWP 18970)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 147466 (LWP 18930)]
0x08137148 in ?? ()
(gdb) thread apply all bt

Thread 13 (Thread 180236 (LWP 18961)):
#0  0x4003415b in read () from /lib/i686/libpthread.so.0
#1  0x00000004 in ?? ()
#2  0x00000001 in ?? ()
#3  0x08072e40 in read_nbytes(BSOCK*, char*, int) (bsock=0x8150160, ptr=0x423e1924 "\002", nbytes=4) at bnet.c:82
#4  0x08074ab2 in BSOCK::recv() (this=0x8150160) at bsock.c:381
#5  0x08072c65 in bget_msg(BSOCK*) (sock=0x8150160) at bget_msg.c:60
#6  0x0804f9c5 in do_append_data(JCR*) (jcr=0x813ef08) at append.c:202
#7  0x0805f5b1 in append_data_cmd (jcr=0x813ef08) at fd_cmds.c:194
#8  0x0805f50f in do_fd_commands(JCR*) (jcr=0x813ef08) at fd_cmds.c:165
#9  0x0805f385 in run_job(JCR*) (jcr=0x813ef08) at fd_cmds.c:128
#10 0x080604f5 in run_cmd(JCR*) (jcr=0x813ef08) at job.c:195
#11 0x0805b1b0 in handle_connection_request(void*) (arg=0x813e498) at dircmd.c:229
#12 0x0808788c in workq_server (arg=0x80a3900) at workq.c:357
#13 0x4002edb2 in pthread_start_thread () from /lib/i686/libpthread.so.0
#14 0x4002ef45 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#15 0x401797fa in clone () from /lib/i686/libc.so.6

Thread 11 (Thread 147466 (LWP 18930)):
#0  0x08137148 in ?? ()
#1  0x08114570 in ?? ()
#2  0x081156e0 in ?? ()
#3  0x0806da33 in despool_data (dcr=0x8115260, commit=true) at spool.c:273
#4  0x0806d596 in commit_data_spool(DCR*) (dcr=0x8115260) at spool.c:139
#5  0x0804fd34 in do_append_data(JCR*) (jcr=0x8114570) at append.c:318
#6  0x0805f5b1 in append_data_cmd (jcr=0x8114570) at fd_cmds.c:194
#7  0x0805f50f in do_fd_commands(JCR*) (jcr=0x8114570) at fd_cmds.c:165
---Type <return> to continue, or q <return> to quit---
#8  0x0805f385 in run_job(JCR*) (jcr=0x8114570) at fd_cmds.c:128
#9  0x080604f5 in run_cmd(JCR*) (jcr=0x8114570) at job.c:195
#10 0x0805b1b0 in handle_connection_request(void*) (arg=0x80c2958) at dircmd.c:229
#11 0x0808788c in workq_server (arg=0x80a3900) at workq.c:357
#12 0x4002edb2 in pthread_start_thread () from /lib/i686/libpthread.so.0
#13 0x4002ef45 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#14 0x401797fa in clone () from /lib/i686/libc.so.6

Thread 9 (Thread 114696 (LWP 18865)):
#0  0x400340db in write () from /lib/i686/libpthread.so.0

Thread 7 (Thread 81926 (LWP 18854)):
#0  0x080805c7 in sm_sizeof_pool_memory(char const*, int, char*) (fname=0x80c0328 "", lineno=1098783052, obuf=0x80c0328 "") at mem_pool.c:166
#1  0x08072c65 in bget_msg(BSOCK*) (sock=0x80c0328) at bget_msg.c:60
#2  0x0804f892 in do_append_data(JCR*) (jcr=0x80be998) at append.c:154
#3  0x0805f5b1 in append_data_cmd (jcr=0x80be998) at fd_cmds.c:194
#4  0x0805f50f in do_fd_commands(JCR*) (jcr=0x80be998) at fd_cmds.c:165
#5  0x0805f385 in run_job(JCR*) (jcr=0x80be998) at fd_cmds.c:128
#6  0x080604f5 in run_cmd(JCR*) (jcr=0x80be998) at job.c:195
#7  0x0805b1b0 in handle_connection_request(void*) (arg=0x80be3b0) at dircmd.c:229
#8  0x0808788c in workq_server (arg=0x80a3900) at workq.c:357
#9  0x4002edb2 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x4002ef45 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#11 0x401797fa in clone () from /lib/i686/libc.so.6
Current language:  auto; currently c++

Thread 5 (Thread 49156 (LWP 18832)):
#0  0x4003415b in read () from /lib/i686/libpthread.so.0
#1  0x00000004 in ?? ()
---Type <return> to continue, or q <return> to quit---
#2  0x00000001 in ?? ()
#3  0x08072e40 in read_nbytes(BSOCK*, char*, int) (bsock=0x80b9e78, ptr=0x413d6924 "", nbytes=4) at bnet.c:82
#4  0x08074ab2 in BSOCK::recv() (this=0x80b9e78) at bsock.c:381
#5  0x08072c65 in bget_msg(BSOCK*) (sock=0x80b9e78) at bget_msg.c:60
#6  0x0804f9c5 in do_append_data(JCR*) (jcr=0x80a9380) at append.c:202
#7  0x0805f5b1 in append_data_cmd (jcr=0x80a9380) at fd_cmds.c:194
#8  0x0805f50f in do_fd_commands(JCR*) (jcr=0x80a9380) at fd_cmds.c:165
#9  0x0805f385 in run_job(JCR*) (jcr=0x80a9380) at fd_cmds.c:128
#10 0x080604f5 in run_cmd(JCR*) (jcr=0x80a9380) at job.c:195
#11 0x0805b1b0 in handle_connection_request(void*) (arg=0x80a56a0) at dircmd.c:229
#12 0x0808788c in workq_server (arg=0x80a3900) at workq.c:357
#13 0x4002edb2 in pthread_start_thread () from /lib/i686/libpthread.so.0
#14 0x4002ef45 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#15 0x401797fa in clone () from /lib/i686/libc.so.6

Thread 4 (Thread 32771 (LWP 18540)):
#0  0x40034936 in nanosleep () from /lib/i686/libpthread.so.0
#1  0x00000001 in ?? ()
#2  0x40030b5a in __pthread_timedsuspend_new () from /lib/i686/libpthread.so.0

Thread 2 (Thread 32769 (LWP 18537)):
#0  0x4017075a in poll () from /lib/i686/libc.so.6
#1  0x4002dd1a in __pthread_manager () from /lib/i686/libpthread.so.0
#2  0x4002dfea in __pthread_manager_event () from /lib/i686/libpthread.so.0
#3  0x401797fa in clone () from /lib/i686/libc.so.6

Thread 1 (Thread 16384 (LWP 18507)):
#0  0x401728d1 in select () from /lib/i686/libc.so.6
---Type <return> to continue, or q <return> to quit---
#1  0x00000009 in ?? ()
#2  0x401d3d20 in buffer () from /lib/i686/libc.so.6
#0  0x08137148 in ?? ()
(gdb)
(gdb) quit
The program is running.  Exit anyway? (y or n) y
[EMAIL PROTECTED] bacula-2.2.5]$ exit
exit

Script done on Thu 11 Oct 2007 05:31:46 PM EDT
--------------------------------------------------------------------

Thanks,

Mark

=>
=> Regards,
=>
=> Kern
=>
=> On Thursday 04 October 2007 17:11, [EMAIL PROTECTED] wrote:
=> > I'm testing bacula 2.2.4 and the bacula-sd daemon repeatedly exits with a
=> > segmentation fault. Bacula 1.38.11 works reliably on the same machine with
=> > the same hardware and configuration
=> > files.
=> >
=> > Environment:
=> > Fedora Core 1
=> > Kernel 2.4.26
=> > gcc 3.3.2
=> > MySQL 5.0.22
=> > Dell PV 132T autochanger
=> >
=> > Build configuration script:
=> > ./configure \
=> > --prefix=/usr/local/bacula-2.2.4 \
=> > --disable-nls \
=> > --disable-ipv6 \
=> > --enable-batch-insert \
=> > [EMAIL PROTECTED] \
=> > [EMAIL PROTECTED] \
=> > --with-db-name=bacula2 \
=> > --with-mysql \
=> > --mandir=/usr/local/bacula-2.2.4/man \
=> > --with-pid-dir=/usr/local/bacula-2.2.4/var/run \
=> > --with-subsys-dir=/usr/local/bacula-2.2.4/var/subsys \
=> > --with-dir-user=bacula \
=> > --with-dir-group=bacula \
=> > --with-sd-user=bacula \
=> > --with-sd-group=bacula \
=> > --with-fd-user=root \
=> > --with-fd-group=root && make
=> >
=> >
=> > I'm using the same configuration files for the director, sd, and fd that
=> > work with the 1.38.11 installation (after removing the "Accept Any Volume"
=> > directive, changing the paths for 2.2.4, and adding the directive
=> > "RecyclePool = Scratch").
=> >
=> > The software compiles without error. There were no errors from "btape test"
=> > or "btape autochanger".
=> >
=> > The bacula-sd daemon crashes repeatedly whether or not the variable
=> > LD_ASSUME_KERNEL was set to "2.4.19" before compiling bacula.
=> >
=> > The bacula-sd daemon is running as root (while I sort out an issue that's
=> > not present in 1.38.11 with permissions on the tape device). The bacula-dir
=> > normally runs as user "bacula".
=> >
=> > Even if I modify the bacula-dir options to run it as root, no traceback
=> > file is generated. The message (received via email) from Bacula is:
=> >
=> > Using host libthread_db library "/lib/libthread_db.so.1".
=> > 0x401728d1 in ?? ()
=> > /usr/local/bacula-2.2.4/etc/btraceback.gdb:1: Error in sourced command
=> > file: Cannot access memory at address 0x80a2e34
=> >
=> > The fault seems to occur right after the SD begins despooling data. I've
=> > got 4 log files from running the SD with debugging on (set to 200 or
=> > higher), and the error always happens after the first instance of
=> > despooling data. In each case, the log file shows "stored.c:582 In
=> > terminate_stored() sig=11".
=> >
=> > I've attached an excerpt from the SD debugging output.
=> >
=> >
=> > Thanks,
=> >
=> > Mark
=> >
=> >
=> > ----
=> > Mark Bergman [EMAIL PROTECTED]
=> > System Administrator
=> > Section of Biomedical Image Analysis 215-662-7310
=> > Department of Radiology, University of Pennsylvania
=> >
=> > http://pgpkeys.pca.dfn.de:11371/pks/lookup?search=mark.bergman%40.uphs.upenn.edu
=>

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
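
For anyone who wants to sift a long "thread apply all bt" capture like the one in this message, here is a minimal sketch (not part of the original thread) that groups frames by gdb thread number and lists the threads whose stack passes through a given function. It assumes Python 3 and gdb output in the layout shown in the script(1) capture; parse_backtrace and threads_calling are hypothetical helper names, and the sample text is a shortened copy of two threads from the traceback.

```python
import re

# "Thread 11 (Thread 147466 (LWP 18930)):" -> gdb thread number 11
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread \d+ \(LWP \d+\)\):")
# "#3  0x0806da33 in despool_data (dcr=...) at spool.c:273" -> "despool_data"
# The function name stops at the first whitespace or "(" so C++ signatures
# such as "commit_data_spool(DCR*)" reduce to the bare name.
FRAME_RE = re.compile(r"^#\d+\s+0x[0-9a-f]+ in ([^\s(]+)")


def parse_backtrace(text):
    """Return {gdb thread number: [function names, innermost frame first]}."""
    threads = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(1))
        # Anything else (pager prompts, blank lines) is ignored.
    return threads


def threads_calling(threads, function):
    """Return the sorted thread numbers whose stack contains `function`."""
    return sorted(t for t, frames in threads.items() if function in frames)


# Shortened excerpt of the traceback above, used only as sample input.
SAMPLE = """\
Thread 13 (Thread 180236 (LWP 18961)):
#0  0x4003415b in read () from /lib/i686/libpthread.so.0
#6  0x0804f9c5 in do_append_data(JCR*) (jcr=0x813ef08) at append.c:202

Thread 11 (Thread 147466 (LWP 18930)):
#0  0x08137148 in ?? ()
#3  0x0806da33 in despool_data (dcr=0x8115260, commit=true) at spool.c:273
---Type <return> to continue, or q <return> to quit---
#4  0x0806d596 in commit_data_spool(DCR*) (dcr=0x8115260) at spool.c:139
"""

if __name__ == "__main__":
    stacks = parse_backtrace(SAMPLE)
    for number in threads_calling(stacks, "despool_data"):
        print(number, " <- ".join(stacks[number]))
```

Run against the full capture, this would single out thread 11, the one gdb switched to on SIGSEGV, as the only stack inside despool_data. Note that the stray "#0 0x08137148 in ?? ()" line gdb prints after the last thread would be attributed to thread 1 by this simple parser.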