Hello Mark,

The seg fault is occurring inside kernel code when processing the posix_fadvise() function.

The problem is one of the following:

1. Bacula's configure code to detect posix_fadvise() is inadequate for older OSes such as yours.
2. Your kernel is broken.
3. 2.4.x kernels have an incompatible posix_fadvise().

On certain systems, I would point the finger at problem #3, but I have never seen incompatible POSIX calls on a Linux system. There can always be a first case though.

In any case, the workaround for you is to manually edit your src/config.h file *after* you do the ./configure, find the line that reads:

   #define HAVE_POSIX_FADVISE 1

and either delete it or replace it by

   /* #undef HAVE_POSIX_FADVISE */

then the standard:

   make
   ...
   make install

You will need to re-edit the src/config.h file each time you run a ./configure.

I would be interested to see the output generated by:

   man posix_fadvise

It might give me some clue why your OS crashes the SD during that call.

By the way, the FD uses the same calls, so it is also susceptible to crashing. Turning off HAVE_POSIX_FADVISE should eliminate all the calls in the Bacula code and correct the problem in both daemons.

Turning off these calls will have no effect on how Bacula works, but you won't be able to take advantage of the performance improvements they provide. Basically they prevent Bacula from filling all memory with stale data and forcing swapping by advising the system when Bacula will no longer use I/O data that the OS has cached.

Best regards,

Kern

On Thursday 11 October 2007 23:39, [EMAIL PROTECTED] wrote:
> In the message dated: Tue, 09 Oct 2007 15:55:58 +0200,
> The pithy ruminations from Kern Sibbald on
> <Re: [Bacula-devel] bacula-sd 2.2.4 goes kaboom! (segfault on despooling data)> were:
> => Hello,
>
> Since 2.2.5 has been released, I've built that version, and I'm seeing the
> same behavior.
>
> =>
> => Most likely you forgot to put in an Autochanger resource for controlling your
>
> Nope. The same configuration works fine with the autochanger under 1.38.11.
> Here are excerpts from the config files:
>
> -------------- bacula-dir.conf ----------------------
> # Definition of file storage device
> Storage {
>   Name = pv132t
>   # Do not use "localhost" here
>   Address = parthenon                # N.B. Use a fully qualified name here
>   SDPort = 9103
>   Password = "IeRUDo7djGxxxxxxxxxxxxxxxxxxxxxxxxxlM1xUkbrEJnq4K"
>   Device = pv132t
>   Media Type = LTO2
>   Autochanger = yes
>   Maximum Concurrent Jobs = 25
> }
> -----------------------------------------------------
>
> --------------- bacula-sd.conf ----------------------
> #
> # An autochanger device with two drives
> #
> Autochanger {
>   Name = pv132t
>   Device = Drive-0
>   Device = Drive-1
>   Changer Command = "/usr/local/bacula-2.2.5/bin/mtx-changer %c %o %S %a %d"
>   Changer Device = /dev/changer
> }
>
> Device {
>   Name = Drive-0                      #
>   LabelMedia = yes;                   # lets Bacula label unlabeled media
>   Drive Index = 0
>   Media Type = LTO2
>   Archive Device = /dev/tape0
>   AutomaticMount = yes;               # when device opened, read it
>   AlwaysOpen = yes;
>   RemovableMedia = yes;
>   RandomAccess = no;
>   AutoChanger = yes
>   # Enable the Alert command only if you have the mtx package loaded
>   Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'"
>   Maximum Spool Size = 60G            # For network client
>   Maximum Network Buffer Size = 65536
>   Spool Directory = /san3/var/spool/bacula
>   Autoselect = yes
> }
>
> Device {
>   Name = Drive-1                      #
>   LabelMedia = yes;                   # lets Bacula label unlabeled media
>   Drive Index = 1
>   Media Type = LTO2
>   Archive Device = /dev/tape1
>   AutomaticMount = yes;               # when device opened, read it
>   AlwaysOpen = yes;
>   RemovableMedia = yes;
>   RandomAccess = no;
>   AutoChanger = yes
>   # Enable the Alert command only if you have the mtx package loaded
>   Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'"
>   Maximum Spool Size = 60G            # For network client
>   Maximum Network Buffer Size = 65536
>   Spool Directory = /san2/var/spool/bacula
>   Autoselect = yes
> }
> -----------------------------------------------------
>
> => autochanger. Barring that, you will need to run the SD under the debugger as
> => defined in the Kaboom chapter of the manual and obtain a good traceback.
>
> Done. Here's a script(1) of the gdb output. Please let me know if there's
> any more information that I can provide, or any alternative way of testing
> this.
>
> --------------------------------------------------------------------
> Script started on Thu 11 Oct 2007 05:25:41 PM EDT
> [EMAIL PROTECTED] bacula-2.2.5]$ ./test_bacula.rc start
> Starting the Storage daemon
> Starting the File daemon
> Starting the Director daemon
> [EMAIL PROTECTED] bacula-2.2.5]$ ps -ef | grep bacula
> root     18416 18385  0 17:25 pts/24   00:00:00 script bacula-debugging
> root     18417 18416  0 17:25 pts/24   00:00:00 script bacula-debugging
> root     18449     1  0 17:25 ?        00:00:00 /usr/local/bacula-2.2.5/sbin/bacula-fd -u root -g root -v -c /usr/local/bacula-2.2.5/etc/bacula-fd.conf
> root     18451 18449  0 17:25 ?        00:00:00 /usr/local/bacula-2.2.5/sbin/bacula-fd -u root -g root -v -c /usr/local/bacula-2.2.5/etc/bacula-fd.conf
> root     18452 18451  0 17:25 ?        00:00:00 /usr/local/bacula-2.2.5/sbin/bacula-fd -u root -g root -v -c /usr/local/bacula-2.2.5/etc/bacula-fd.conf
> root     18454     1  0 17:25 ?        00:00:00 [bacula-dir]
> root     18456 18454  0 17:25 ?        00:00:00 [bacula-dir]
> root     18457 18456  0 17:25 ?        00:00:00 [bacula-dir]
> root     18458 18456  0 17:25 ?        00:00:00 [bacula-dir]
> root     18460 18418  0 17:25 pts/2    00:00:00 grep bacula
> [EMAIL PROTECTED] bacula-2.2.5]$ gdb sbin/bacula-sd
> GNU gdb Red Hat Linux (5.3.90-0.20030710.41rh)
> Copyright 2003 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
>
> (gdb) run -s -f -c etc/bacula-sd.conf
> Starting program: /usr/local/bacula-2.2.5/sbin/bacula-sd -s -f -c etc/bacula-sd.conf
> [Thread debugging using libthread_db enabled]
> [New Thread 16384 (LWP 18507)]
> [New Thread 32769 (LWP 18537)]
> [New Thread 16386 (LWP 18538)]
> [New Thread 32771 (LWP 18540)]
> [New Thread 49156 (LWP 18832)]
> [New Thread 65541 (LWP 18841)]
> [New Thread 81926 (LWP 18854)]
> [New Thread 98311 (LWP 18863)]
> [New Thread 114696 (LWP 18865)]
> [New Thread 131081 (LWP 18874)]
> [New Thread 147466 (LWP 18930)]
> [New Thread 163851 (LWP 18933)]
> [New Thread 180236 (LWP 18961)]
> [New Thread 196621 (LWP 18970)]
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 147466 (LWP 18930)]
> 0x08137148 in ?? ()
> (gdb) thread apply all bt
>
> Thread 13 (Thread 180236 (LWP 18961)):
> #0  0x4003415b in read () from /lib/i686/libpthread.so.0
> #1  0x00000004 in ?? ()
> #2  0x00000001 in ?? ()
> #3  0x08072e40 in read_nbytes(BSOCK*, char*, int) (bsock=0x8150160, ptr=0x423e1924 "\002", nbytes=4) at bnet.c:82
> #4  0x08074ab2 in BSOCK::recv() (this=0x8150160) at bsock.c:381
> #5  0x08072c65 in bget_msg(BSOCK*) (sock=0x8150160) at bget_msg.c:60
> #6  0x0804f9c5 in do_append_data(JCR*) (jcr=0x813ef08) at append.c:202
> #7  0x0805f5b1 in append_data_cmd (jcr=0x813ef08) at fd_cmds.c:194
> #8  0x0805f50f in do_fd_commands(JCR*) (jcr=0x813ef08) at fd_cmds.c:165
> #9  0x0805f385 in run_job(JCR*) (jcr=0x813ef08) at fd_cmds.c:128
> #10 0x080604f5 in run_cmd(JCR*) (jcr=0x813ef08) at job.c:195
> #11 0x0805b1b0 in handle_connection_request(void*) (arg=0x813e498) at dircmd.c:229
> #12 0x0808788c in workq_server (arg=0x80a3900) at workq.c:357
> #13 0x4002edb2 in pthread_start_thread () from /lib/i686/libpthread.so.0
> #14 0x4002ef45 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
> #15 0x401797fa in clone () from /lib/i686/libc.so.6
>
> Thread 11 (Thread 147466 (LWP 18930)):
> #0  0x08137148 in ?? ()
> #1  0x08114570 in ?? ()
> #2  0x081156e0 in ?? ()
> #3  0x0806da33 in despool_data (dcr=0x8115260, commit=true) at spool.c:273
> #4  0x0806d596 in commit_data_spool(DCR*) (dcr=0x8115260) at spool.c:139
> #5  0x0804fd34 in do_append_data(JCR*) (jcr=0x8114570) at append.c:318
> #6  0x0805f5b1 in append_data_cmd (jcr=0x8114570) at fd_cmds.c:194
> #7  0x0805f50f in do_fd_commands(JCR*) (jcr=0x8114570) at fd_cmds.c:165
> ---Type <return> to continue, or q <return> to quit---
> #8  0x0805f385 in run_job(JCR*) (jcr=0x8114570) at fd_cmds.c:128
> #9  0x080604f5 in run_cmd(JCR*) (jcr=0x8114570) at job.c:195
> #10 0x0805b1b0 in handle_connection_request(void*) (arg=0x80c2958) at dircmd.c:229
> #11 0x0808788c in workq_server (arg=0x80a3900) at workq.c:357
> #12 0x4002edb2 in pthread_start_thread () from /lib/i686/libpthread.so.0
> #13 0x4002ef45 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
> #14 0x401797fa in clone () from /lib/i686/libc.so.6
>
> Thread 9 (Thread 114696 (LWP 18865)):
> #0  0x400340db in write () from /lib/i686/libpthread.so.0
>
> Thread 7 (Thread 81926 (LWP 18854)):
> #0  0x080805c7 in sm_sizeof_pool_memory(char const*, int, char*) (fname=0x80c0328 "", lineno=1098783052, obuf=0x80c0328 "") at mem_pool.c:166
> #1  0x08072c65 in bget_msg(BSOCK*) (sock=0x80c0328) at bget_msg.c:60
> #2  0x0804f892 in do_append_data(JCR*) (jcr=0x80be998) at append.c:154
> #3  0x0805f5b1 in append_data_cmd (jcr=0x80be998) at fd_cmds.c:194
> #4  0x0805f50f in do_fd_commands(JCR*) (jcr=0x80be998) at fd_cmds.c:165
> #5  0x0805f385 in run_job(JCR*) (jcr=0x80be998) at fd_cmds.c:128
> #6  0x080604f5 in run_cmd(JCR*) (jcr=0x80be998) at job.c:195
> #7  0x0805b1b0 in handle_connection_request(void*) (arg=0x80be3b0) at dircmd.c:229
> #8  0x0808788c in workq_server (arg=0x80a3900) at workq.c:357
> #9  0x4002edb2 in pthread_start_thread () from /lib/i686/libpthread.so.0
> #10 0x4002ef45 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
> #11 0x401797fa in clone () from /lib/i686/libc.so.6
> Current language: auto; currently c++
>
> Thread 5 (Thread 49156 (LWP 18832)):
> #0  0x4003415b in read () from /lib/i686/libpthread.so.0
> #1  0x00000004 in ?? ()
> ---Type <return> to continue, or q <return> to quit---
> #2  0x00000001 in ?? ()
> #3  0x08072e40 in read_nbytes(BSOCK*, char*, int) (bsock=0x80b9e78, ptr=0x413d6924 "", nbytes=4) at bnet.c:82
> #4  0x08074ab2 in BSOCK::recv() (this=0x80b9e78) at bsock.c:381
> #5  0x08072c65 in bget_msg(BSOCK*) (sock=0x80b9e78) at bget_msg.c:60
> #6  0x0804f9c5 in do_append_data(JCR*) (jcr=0x80a9380) at append.c:202
> #7  0x0805f5b1 in append_data_cmd (jcr=0x80a9380) at fd_cmds.c:194
> #8  0x0805f50f in do_fd_commands(JCR*) (jcr=0x80a9380) at fd_cmds.c:165
> #9  0x0805f385 in run_job(JCR*) (jcr=0x80a9380) at fd_cmds.c:128
> #10 0x080604f5 in run_cmd(JCR*) (jcr=0x80a9380) at job.c:195
> #11 0x0805b1b0 in handle_connection_request(void*) (arg=0x80a56a0) at dircmd.c:229
> #12 0x0808788c in workq_server (arg=0x80a3900) at workq.c:357
> #13 0x4002edb2 in pthread_start_thread () from /lib/i686/libpthread.so.0
> #14 0x4002ef45 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
> #15 0x401797fa in clone () from /lib/i686/libc.so.6
>
> Thread 4 (Thread 32771 (LWP 18540)):
> #0  0x40034936 in nanosleep () from /lib/i686/libpthread.so.0
> #1  0x00000001 in ?? ()
> #2  0x40030b5a in __pthread_timedsuspend_new () from /lib/i686/libpthread.so.0
>
> Thread 2 (Thread 32769 (LWP 18537)):
> #0  0x4017075a in poll () from /lib/i686/libc.so.6
> #1  0x4002dd1a in __pthread_manager () from /lib/i686/libpthread.so.0
> #2  0x4002dfea in __pthread_manager_event () from /lib/i686/libpthread.so.0
> #3  0x401797fa in clone () from /lib/i686/libc.so.6
>
> Thread 1 (Thread 16384 (LWP 18507)):
> #0  0x401728d1 in select () from /lib/i686/libc.so.6
> ---Type <return> to continue, or q <return> to quit---
> #1  0x00000009 in ?? ()
> #2  0x401d3d20 in buffer () from /lib/i686/libc.so.6
> #0  0x08137148 in ?? ()
> (gdb)
> (gdb) quit
> The program is running. Exit anyway? (y or n) y
> [EMAIL PROTECTED] bacula-2.2.5]$ exit
> exit
>
> Script done on Thu 11 Oct 2007 05:31:46 PM EDT
> --------------------------------------------------------------------
>
> Thanks,
>
> Mark
>
> =>
> => Regards,
> =>
> => Kern
> =>
> => On Thursday 04 October 2007 17:11, [EMAIL PROTECTED] wrote:
> => > I'm testing bacula 2.2.4 and the bacula-sd daemon repeatedly exits with a
> => > segmentation fault. Bacula 1.38.11 works reliably on the same machine with
> => > the same hardware and configuration files.
> => >
> => > Environment:
> => >     Fedora Core 1
> => >     Kernel 2.4.26
> => >     gcc 3.3.2
> => >     MySQL 5.0.22
> => >     Dell PV 132T autochanger
> => >
> => > Build configuration script:
> => >     ./configure \
> => >     --prefix=/usr/local/bacula-2.2.4 \
> => >     --disable-nls \
> => >     --disable-ipv6 \
> => >     --enable-batch-insert \
> => >     [EMAIL PROTECTED] \
> => >     [EMAIL PROTECTED] \
> => >     --with-db-name=bacula2 \
> => >     --with-mysql \
> => >     --mandir=/usr/local/bacula-2.2.4/man \
> => >     --with-pid-dir=/usr/local/bacula-2.2.4/var/run \
> => >     --with-subsys-dir=/usr/local/bacula-2.2.4/var/subsys \
> => >     --with-dir-user=bacula \
> => >     --with-dir-group=bacula \
> => >     --with-sd-user=bacula \
> => >     --with-sd-group=bacula \
> => >     --with-fd-user=root \
> => >     --with-fd-group=root && make
> => >
> => > I'm using the same configuration files for the director, sd, and fd that
> => > work with the 1.38.11 installation (after removing the "Accept Any Volume"
> => > directive, changing the paths for 2.2.4, and adding the directive
> => > "RecyclePool = Scratch").
> => >
> => > The software compiles without error. There were no errors from "btape test"
> => > or "btape autochanger".
> => >
> => > The bacula-sd daemon crashes repeatedly whether or not the variable
> => > LD_ASSUME_KERNEL was set to "2.4.19" before compiling bacula.
> => >
> => > The bacula-sd daemon is running as root (while I sort out an issue that's
> => > not present in 1.38.11 with permissions on the tape device). The bacula-dir
> => > normally runs as user "bacula".
> => >
> => > Even if I modify the bacula-dir options to run it as root, no traceback
> => > file is generated. The message (received via email) from Bacula is:
> => >
> => > Using host libthread_db library "/lib/libthread_db.so.1".
> => > 0x401728d1 in ?? ()
> => > /usr/local/bacula-2.2.4/etc/btraceback.gdb:1: Error in sourced command
> => > file: Cannot access memory at address 0x80a2e34
> => >
> => > The fault seems to occur right after the SD begins despooling data. I've
> => > got 4 log files from running the SD with debugging on (set to 200 or
> => > higher), and the error always happens after the first instance of
> => > despooling data. In each case, the log file shows
> => > "stored.c:582 In terminate_stored() sig=11".
> => >
> => > I've attached an excerpt from the SD debugging output.
> => >
> => >
> => > Thanks,
> => >
> => > Mark
> => >
> => >
> => > ----
> => > Mark Bergman [EMAIL PROTECTED]
> => > System Administrator
> => > Section of Biomedical Image Analysis 215-662-7310
> => > Department of Radiology, University of Pennsylvania
> => >
> => > http://pgpkeys.pca.dfn.de:11371/pks/lookup?search=mark.bergman%40.uphs.upenn.edu

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users