It appears that the pathconf() function is returning incorrect values for _PC_PATH_MAX and _PC_NAME_MAX on your machine. The path you are trying to back up then exceeds these incorrect values, and Bacula triggers an ASSERT, which causes the FD to stop.
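You can check what pathconf() actually reports on the Solaris box without the debugger by compiling a small standalone C program there. The sketch below assumes a POSIX system. query_limit() and query_name_limits() are hypothetical helpers, not Bacula code. Comparing against _POSIX_PATH_MAX and _POSIX_NAME_MAX, the POSIX minimums from <limits.h>, is only a coarse sanity check. The final +1 on each value mimics the path_max++; name_max++; statements in init_find_files, presumably to leave room for a terminating NUL.

```c
#define _POSIX_C_SOURCE 200809L /* expose pathconf() and the _POSIX_* minimums */

#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: query one pathconf() limit for dir and print it.
 * pathconf() returns -1 both on error (errno set) and when the limit is
 * indeterminate (errno left unchanged); either case yields -1 here. */
static long query_limit(const char *dir, int name, const char *label, long posix_min)
{
    errno = 0;
    long value = pathconf(dir, name);
    if (value == -1) {
        if (errno != 0)
            fprintf(stderr, "pathconf(\"%s\", %s) failed: %s\n", dir, label, strerror(errno));
        else
            fprintf(stderr, "pathconf(\"%s\", %s): no fixed limit\n", dir, label);
        return -1;
    }
    /* Anything below the POSIX minimum is certainly wrong. */
    printf("%s for %s = %ld%s\n", label, dir, value,
           value < posix_min ? "  (below the POSIX minimum)" : "");
    return value;
}

/* Hypothetical helper: fetch both limits and add one to each, in the
 * spirit of path_max++; name_max++; (assumed to be room for the NUL).
 * Returns 0 on success, -1 if either limit could not be determined. */
int query_name_limits(const char *dir, long *path_max, long *name_max)
{
    long p = query_limit(dir, _PC_PATH_MAX, "_PC_PATH_MAX", _POSIX_PATH_MAX);
    long n = query_limit(dir, _PC_NAME_MAX, "_PC_NAME_MAX", _POSIX_NAME_MAX);
    if (p < 0 || n < 0)
        return -1;
    *path_max = p + 1;
    *name_max = n + 1;
    return 0;
}
```

Call it from a short main() as query_name_limits("/DATA2", &path_max, &name_max). It prints the raw limits for the file system that holds the failing directory. Note that pathconf() may legitimately return -1 with errno unchanged when a limit is indeterminate. A -1 that is simply incremented would become 0.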
If you know how to run the debugger, please set a breakpoint in "init_find_files" and step through it until you reach these statements:

  path_max++;
  name_max++;

At that point, I would like to know the values of those two variables. Each should be at least 1000 or so, and more likely around 32,000. It would probably be best to file a bug report on this, including your dump.

On Tuesday 21 June 2005 20:24, Ray Pengelly wrote:
> I have a Solaris 9 x86 box running as a director and sd. The fd on the box
> seems to crash every time it runs. Here is the traceback it first sends:
>
> For information about new features see `help changes'
> To remove this message, put `dbxenv suppress_startup_message 7.4' in your .dbxrc
> Reading bacula-fd
> Reading ld.so.1
> Reading libz.so.1
> Reading libpthread.so.1
> Reading libgen.so.1
> Reading libresolv.so.2
> Reading libnsl.so.1
> Reading libsocket.so.1
> Reading libxnet.so.1
> Reading libstdc++.so.6.0.3
> Reading libm.so.1
> Reading libgcc_s.so.1
> Reading libc.so.1
> Reading libdl.so.1
> Reading libmp.so.2
> Reading libthread.so.1
> Attached to process 21301 with 4 LWPs
> [EMAIL PROTECTED] ([EMAIL PROTECTED]) stopped in _poll at 0xce81d2ed
> 0xce81d2ed: _poll+0x000c: jae _poll+0x21 [ 0xce81d302, .+0x15 ]
> Current function is bnet_thread_server
> 154 if ((stat = select(maxfd + 1, &sockset, NULL, NULL, NULL)) < 0) {
> ******** RUNNING LWPS/THREADS:
> >[EMAIL PROTECTED] running in _poll()
> [EMAIL PROTECTED] running in __lwp_park()
> [EMAIL PROTECTED] running in _waitid()
> [EMAIL PROTECTED] running in _poll()
>
> ******** STACK TRACE OF CURRENT LWP:
>
> current thread: [EMAIL PROTECTED]
> [1] _poll(0x8046dc0, 0x1, 0xffffffff), at 0xce81d2ed
> [2] _libc_select(0x5, 0x8046ea0, 0xce8a0c7c, 0xce8a0c7c, 0x0), at 0xce839634
> [3] _ti_select(0x5, 0x8046ea0, 0x0, 0x0, 0x0, 0x4, 0x8047ae8, 0x8068b70), at 0xce79da92
> =>[4] bnet_thread_server(addrs = (nil), max_clients = 20, client_wq = 0x809c400,
>     handle_client_request = 0x805b0f0 = &handle_client_request(void*)), line 154 in "bnet_server.c"
> [5] main(argc = 0, argv = (nil)), line 258 in "filed.c"
>
> ******** VARIABLES DUMP OF CURRENT LWP:
>
> sockfds = CLASS
> tlog = 0
> p = (nil)
> cli_addr = CLASS
> fd_ptr = (nil)
> allbuf = "host[ipv4:0.0.0.0:9102] "
> stat = 0
> clilen = 16U
> turnon = 1
> buf = "130.15.106.122"
> newsockfd = 0
> max_clients = 20
> addrs = (nil)
> client_wq = 0x809c400
> handle_client_request = 0x805b0f0 = &handle_client_request(void*)
> Current function is bnet_thread_server
> 154 if ((stat = select(maxfd + 1, &sockset, NULL, NULL, NULL)) < 0) {
> [EMAIL PROTECTED] ([EMAIL PROTECTED]) stopped in _poll at 0xce81d2ed
> 0xce81d2ed: _poll+0x000c: jae _poll+0x21 [ 0xce81d302, .+0x15 ]
>
> ******** STACK TRACE OF LWP 1:
>
> [1] _poll(0x8046dc0, 0x1, 0xffffffff), at 0xce81d2ed
> [2] _libc_select(0x5, 0x8046ea0, 0xce8a0c7c, 0xce8a0c7c, 0x0), at 0xce839634
> [3] _ti_select(0x5, 0x8046ea0, 0x0, 0x0, 0x0, 0x4, 0x8047ae8, 0x8068b70), at 0xce79da92
> =>[4] bnet_thread_server(addrs = (nil), max_clients = 20, client_wq = 0x809c400,
>     handle_client_request = 0x805b0f0 = &handle_client_request(void*)), line 154 in "bnet_server.c"
> [5] main(argc = 0, argv = (nil)), line 258 in "filed.c"
>
> ******** VARIABLES DUMP OF LWP 1:
>
> sockfds = CLASS
> tlog = 0
> p = (nil)
> cli_addr = CLASS
> fd_ptr = (nil)
> allbuf = "host[ipv4:0.0.0.0:9102] "
> stat = 0
> clilen = 16U
> turnon = 1
> buf = "130.15.106.122"
> newsockfd = 0
> max_clients = 20
> addrs = (nil)
> client_wq = 0x809c400
> handle_client_request = 0x805b0f0 = &handle_client_request(void*)
> Current function is watchdog_thread
> 289 pthread_cond_timedwait(&timer, &timer_mutex, &timeout);
> [EMAIL PROTECTED] ([EMAIL PROTECTED]) stopped in __lwp_park at 0xce7a48d0
> 0xce7a48d0: __lwp_park+0x0010: jae __lwp_park+0x1e [ 0xce7a48de, .+0xe ]
>
> ******** STACK TRACE OF LWP 2:
>
> [1] __lwp_park(0x0, 0xce72df50), at 0xce7a48d0
> [2] cond_wait_queue(0x8093538, 0x8093548, 0xce72df50, 0x0, 0x0), at 0xce7a19a4
> [3] cond_wait_common(0x8093538, 0x8093548, 0xce72df50, 0x0, 0x0), at 0xce7a1dd9
> [4] _cond_timedwait(0x8093538, 0x8093548, 0xce72dfb4), at 0xce7a2224
> [5] cond_timedwait(0x8093538, 0x8093548, 0xce72dfb4), at 0xce7a2264
> [6] _ti_pthread_cond_timedwait(0x8093538, 0x8093548, 0xce72dfb4, 0x0, 0x0, 0x0), at 0xce7a22a0
> =>[7] watchdog_thread(arg = (nil)), line 289 in "watchdog.c"
> [8] _thr_setup(0xce770200), at 0xce7a4513
> [9] _lwp_start(), at 0xce7a4790
>
> ******** VARIABLES DUMP OF LWP 2:
>
> errstat = 0
> p = (nil)
> timeout = CLASS
> tv = CLASS
> tz = CLASS
> next_time = 0
> arg = (nil)
> Current function is signal_handler
> 159 waitpid(pid, NULL, 0); /* wait for child to produce dump */
> [EMAIL PROTECTED] ([EMAIL PROTECTED]) stopped in _waitid at 0xce81f147
> 0xce81f147: _waitid+0x000c: jae _waitid+0x25 [ 0xce81f160, .+0x19 ]
>
> ******** STACK TRACE OF LWP 3:
>
> [1] _waitid(0x0, 0x54a5, 0xce61d7c0, 0x3), at 0xce81f147
> [2] _waitpid(0x54a5, 0x0, 0x0), at 0xce83f087
> [3] _ti_waitpid(0x54a5), at 0xce79df6f
> =>[4] signal_handler(sig = 0), line 159 in "signal.c"
> [5] __sighndlr(0xb, 0x0, 0xce61d9d4, 0x80746e8), at 0xce7a4a8f
> ---- called from signal handler with signal 11 (SIGSEGV) ------
> [6] find_one_file(jcr = 0x809e2f0, ff_pkt = 0x809e7a0, handle_file = 0x8061824 = &`bacula-fd`find.c`our_callback(FF_PKT*, void*), pkt = 0x809e2f0, fname = 0x80b7598 "/DATA2/ianc/My Documents/Matlab Data", parent_device = 850U, top_level = 0), line 425 in "find_one.c"
> [7] find_one_file(jcr = 0x809e2f0, ff_pkt = (nil), handle_file = 0x8061824 = &`bacula-fd`find.c`our_callback(FF_PKT*, void*), pkt = 0x809e2f0, fname = 0x80b7208 "/DATA2/ianc/My Documents", parent_device = 0, top_level = 0), line 443 in "find_one.c"
> [8] find_one_file(jcr = 0x809e2f0, ff_pkt = (nil), handle_file = 0x8061824 = &`bacula-fd`find.c`our_callback(FF_PKT*, void*), pkt = 0x809e2f0, fname = 0x80b20b0 "/DATA2/ianc", parent_device = 0, top_level = 0), line 443 in "find_one.c"
> [9] find_one_file(jcr = 0x809e2f0, ff_pkt = (nil), handle_file = 0x8061824 = &`bacula-fd`find.c`our_callback(FF_PKT*, void*), pkt = 0x809e2f0, fname = 0x809d958 "/DATA2", parent_device = 0, top_level = 1), line 443 in "find_one.c"
> [10] find_files(jcr = 0x809e2f0, ff = 0x809e7a0, callback = (nil), his_pkt = 0x809e2f0), line 145 in "find.c"
> [11] blast_data_to_storage_daemon(jcr = (nil), addr = (nil)), line 100 in "backup.c"
> [12] backup_cmd(jcr = (nil)), line 1332 in "job.c"
> [13] handle_client_request(dirp = 0x809d5c8), line 210 in "job.c"
> [14] workq_server(arg = (nil)), line 347 in "workq.c"
> [15] _thr_setup(0xce770400), at 0xce7a4513
> [16] _lwp_start(), at 0xce7a4790
>
> ******** VARIABLES DUMP OF LWP 3:
>
> already_dead = 1
> sig = 0
> Current function is bnet_wait_data_intr
> 490 switch (select(bsock->fd + 1, &fdset, NULL, NULL, &tv)) {
> [EMAIL PROTECTED] ([EMAIL PROTECTED]) stopped in _poll at 0xce81d2ed
> 0xce81d2ed: _poll+0x000c: jae _poll+0x21 [ 0xce81d302, .+0x15 ]
>
> ******** STACK TRACE OF LWP 4:
>
> [1] _poll(0xce51de4c, 0x1, 0xea60), at 0xce81d2ed
> [2] _libc_select(0x8, 0xce51df14, 0xce8a0c7c, 0xce8a0c7c, 0xce51df0c), at 0xce839634
> [3] _ti_select(0x8, 0xce51df14, 0x0, 0x0, 0xce51df0c, 0x0, 0x80, 0x8074f38), at 0xce79da92
> =>[4] bnet_wait_data_intr(bsock = (nil), sec = 0), line 490 in "bnet.c"
> [5] sd_heartbeat_thread(arg = 0x809e2f0), line 75 in "heartbeat.c"
> [6] _thr_setup(0xce770600), at 0xce7a4513
> [7] _lwp_start(), at 0xce7a4790
>
> ******** VARIABLES DUMP OF LWP 4:
>
> fdset = CLASS
> tv = CLASS
> bsock = (nil)
> sec = 0
> dbx: no LWP with id 5
> dbx: no LWP with id 6
> dbx: no LWP with id 7
> dbx: no LWP with id 8
> detaching from process 21301
>
> Then after this traceback I get an error message saying:
>
> 21-Jun 06:49 dmstore-dir: No prior Full backup Job record found.
> 21-Jun 06:49 dmstore-dir: No prior or suitable Full backup found. Doing FULL backup.
> 21-Jun 06:49 dmstore-dir: Start Backup JobId 1714, Job=DMSTORE-DATA.2005-06-20_01.00.02
> 21-Jun 14:00 dmstore-dir: DMSTORE-DATA.2005-06-20_01.00.02 Fatal error: Network error with FD during Backup: ERR=No data available
> 21-Jun 14:00 dmstore-sd: DMSTORE-DATA.2005-06-20_01.00.02 Fatal error: append.c:140 Error reading data header from FD. ERR=No data available
> 21-Jun 14:00 dmstore-dir: DMSTORE-DATA.2005-06-20_01.00.02 Fatal error: No Job status returned from FD.
> 21-Jun 14:00 dmstore-dir: DMSTORE-DATA.2005-06-20_01.00.02 Error: Bacula 1.36.2 (28Feb05): 21-Jun-2005 14:00:56
>   JobId:                  1714
>   Job:                    DMSTORE-DATA.2005-06-20_01.00.02
>   Backup Level:           Full (upgraded from Incremental)
>   Client:                 dmstore-fd
>   FileSet:                "DMSTORE-DATA" 2005-02-17 02:15:17
>   Pool:                   "CIHR"
>   Storage:                "L25"
>   Start time:             21-Jun-2005 06:49:30
>   End time:               21-Jun-2005 14:00:56
>   FD Files Written:       0
>   SD Files Written:       86,508
>   FD Bytes Written:       0
>   SD Bytes Written:       123,420,048,243
>   Rate:                   0.0 KB/s
>   Software Compression:   None
>   Volume name(s):         FTY314S1
>   Volume Session Id:      20
>   Volume Session Time:    1119021116
>   Last Volume Bytes:      281,057,574,558
>   Non-fatal FD errors:    0
>   SD Errors:              0
>   FD termination status:  Error
>   SD termination status:  Error
>   Termination:            *** Backup Error ***
>
> So I ran the fd with -d500 and here is the output:
>
> dmstore-fd: backup.c:484 Send data to SD len=32768
> dmstore-fd: backup.c:484 Send data to SD len=32768
> dmstore-fd: backup.c:484 Send data to SD len=25088
> dmstore-fd: bfile.c:519 Close file 11
> dmstore-fd: backup.c:534 bfiled>stored:header 86508 3 0
> dmstore-fd: find_one.c:116 File ----: /DATA2/ianc/My Documents/Matlab Data
> dmstore-fd: find_one.c:344 Create temp ff packet for dir: /DATA2/ianc/My Documents/Matlab Data
> dmstore-fd: signal.c:78 sig=11 Segmentation Fault
> Kaboom! bacula-fd, dmstore-fd got signal 11. Attempting traceback.
> Kaboom! exepath=/bacula/bin/
> dmstore-fd: signal.c:131 Working=/bacula/working
> dmstore-fd: signal.c:132 btpath=/bacula/bin/btraceback
> dmstore-fd: signal.c:133 exepath=/bacula/bin/bacula-fd
> dmstore-fd: signal.c:158 Doing waitpid
> Calling: /bacula/bin/btraceback /bacula/bin/bacula-fd 21301
> gcore: /bacula/working/bacula-fd.21301 dumped
> Traceback complete, attempting cleanup ...
> dmstore-fd: signal.c:161 Done waitpid
>
> So it seems to be KABOOMING and causing a Segmentation Fault. Any idea what
> is going on? The fd then dies and has to be restarted before I can retry
> the backup. This happens every time I run a backup on this client. It was
> working previously.
>
> Any help is appreciated.
>
> Ray
>
> Ray Pengelly
> Computing Technologist
> CIHR Group In Sensory-Motor Systems
> Queen's University
> 613-533-6000 x74139
> [EMAIL PROTECTED]
>
> "Wouldn't it be great if life were like windows where you could hit
> ctrl-alt-delete and start over when things go wrong?" - Anonymous
>
> "How about life being like Unix where they don't mess up in the first
> place" - Ray

--
Best regards,

Kern

  (">
  /\
  V_V

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users