Good morning.
I'm running Bacula 7.0.2 on CentOS 6. During long backups the system load gets
really high (load average 10+). When I tried to cancel a job using bconsole,
bacula-dir crashed with the trace below.
Can anyone give me guidance on whether there is anything I can do and/or change
on my system to prevent these crashes?
Thanks,
Marco
[New LWP 20711]
[New LWP 29898]
[New LWP 29657]
[New LWP 29655]
[New LWP 29645]
[New LWP 29644]
[New LWP 2199]
[New LWP 2198]
[New LWP 2138]
[Thread debugging using libthread_db enabled]
0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0
$1 = '\000' <repeats 29 times>
$2 = 0x95f068 "bacula-dir"
$3 = 0x95f0a8 "/opt/bacula/bin/bacula-dir"
$4 = 0x7f8cb4025388 "MySQL"
$5 = 0x385c65092c "7.0.2 (02 April 2014)"
$6 = 0x385c65094a "x86_64-redhat-linux-gnu"
$7 = 0x385c650962 "redhat"
$8 = 0x385c6505f5 ""
$9 = "bacula.specialtyvalvegroup.com", '\000' <repeats 19 times>
$10 = 0x385c650942 "redhat "
$11 = 0
Environment variable "TestName" not defined.
#0 0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003a3ee09508 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x0000003a3ee093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000385c648f33 in lmgr_p (m=<value optimized out>) at lockmgr.c:93
#4 0x000000385c64ab6b in lmgr_thread_t::pre_P (this=<value optimized out>,
m=<value optimized out>, priority=<value optimized out>, f=<value optimized
out>, l=<value optimized out>) at lockmgr.c:435
#5 0x000000385c63649f in rwl_writelock_p (rwl=0x385c40aaa0, file=<value
optimized out>, line=<value optimized out>) at rwlock.c:228
#6 0x000000385c207ca0 in b_LockRes (file=0x46a070 "scheduler.c", line=300) at
res.c:52
#7 0x0000000000434028 in find_runs (one_shot_job_to_run=<value optimized out>)
at scheduler.c:300
#8 wait_for_next_job (one_shot_job_to_run=<value optimized out>) at
scheduler.c:114
#9 0x000000000040eeb5 in main (argc=<value optimized out>, argv=<value
optimized out>) at dird.c:336
Thread 10 (Thread 0x7f8ccd743700 (LWP 2138)):
#0 0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003a3ee09508 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x0000003a3ee093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000385c648f33 in lmgr_p (m=<value optimized out>) at lockmgr.c:93
#4 0x000000385c6393bd in smalloc (fname=0x385c65501c "lockmgr.c", lineno=603,
nbytes=65) at smartall.c:114
#5 0x000000385c6395f5 in sm_malloc (fname=<value optimized out>, lineno=<value
optimized out>, nbytes=24) at smartall.c:236
#6 0x000000385c648793 in operator new () at ../lib/smartall.h:105
#7 lmgr_detect_deadlock_unlocked () at lockmgr.c:603
#8 0x000000385c6495dd in lmgr_detect_deadlock () at lockmgr.c:661
#9 0x000000385c649b82 in check_deadlock () at lockmgr.c:717
#10 0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0
#11 0x0000003a3eae8b6d in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x7f8cccd42700 (LWP 2198)):
#0 0x0000003a3eae15e3 in select () from /lib64/libc.so.6
#1 0x000000385c618e44 in bnet_thread_server (addr_list=0x7f8cccd42688,
max_clients=20, client_wq=0x685c40, handle_client_request=0x452f20
<handle_UA_client_request(void*)>) at bnet_server.c:168
#2 0x0000000000452f1c in connect_thread (arg=0x9632d8) at ua_server.c:69
#3 0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at
lockmgr.c:1091
#4 0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003a3eae8b6d in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7f8cc7fff700 (LWP 2199)):
#0 0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003a3ee09508 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x0000003a3ee093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000385c648f33 in lmgr_p (m=<value optimized out>) at lockmgr.c:93
#4 0x000000385c64ab6b in lmgr_thread_t::pre_P (this=<value optimized out>,
m=<value optimized out>, priority=<value optimized out>, f=<value optimized
out>, l=<value optimized out>) at lockmgr.c:435
#5 0x000000385c6485d6 in bthread_cond_timedwait_p (cond=<value optimized out>,
m=<value optimized out>, abstime=<value optimized out>, file=<value optimized
out>, line=<value optimized out>) at lockmgr.c:977
#6 0x000000385c64279d in watchdog_thread (arg=<value optimized out>) at
watchdog.c:309
#7 0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at
lockmgr.c:1091
#8 0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0
#9 0x0000003a3eae8b6d in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f8cc6bfd700 (LWP 29644)):
#0 0x0000003a3ee0e75d in read () from /lib64/libpthread.so.0
#1 0x000000385c617f66 in read_nbytes (bsock=<value optimized out>, ptr=<value
optimized out>, nbytes=<value optimized out>) at bnet.c:69
#2 0x000000385c61b5b0 in BSOCK::recv (this=<value optimized out>) at
bsock.c:511
#3 0x0000000000420787 in bget_dirmsg (bs=0x7f8cb80115b8) at getmsg.c:124
#4 0x00000000004116f5 in wait_for_job_termination (jcr=0x9658d8,
timeout=<value optimized out>) at backup.c:630
#5 0x00000000004138ee in do_backup (jcr=0x9658d8) at backup.c:581
#6 0x0000000000427019 in job_thread (arg=0x9658d8) at job.c:303
#7 0x0000000000428293 in jobq_server (arg=0x685940) at jobq.c:439
#8 0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at
lockmgr.c:1091
#9 0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003a3eae8b6d in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f8cc5ff6700 (LWP 29645)):
#0 0x0000003a3ee0e75d in read () from /lib64/libpthread.so.0
#1 0x000000385c617f66 in read_nbytes (bsock=<value optimized out>, ptr=<value
optimized out>, nbytes=<value optimized out>) at bnet.c:69
#2 0x000000385c61b5b0 in BSOCK::recv (this=<value optimized out>) at
bsock.c:511
#3 0x0000000000420787 in bget_dirmsg (bs=0x7f8cac012da8) at getmsg.c:124
#4 0x00000000004116f5 in wait_for_job_termination (jcr=0x96eeb8,
timeout=<value optimized out>) at backup.c:630
#5 0x00000000004138ee in do_backup (jcr=0x96eeb8) at backup.c:581
#6 0x0000000000427019 in job_thread (arg=0x96eeb8) at job.c:303
#7 0x0000000000428293 in jobq_server (arg=0x685940) at jobq.c:439
#8 0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at
lockmgr.c:1091
#9 0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003a3eae8b6d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f8ca97fb700 (LWP 29655)):
#0 0x0000003a3ee0e75d in read () from /lib64/libpthread.so.0
#1 0x000000385c617f66 in read_nbytes (bsock=<value optimized out>, ptr=<value
optimized out>, nbytes=<value optimized out>) at bnet.c:69
#2 0x000000385c61b5b0 in BSOCK::recv (this=<value optimized out>) at
bsock.c:511
#3 0x0000000000420787 in bget_dirmsg (bs=0x7f8cb800b128) at getmsg.c:124
#4 0x000000000042e1aa in msg_thread (arg=0x9658d8) at msgchan.c:427
#5 0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at
lockmgr.c:1091
#6 0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0
#7 0x0000003a3eae8b6d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f8c8ffff700 (LWP 29657)):
#0 0x0000003a3ee0e75d in read () from /lib64/libpthread.so.0
#1 0x000000385c617f66 in read_nbytes (bsock=<value optimized out>, ptr=<value
optimized out>, nbytes=<value optimized out>) at bnet.c:69
#2 0x000000385c61b5b0 in BSOCK::recv (this=<value optimized out>) at
bsock.c:511
#3 0x0000000000420787 in bget_dirmsg (bs=0x7f8cac00b2d8) at getmsg.c:124
#4 0x000000000042e1aa in msg_thread (arg=0x96eeb8) at msgchan.c:427
#5 0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at
lockmgr.c:1091
#6 0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0
#7 0x0000003a3eae8b6d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f8cc75fe700 (LWP 29898)):
#0 0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003a3ee09508 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x0000003a3ee093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000385c648f33 in lmgr_p (m=<value optimized out>) at lockmgr.c:93
#4 0x000000385c64ab6b in lmgr_thread_t::pre_P (this=<value optimized out>,
m=<value optimized out>, priority=<value optimized out>, f=<value optimized
out>, l=<value optimized out>) at lockmgr.c:435
#5 0x000000385c649024 in bthread_mutex_lock_p (m=<value optimized out>,
file=<value optimized out>, line=<value optimized out>) at lockmgr.c:932
#6 0x0000000000428870 in jobq_server (arg=0x685940) at jobq.c:589
#7 0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at
lockmgr.c:1091
#8 0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0
#9 0x0000003a3eae8b6d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f8cab5fe700 (LWP 20711)):
#0 0x0000003a3ee0f2ad in waitpid () from /lib64/libpthread.so.0
#1 0x000000385c638712 in signal_handler (sig=<value optimized out>) at
signal.c:234
#2 <signal handler called>
#3 sm_free (file=0x385c651eb2 "sellist.c", line=138, fp=0x3800000000) at
smartall.c:180
#4 0x000000385c637969 in sellist::set_string (this=0x7f8cab5fd190,
string=0x7f8cb4015850 "2", scan=true) at sellist.c:138
#5 0x000000000043d670 in get_selection_list (ua=0x7f8c9800b998, sl=...,
prompt=<value optimized out>, subprompt=<value optimized out>) at ua_input.c:89
#6 0x0000000000452127 in do_alist_prompt (ua=0x7f8c9800b998, automsg=<value
optimized out>, msg=0x7f8cab5fda20 "Choose Job list to cancel",
selected=0x7f8c9800b3f8) at ua_select.c:957
#7 0x0000000000452af8 in select_running_jobs (ua=0x7f8c9800b998,
jcrs=0x7f8c9800bcf8, reason=0x4659a6 "cancel") at ua_select.c:1341
#8 0x0000000000439068 in cancel_cmd (ua=0x7f8c9800b998, cmd=<value optimized
out>) at ua_cmds.c:443
#9 0x0000000000438bc4 in do_a_command (ua=0x7f8c9800b998) at ua_cmds.c:227
#10 0x0000000000452fce in handle_UA_client_request (arg=0x7f8cc000b0c8) at
ua_server.c:133
#11 0x000000385c642ca2 in workq_server (arg=0x685c40) at workq.c:323
#12 0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at
lockmgr.c:1091
#13 0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0
#14 0x0000003a3eae8b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f8cd35d67e0 (LWP 2132)):
#0 0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003a3ee09508 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x0000003a3ee093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000385c648f33 in lmgr_p (m=<value optimized out>) at lockmgr.c:93
#4 0x000000385c64ab6b in lmgr_thread_t::pre_P (this=<value optimized out>,
m=<value optimized out>, priority=<value optimized out>, f=<value optimized
out>, l=<value optimized out>) at lockmgr.c:435
#5 0x000000385c63649f in rwl_writelock_p (rwl=0x385c40aaa0, file=<value
optimized out>, line=<value optimized out>) at rwlock.c:228
#6 0x000000385c207ca0 in b_LockRes (file=0x46a070 "scheduler.c", line=300) at
res.c:52
#7 0x0000000000434028 in find_runs (one_shot_job_to_run=<value optimized out>)
at scheduler.c:300
#8 wait_for_next_job (one_shot_job_to_run=<value optimized out>) at
scheduler.c:114
#9 0x000000000040eeb5 in main (argc=<value optimized out>, argv=<value
optimized out>) at dird.c:336
#0 0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x0000003a3ee09508 in _L_lock_854 () from /lib64/libpthread.so.0
No symbol table info available.
#2 0x0000003a3ee093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#3 0x000000385c648f33 in lmgr_p (m=<value optimized out>) at lockmgr.c:93
93 lockmgr.c: No such file or directory.
in lockmgr.c
errstat = <value optimized out>
#4 0x000000385c64ab6b in lmgr_thread_t::pre_P (this=<value optimized out>,
m=<value optimized out>, priority=<value optimized out>, f=<value optimized
out>, l=<value optimized out>) at lockmgr.c:435
435 in lockmgr.c
max_prio = <value optimized out>
#5 0x000000385c63649f in rwl_writelock_p (rwl=0x385c40aaa0, file=<value
optimized out>, line=<value optimized out>) at rwlock.c:228
228 rwlock.c: No such file or directory.
in rwlock.c
stat = 0
#6 0x000000385c207ca0 in b_LockRes (file=0x46a070 "scheduler.c", line=300) at
res.c:52
52 res.c: No such file or directory.
in res.c
errstat = <value optimized out>
#7 0x0000000000434028 in find_runs (one_shot_job_to_run=<value optimized out>)
at scheduler.c:300
300 LockRes();
hour = 7
nh_woy = 19
now = 1399553525
month = 4
wom = 1
ldom = 30
nh_mday = 7
nh_wday = 4
next_hour = 1399557125
sched = <value optimized out>
tm = {tm_sec = 5, tm_min = 52, tm_hour = 8, tm_mday = 8, tm_mon = 4, tm_year =
114, tm_wday = 4, tm_yday = 127, tm_isdst = 1, tm_gmtoff = -18000, tm_zone =
0x999180 "CDT"}
wday = 4
mday = 7
woy = 19
nh_hour = 8
nh_month = 4
nh_wom = 1
runtime = <value optimized out>
run = <value optimized out>
job = <value optimized out>
nh_ldom = 30