Package: boinc-client
Version: 6.13.10+dfsg-1
Severity: important

/usr/bin/boinc crashes (SIGSEGV) after some hours of computation. This happens
every day, so it is reproducible. What follows is a journey to track this down:

My recent upgrades of boinc-related packages include:

Mon, Oct 24 2011 20:09:36 +0200
[UPGRADE] boinc 6.13.1+dfsg-2 -> 6.13.6+dfsg-2
[UPGRADE] boinc-client 6.13.1+dfsg-2 -> 6.13.6+dfsg-2
[UPGRADE] boinc-manager 6.13.1+dfsg-2 -> 6.13.6+dfsg-2

Tue, Nov 1 2011 10:43:55 +0100
[UPGRADE] boinc 6.13.6+dfsg-2 -> 6.13.10+dfsg-1
[UPGRADE] boinc-client 6.13.6+dfsg-2 -> 6.13.10+dfsg-1
[UPGRADE] boinc-manager 6.13.6+dfsg-2 -> 6.13.10+dfsg-1

I noticed the missing system load (and thereby the broken boinc client) only a
few days ago, so I suspect the bug was introduced by boinc-client version
6.13.6+dfsg-2 or 6.13.10+dfsg-1. I'm sure this never happened with version
6.13.1+dfsg-2.

For the record, I'm currently running these projects:
- climateprediction.net
- rosetta@home
- SETI@home

I was never attached using boinc-manager or any other client, and I did not
change the configuration of any boinc-related software in the past month. I
also did not add or remove projects.

-----------------------------------------------------------

On to the bug hunting:

/var/lib/boinc-client/std{out,err}dae.txt do not contain anything interesting
leading up to this crash.

So I ran boinc under gdb, in the same environment the init script creates. The
following crash happened after 3 hours of computation.

# apt-get install boinc-dbg
# sudo -u boinc -H -- sh -c 'cd ~ && gdb -q -s /usr/lib/debug/usr/bin/boinc -e /usr/bin/boinc'
Reading symbols from /usr/lib/debug/usr/bin/boinc...done.
(gdb) run --check_all_logins --redirectio --dir /var/lib/boinc-client
Starting program: /usr/bin/boinc --check_all_logins --redirectio --dir /var/lib/boinc-client
[Thread debugging using libthread_db enabled]

Program received signal SIGSEGV, Segmentation fault.
__mempcpy_sse2 () at ../sysdeps/x86_64/memcpy.S:436
436     ../sysdeps/x86_64/memcpy.S: No such file or directory.
        in ../sysdeps/x86_64/memcpy.S
(gdb) bt 10
#0  __mempcpy_sse2 () at ../sysdeps/x86_64/memcpy.S:436
#1  0x00007ffff6276d66 in _IO_default_xsputn (f=0x7ffffffedb80, data=<optimized out>, n=150842) at genops.c:468
#2  0x00007ffff6249f04 in _IO_vfprintf_internal (s=0x7ffffffedb80,
    format=0x4a5ec0 "<scheduler_request>\n <authenticator>%s</authenticator>\n <hostid>%d</hostid>\n <rpc_seqno>%d</rpc_seqno>\n <core_client_major_version>%d</core_client_major_version>\n <core_client_minor_ver"...,
    ap=0x7ffffffedca0) at vfprintf.c:1620
#3  0x00007ffff626c3b9 in __IO_vsprintf (
    string=0x7ffffffeddb0 "<scheduler_request>\n <authenticator>4a31871bf59efd4895a7ca5a65402602</authenticator>\n <hostid>1171553</hostid>\n <rpc_seqno>122</rpc_seqno>\n <core_client_major_version>6</core_client_major_"...,
    format=0x4a5ec0 "<scheduler_request>\n <authenticator>%s</authenticator>\n <hostid>%d</hostid>\n <rpc_seqno>%d</rpc_seqno>\n <core_client_major_version>%d</core_client_major_version>\n <core_client_minor_ver"...,
    args=0x7ffffffedca0) at iovsprintf.c:43
#4  0x00007ffff62532f8 in __sprintf (s=0x7fffffffefe0 "L*W53'G?R8ALXQ]?<\"TT$^2.&3REI]0W" <Address 0x7ffffffff000 out of bounds>,
    format=0x7ffff7fba050 "L*W53'G?R8ALXQ]?<\"TT$^2.&3REI]0WIP,']_K^-2]C=SQF;M\"XB'\\-('\n:]'0L-?/P^1N,V-5L8R1>YXA( W@+ZJF0'.?-*CF5L0\"*IJP@7:/H$E>M</EE#67\nI\\WA[#6?4\\WY'Z^?LUAKSY:!/J&9:_6R2TZ,ED'RM@Y\\G-]Q!DWW1!.FW#).F\n: O#_9L'[76"...)
    at sprintf.c:34
#5  0x000000000045e189 in trickle_up_request_message (
    buf=0x7ffffffeddb0 "<scheduler_request>\n <authenticator>4a31871bf59efd4895a7ca5a65402602</authenticator>\n <hostid>1171553</hostid>\n <rpc_seqno>122</rpc_seqno>\n <core_client_major_version>6</core_client_major_"...,
    t=1320089068,
    result_name=0x7fffffffdee0 "\\C)1*3M4 $V1D60'GS'4%\">(>+)M'=\\5#!X/Z,?AWY:2#^65>%\\.7+8\n\\7Y(+D6F:N-U*6 !3[452\\NBC62?W8J+<2<L>K_&*=!+ I9&U '9EC&F6'%SNM)Z\n$IV+P=X@?]10&:\\+(D$%099=N 1,QK)XA)M'4>6+$?$X#IO>\"^<6N3/9^\\U[*Y-(\nUW92Z18_3H2XQC"...,
    msg=0x7ffff7fa9010 "<variety>year</variety>\n<wu>hadcm3n_ymgx_1900_40_007524528</wu>\n<result>hadcm3n_ymgx_1900_40_007524528_1</result>\n<ph>1</ph>\n<ts>25920</ts>\n<cp>46188</cp>\n<vr>6.07</vr>\n<ppname>\ntrickle_hadcm3n_ymgx_1"...,
    p=0x72dad0) at cs_trickle.cpp:195
#6  send_replicated_trickles (p=0x72dad0,
    msg=0x7ffff7fa9010 "<variety>year</variety>\n<wu>hadcm3n_ymgx_1900_40_007524528</wu>\n<result>hadcm3n_ymgx_1900_40_007524528_1</result>\n<ph>1</ph>\n<ts>25920</ts>\n<cp>46188</cp>\n<vr>6.07</vr>\n<ppname>\ntrickle_hadcm3n_ymgx_1"...,
    result_name=0x7fffffffdee0 "\\C)1*3M4 $V1D60'GS'4%\">(>+)M'=\\5#!X/Z,?AWY:2#^65>%\\.7+8\n\\7Y(+D6F:N-U*6 !3[452\\NBC62?W8J+<2<L>K_&*=!+ I9&U '9EC&F6'%SNM)Z\n$IV+P=X@?]10&:\\+(D$%099=N 1,QK)XA)M'4>6+$?$X#IO>\"^<6N3/9^\\U[*Y-(\nUW92Z18_3H2XQC"...,
    now=1320089068) at cs_trickle.cpp:202
#7  0x000000000000000d in ?? ()
#8  0x0000000000000063 in ?? ()
#9  0x00000000007240f8 in ?? ()
(More stack frames follow...)

Note: there are thousands of following stack frames without matching symbols...

(gdb) bt full 10
#0  __mempcpy_sse2 () at ../sysdeps/x86_64/memcpy.S:436
No locals.
#1  0x00007ffff6276d66 in _IO_default_xsputn (f=0x7ffffffedb80, data=<optimized out>, n=150842) at genops.c:468
        count = 150842
        s = 0x7ffff7fcdd4a ""
        more = 150842
#2  0x00007ffff6249f04 in _IO_vfprintf_internal (s=0x7ffffffedb80,
    format=0x4a5ec0 "<scheduler_request>\n <authenticator>%s</authenticator>\n <hostid>%d</hostid>\n <rpc_seqno>%d</rpc_seqno>\n <core_client_major_version>%d</core_client_major_version>\n <core_client_minor_ver"...,
    ap=0x7ffffffedca0) at vfprintf.c:1620
        len = <optimized out>
        string_malloced = 1163014950
        step0_jumps = {0, -15179, -14483, -14390, -14293, -14199, -15032, -14766, -13119, -13894, -13811, -13241, -12583, -2939, -3410, -3367, -3217, -3202, -10653, -8358, -2123, -12480, -2322, -2771, -1385, -1830, -2607, -3126, -3022, -14856}
        space = 0
        is_short = 0
        use_outdigits = 0
        step1_jumps = {0, 0, 0, 0, 0, 0, 0, 0, 0, -13894, -13811, -13241, -12583, -2939, -3410, -3367, -3217, -3202, -10653, -8358, -2123, -12480, -2322, -2771, -1385, -1830, -2607, -3126, -3022, 0}
        group = 0
        prec = <optimized out>
        step2_jumps = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -13811, -13241, -12583, -2939, -3410, -3367, -3217, -3202, -10653, -8358, -2123, -12480, -2322, -2771, -1385, -1830, -2607, -3126, -3022, 0}
        string = 0x2e325e245454223c <Address 0x2e325e245454223c out of bounds>
        left = 0
        is_long_double = 0
        width = <optimized out>
        step3a_jumps = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -13341, 0, 0, 0, -3410, -3367, -3217, -3202, -10653, 0, 0, 0, 0, -2771, 0, 0, 0, 0, 0, 0}
        alt = 0
        showsign = 0
        is_long = 0
        is_char = 0
        pad = <optimized out>
        step3b_jumps = {0 <repeats 11 times>, -12583, 0, 0, -3410, -3367, -3217, -3202, -10653, -8358, -2123, -12480, -2322, -2771, -1385, -1830, -2607, 0, 0, 0}
        step4_jumps = {0 <repeats 14 times>, -3410, -3367, -3217, -3202, -10653, -8358, -2123, -12480, -2322, -2771, -1385, -1830, -2607, 0, 0, 0}
        is_negative = <optimized out>
        base = 10
        the_arg = {pa_wchar = 0 L'\000', pa_int = 0, pa_long_int = 0, pa_long_long_int = 0, pa_u_int = 0, pa_u_long_int = 0, pa_u_long_long_int = 0, pa_double = 0, pa_long_double = 0, pa_string = 0x0, pa_wstring = 0x0, pa_pointer = 0x0, pa_user = 0x0}
        spec = <optimized out>
        _buffer = {__routine = 0, __arg = 0x0, __canceltype = 0, __prev = 0x0}
        _avail = 0
        thousands_sep = 0x0
        grouping = 0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>
        done = 496
        f = <optimized out>
        lead_str_end = 0x4a5ee7 "%s</authenticator>\n <hostid>%d</hostid>\n <rpc_seqno>%d</rpc_seqno>\n <core_client_major_version>%d</core_client_major_version>\n <core_client_minor_version>%d</core_client_minor_version>\n "...
        work_buffer = '\000' <repeats 990 times>, "1320089068"
        workstart = 0x0
        workend = 0x7ffffffeda38 ""
        ap_save = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7ffffffedd80, reg_save_area = 0x7ffffffedcc0}}
        nspecs_done = <optimized out>
        save_errno = 0
        readonly_format = 0
        jump_table = "\001\000\000\004\000\016\000\006\000\000\a\002\000\003\t\000\005\b\b\b\b\b\b\b\b\b\000\000\000\000\000\000\000\032\000\031\000\023\023\023\000\035\000\000\f\000\000\000\000\000\000\025\000\000\000\000\022\000\r\000\000\000\000\000\000\032\000\024\017\023\023\023\n\017\034\000\v\030\027\021\026\f\000\025\033\020\000\000\022\000\r"
        __PRETTY_FUNCTION__ = "_IO_vfprintf_internal"
#3  0x00007ffff626c3b9 in __IO_vsprintf (
    string=0x7ffffffeddb0 "<scheduler_request>\n <authenticator>4a31871bf59efd4895a7ca5a65402602</authenticator>\n <hostid>1171553</hostid>\n <rpc_seqno>122</rpc_seqno>\n <core_client_major_version>6</core_client_major_"...,
    format=0x4a5ec0 "<scheduler_request>\n <authenticator>%s</authenticator>\n <hostid>%d</hostid>\n <rpc_seqno>%d</rpc_seqno>\n <core_client_major_version>%d</core_client_major_version>\n <core_client_minor_ver"...,
    args=0x7ffffffedca0) at iovsprintf.c:43
        sf = {_sbf = {_f = {_flags = -72515583,
              _IO_read_ptr = 0x7ffffffeddb0 "<scheduler_request>\n <authenticator>4a31871bf59efd4895a7ca5a65402602</authenticator>\n <hostid>1171553</hostid>\n <rpc_seqno>122</rpc_seqno>\n <core_client_major_version>6</core_client_major_"...,
              _IO_read_end = 0x7ffffffeddb0 "<scheduler_request>\n <authenticator>4a31871bf59efd4895a7ca5a65402602</authenticator>\n <hostid>1171553</hostid>\n <rpc_seqno>122</rpc_seqno>\n <core_client_major_version>6</core_client_major_"...,
              _IO_read_base = 0x7ffffffeddb0 "<scheduler_request>\n <authenticator>4a31871bf59efd4895a7ca5a65402602</authenticator>\n <hostid>1171553</hostid>\n <rpc_seqno>122</rpc_seqno>\n <core_client_major_version>6</core_client_major_"...,
              _IO_write_base = 0x7ffffffeddb0 "<scheduler_request>\n <authenticator>4a31871bf59efd4895a7ca5a65402602</authenticator>\n <hostid>1171553</hostid>\n <rpc_seqno>122</rpc_seqno>\n <core_client_major_version>6</core_client_major_"...,
              _IO_write_ptr = 0x7ffffffedfa0 "<variety>year</variety>\n<wu>hadcm3n_ymgx_1900_40_007524528</wu>\n<result>hadcm3n_ymgx_1900_40_007524528_1</result>\n<ph>1</ph>\n<ts>25920</ts>\n<cp>46188</cp>\n<vr>6.07</vr>\n<ppname>\ntrickle_hadcm3n_ymgx_1"...,
              _IO_write_end = 0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>,
              _IO_buf_base = 0x7ffffffeddb0 "<scheduler_request>\n <authenticator>4a31871bf59efd4895a7ca5a65402602</authenticator>\n <hostid>1171553</hostid>\n <rpc_seqno>122</rpc_seqno>\n <core_client_major_version>6</core_client_major_"...,
              _IO_buf_end = 0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>,
              _IO_save_base = 0x0, _IO_backup_base = 0x0, _IO_save_end = 0x0, _markers = 0x0, _chain = 0x0, _fileno = 0, _flags2 = 0, _old_offset = 0, _cur_column = 0, _vtable_offset = 0 '\000', _shortbuf = "", _lock = 0x0, _offset = 0, _codecvt = 0x0, _wide_data = 0x0, _freeres_list = 0x0, _freeres_buf = 0x0, _freeres_size = 0, _mode = -1, _unused2 = '\000' <repeats 19 times>},
            vtable = 0x7ffff6583740},
          _s = {_allocate_buffer = 0, _free_buffer = 0}}
        ret = <optimized out>
---Type <return> to continue, or q <return> to quit---
#4  0x00007ffff62532f8 in __sprintf (s=0x7fffffffefe0 "L*W53'G?R8ALXQ]?<\"TT$^2.&3REI]0W" <Address 0x7ffffffff000 out of bounds>,
    format=0x7ffff7fba050 "L*W53'G?R8ALXQ]?<\"TT$^2.&3REI]0WIP,']_K^-2]C=SQF;M\"XB'\\-('\n:]'0L-?/P^1N,V-5L8R1>YXA( W@+ZJF0'.?-*CF5L0\"*IJP@7:/H$E>M</EE#67\nI\\WA[#6?4\\WY'Z^?LUAKSY:!/J&9:_6R2TZ,ED'RM@Y\\G-]Q!DWW1!.FW#).F\n: O#_9L'[76"...)
    at sprintf.c:34
        arg = {{gp_offset = 48, fp_offset = 48, overflow_arg_area = 0x7ffffffeddb0, reg_save_area = 0x7ffffffedcc0}}
        done = 894904908
#5  0x000000000045e189 in trickle_up_request_message (
    buf=0x7ffffffeddb0 "<scheduler_request>\n <authenticator>4a31871bf59efd4895a7ca5a65402602</authenticator>\n <hostid>1171553</hostid>\n <rpc_seqno>122</rpc_seqno>\n <core_client_major_version>6</core_client_major_"...,
    t=1320089068,
    result_name=0x7fffffffdee0 "\\C)1*3M4 $V1D60'GS'4%\">(>+)M'=\\5#!X/Z,?AWY:2#^65>%\\.7+8\n\\7Y(+D6F:N-U*6 !3[452\\NBC62?W8J+<2<L>K_&*=!+ I9&U '9EC&F6'%SNM)Z\n$IV+P=X@?]10&:\\+(D$%099=N 1,QK)XA)M'4>6+$?$X#IO>\"^<6N3/9^\\U[*Y-(\nUW92Z18_3H2XQC"...,
    msg=0x7ffff7fa9010 "<variety>year</variety>\n<wu>hadcm3n_ymgx_1900_40_007524528</wu>\n<result>hadcm3n_ymgx_1900_40_007524528_1</result>\n<ph>1</ph>\n<ts>25920</ts>\n<cp>46188</cp>\n<vr>6.07</vr>\n<ppname>\ntrickle_hadcm3n_ymgx_1"...,
    p=0x72dad0) at cs_trickle.cpp:195
No locals.
#6  send_replicated_trickles (p=0x72dad0,
    msg=0x7ffff7fa9010 "<variety>year</variety>\n<wu>hadcm3n_ymgx_1900_40_007524528</wu>\n<result>hadcm3n_ymgx_1900_40_007524528_1</result>\n<ph>1</ph>\n<ts>25920</ts>\n<cp>46188</cp>\n<vr>6.07</vr>\n<ppname>\ntrickle_hadcm3n_ymgx_1"...,
    result_name=0x7fffffffdee0 "\\C)1*3M4 $V1D60'GS'4%\">(>+)M'=\\5#!X/Z,?AWY:2#^65>%\\.7+8\n\\7Y(+D6F:N-U*6 !3[452\\NBC62?W8J+<2<L>K_&*=!+ I9&U '9EC&F6'%SNM)Z\n$IV+P=X@?]10&:\\+(D$%099=N 1,QK)XA)M'4>6+$?$X#IO>\"^<6N3/9^\\U[*Y-(\nUW92Z18_3H2XQC"...,
    now=1320089068) at cs_trickle.cpp:202
        buf = "<scheduler_request>\n <authenticator>4a31871bf59efd4895a7ca5a65402602</authenticator>\n <hostid>1171553</hostid>\n <rpc_seqno>122</rpc_seqno>\n <core_client_major_version>6</core_client_major_"...
#7  0x000000000000000d in ?? ()
No symbol table info available.
#8  0x0000000000000063 in ?? ()
No symbol table info available.
#9  0x00000000007240f8 in ?? ()
No symbol table info available.
(More stack frames follow...)

Sorry I'm not familiar with any of the boinc code, so I leave the interpretation
of this data to the boinc programmers. Please let me know if you need more.
I've dumped the core and am able to inspect additional stuff (or send the core
along with the binary containing all symbols to interested parties via private
email).

Hope this helps,
John

-- Package-specific info:
-- Contents of /etc/default/boinc-client:
# This file is /etc/default/boinc-client, it is a configuration file for the
# /etc/init.d/boinc-client init script.

# Set this to 1 to enable and to 0 to disable the init script.
ENABLED="1"

# Set this to 1 to enable advanced scheduling of the BOINC core client and
# all its sub-processes (reduces the impact of BOINC on the system's
# performance).
SCHEDULE="1"

# The BOINC core client will be started with the permissions of this user.
BOINC_USER="boinc"

# This is the data directory of the BOINC core client.
BOINC_DIR="/var/lib/boinc-client"

# This is the location of the BOINC core client, that the init script uses.
# If you do not want to use the client program provided by the boinc-client
# package, you can specify here an alternative client program.
#BOINC_CLIENT="/usr/local/bin/boinc"
BOINC_CLIENT="/usr/bin/boinc"

# Here you can specify additional options to pass to the BOINC core client.
# Type 'boinc --help' or 'man boinc' for a full summary of allowed options.
#BOINC_OPTS="--allow_remote_gui_rpc"
BOINC_OPTS=""

-- System Information:
Debian Release: wheezy/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 3.0.0-1 (SMP w/4 CPU cores; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages boinc-client depends on:
ii  adduser                3.113
ii  ca-certificates        20111025
ii  debconf [debconf-2.0]  1.5.41
ii  libc6                  2.13-21
ii  libcurl3               7.21.7-3
ii  libgcc1                1:4.6.2-3
ii  libssl1.0.0            1.0.0e-2
ii  libstdc++6             4.6.2-3
ii  libx11-6               2:1.4.4-2
ii  libxss1                1:1.2.1-2
ii  python                 2.7.2-9
ii  zlib1g                 1:1.2.3.4.dfsg-3

Versions of packages boinc-client recommends:
ii  ia32-libs  20111001

Versions of packages boinc-client suggests:
ii  boinc-app-seti     <none>
ii  boinc-manager      6.13.10+dfsg-1
ii  x11-xserver-utils  7.6+3

-- Configuration Files:
/etc/boinc-client/global_prefs_override.xml changed:
<global_preferences>
  <run_on_batteries>0</run_on_batteries>
  <run_if_user_active>1</run_if_user_active>
  <run_gpu_if_user_active>0</run_gpu_if_user_active>
  <idle_time_to_run>3.000000</idle_time_to_run>
  <suspend_cpu_usage>25.000000</suspend_cpu_usage>
  <start_hour>0.000000</start_hour>
  <end_hour>0.000000</end_hour>
  <net_start_hour>0.000000</net_start_hour>
  <net_end_hour>0.000000</net_end_hour>
  <leave_apps_in_memory>0</leave_apps_in_memory>
  <confirm_before_connecting>0</confirm_before_connecting>
  <hangup_if_dialed>0</hangup_if_dialed>
  <dont_verify_images>0</dont_verify_images>
  <work_buf_min_days>0.000000</work_buf_min_days>
  <work_buf_additional_days>0.250000</work_buf_additional_days>
  <max_ncpus_pct>50.000000</max_ncpus_pct>
  <cpu_scheduling_period_minutes>60.000000</cpu_scheduling_period_minutes>
  <disk_interval>60.000000</disk_interval>
  <disk_max_used_gb>100.000000</disk_max_used_gb>
  <disk_max_used_pct>50.000000</disk_max_used_pct>
  <disk_min_free_gb>0.000000</disk_min_free_gb>
  <vm_max_used_pct>75.000000</vm_max_used_pct>
  <ram_max_used_busy_pct>50.000000</ram_max_used_busy_pct>
  <ram_max_used_idle_pct>90.000000</ram_max_used_idle_pct>
  <max_bytes_sec_up>24995.840000</max_bytes_sec_up>
  <max_bytes_sec_down>249999.360000</max_bytes_sec_down>
  <cpu_usage_limit>100.000000</cpu_usage_limit>
  <daily_xfer_limit_mb>0.000000</daily_xfer_limit_mb>
  <daily_xfer_period_days>0</daily_xfer_period_days>
</global_preferences>

--
John Feuerstein <j...@feurix.com>
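
A rough cross-check of the numbers in the backtrace, as a sketch only. It takes the addresses and sizes straight from the gdb output and assumes that 0x7ffffffff000, the address gdb reports as out of bounds, is the first unmapped address above the stack. The variable and function names are made up for this illustration and do not come from the BOINC source.

```python
# Values copied from the gdb output in this report; names are illustrative only.
BUF_START = 0x7ffffffeddb0       # buf in send_replicated_trickles (frame #6)
WRITE_PTR = 0x7ffffffedfa0       # _IO_write_ptr in frame #3
PENDING_BYTES = 150842           # n in _IO_default_xsputn (frame #1)
FIRST_BAD_ADDR = 0x7ffffffff000  # ASSUMPTION: first unmapped address above the stack


def overflow_summary(buf_start, write_ptr, pending, first_bad_addr):
    """Return (bytes already written, bytes of room left, bytes past the boundary)."""
    written = write_ptr - buf_start
    room = first_bad_addr - write_ptr
    overrun = max(0, pending - room)
    return written, room, overrun


written, room, overrun = overflow_summary(
    BUF_START, WRITE_PTR, PENDING_BYTES, FIRST_BAD_ADDR
)
print(f"already written into buf: {written} bytes")
print(f"room before {FIRST_BAD_ADDR:#x}: {room} bytes")
print(f"pending copy overruns by: {overrun} bytes")
```

The 496 bytes already written match done = 496 in frame #2. If the assumption about the boundary holds, the pending 150842-byte copy cannot fit in the remaining 69728 bytes, which would be consistent with an unbounded sprintf() of a large trickle message into a buffer on the stack. The real size of buf is not visible in the backtrace, so this remains a guess for the boinc programmers to confirm.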