----- On Mar 20, 2018, at 4:58 PM, Mathieu Desnoyers <mathieu.desnoy...@efficios.com> wrote:
> ----- On Mar 20, 2018, at 12:07 PM, Mathieu Desnoyers > <mathieu.desnoy...@efficios.com> wrote: >> ----- On Mar 19, 2018, at 4:21 PM, Shehab Elsayed <shehaby...@gmail.com> >> wrote: >>> I did "echo "-1" > /proc/sys/kernel/perf_event_paranoid " and made sure the >>> value was actually written by "cat /proc/sys/kernel/perf_event_paranoid" >>> It executed normally 2 times but then got the same error. >> Can you provide the output when reproducing the issue with the >> LTTNG_UST_DEBUG=1 environment variable set when starting >> your test program ? > I just noticed something that might explain what goes wrong here: > lttng-context-perf-counters.c: add_thread_field() grabs the ust_lock(). > Pthread > mutex > in your case is instrumented with the pthread wrapper. This "add_thread_field" > is invoked > the first time the perf counter is hit by each given thread. When this > happens, > the > instrumented pthread mutex will try to call into the instrumentation > tracepoint > again, > which will call add_thread_field() (again), and so on until we reach the > libringbuffer > "lib_ring_buffer_nesting" threshold of 4 calls deep. > I suspect this situation where we recursively call add_thread_field is > unexpected, > which may trigger your double-free here. > Will investigate more. Can you try with the attached patch applied ? Thanks, Mathieu > Thanks, > Mathieu >> Thanks, >> Mathieu >>> Shehab Y. Elsayed, MSc. >>> PhD Student >>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering >>> University of Toronto >>> E-mail: [ https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11# | >>> shehaby...@gmail.com ] >>> On Mon, Mar 19, 2018 at 4:01 PM, Mathieu Desnoyers < [ >>> mailto:mathieu.desnoy...@efficios.com | mathieu.desnoy...@efficios.com ] > >>> wrote: >>>> ----- On Mar 19, 2018, at 3:53 PM, Shehab Elsayed < [ >>>> mailto:shehaby...@gmail.com | shehaby...@gmail.com ] > wrote: >>>>> cat /proc/sys/kernel/perf_event_paranoid ---> returns 1 >>>>> I run the program as a normal user >>>>> The glibc version I get by running "ldd --version" is 2.17 >>>> Can you reproduce the issue after you do this as root ? >>>> echo "-1" > /proc/sys/kernel/perf_event_paranoid >>>> Based on this documentation of the Linux kernel: >>>> Documentation/sysctl/kernel.txt: >>>> perf_event_paranoid: >>>> Controls use of the performance events system by unprivileged >>>> users (without CAP_SYS_ADMIN). The default value is 2. >>>> -1: Allow use of (almost) all events by all users >>>> Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK >>>> >=0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN >>>> Disallow raw tracepoint access by users without CAP_SYS_ADMIN >>>> >=1: Disallow CPU event access by users without CAP_SYS_ADMIN >>>> >=2: Disallow kernel profiling by users without CAP_SYS_ADMIN >>>> Thanks, >>>> Mathieu >>>>> Shehab Y. Elsayed, MSc. >>>>> PhD Student >>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering >>>>> University of Toronto >>>>> E-mail: [ >>>>> https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11# | >>>>> shehaby...@gmail.com ] >>>>> On Mon, Mar 19, 2018 at 3:31 PM, Mathieu Desnoyers < [ >>>>> mailto:mathieu.desnoy...@efficios.com | mathieu.desnoy...@efficios.com ] > >>>>> wrote: >>>>>> ---- On Mar 19, 2018, at 3:26 PM, Mathieu Desnoyers < [ >>>>>> mailto:mathieu.desnoy...@efficios.com | mathieu.desnoy...@efficios.com ] >>>>>> > >>>>>> wrote: >>>>>>> ----- On Mar 19, 2018, at 2:26 PM, Shehab Elsayed < [ >>>>>>> mailto:shehaby...@gmail.com | shehaby...@gmail.com ] > wrote: >>>>>>>> Yes, I tried with only one of those contexts and I still ran into the >>>>>>>> same >>>>>>>> problem. >>>>>>> What is the setting returned by >>>>>>> cat /proc/sys/kernel/perf_event_paranoid >>>>>>> on your system ? And do you run your test program as root or normal >>>>>>> user ? >>>>>>> Please CC the mailing list on your reply. >>>>>> I will also need to know what glibc version you have on your system. >>>>>> Thanks, >>>>>> Mathieu >>>>>>> Thanks, >>>>>>> Mathieu >>>>>>>> Shehab Y. Elsayed, MSc. >>>>>>>> PhD Student >>>>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering >>>>>>>> University of Toronto >>>>>>>> E-mail: [ >>>>>>>> https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11# | >>>>>>>> shehaby...@gmail.com ] >>>>>>>> On Mon, Mar 19, 2018 at 2:24 PM, Mathieu Desnoyers < [ >>>>>>>> mailto:mathieu.desnoy...@efficios.com | mathieu.desnoy...@efficios.com >>>>>>>> ] > >>>>>>>> wrote: >>>>>>>>> ----- On Mar 19, 2018, at 12:36 PM, Shehab Elsayed < [ >>>>>>>>> mailto:shehaby...@gmail.com | shehaby...@gmail.com ] > wrote: >>>>>>>>>> Hi Mathieu, >>>>>>>>>> Thank you very much for your reply. >>>>>>>>>> I manually built lttng-ust from source (commit #: >>>>>>>>>> 8a208943e21700211beee3ea64180a5a534c7d2a). >>>>>>>>>> This is how I set up the tracing session: >>>>>>>>>> 1- lttng create lu_ncb_8_native -o {path} >>>>>>>>>> 2- lttng enable-event --userspace >>>>>>>>>> lttng_ust_pthread:pthread_mutex_lock_req >>>>>>>>>> lttng enable-event --userspace >>>>>>>>>> lttng_ust_pthread:pthread_mutex_lock_acq >>>>>>>>>> lttng enable-event --userspace >>>>>>>>>> lttng_ust_pthread:pthread_mutex_lock_trylock >>>>>>>>>> lttng enable-event --userspace >>>>>>>>>> lttng_ust_pthread:pthread_mutex_lock_unlock >>>>>>>>>> 3- lttng add-context -u -t procname >>>>>>>>>> lttng add-context -u -t vpid >>>>>>>>>> lttng add-context -u -t pthread_id >>>>>>>>>> lttng add-context -u -t vtid >>>>>>>>>> lttng add-context -u -t ip >>>>>>>>>> lttng add-context -u -t perf:thread:cpu-cycles >>>>>>>>>> lttng add-context -u -t perf:thread:cycles >>>>>>>>>> lttng add-context -u -t perf:thread:instructions >>>>>>>>>> 4- lttng start >>>>>>>>>> 5- LD_PRELOAD=/usr/local/lib/liblttng-ust-pthread-wrapper.so >>>>>>>>>> ./lu_ncb -p8 -n8096 >>>>>>>>>> -b32 >>>>>>>>>> 6- lttng stop >>>>>>>>>> 7- lttng destroy >>>>>>>>> Can you reproduce if you remove the following contexts ? >>>>>>>>> lttng add-context -u -t perf:thread:cpu-cycles >>>>>>>>> lttng add-context -u -t perf:thread:cycles >>>>>>>>> lttng add-context -u -t perf:thread:instructions >>>>>>>>> And if you only keep a single of those contexts ? >>>>>>>>> Thanks, >>>>>>>>> Mathieu >>>>>>>>>> Shehab Y. Elsayed, MSc. >>>>>>>>>> PhD Student >>>>>>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer >>>>>>>>>> Engineering >>>>>>>>>> University of Toronto >>>>>>>>>> E-mail: [ >>>>>>>>>> https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11# | >>>>>>>>>> shehaby...@gmail.com ] >>>>>>>>>> On Mon, Mar 19, 2018 at 12:21 PM, Mathieu Desnoyers < [ >>>>>>>>>> mailto:mathieu.desnoy...@efficios.com | >>>>>>>>>> mathieu.desnoy...@efficios.com ] > >>>>>>>>>> wrote: >>>>>>>>>>> ----- On Mar 16, 2018, at 5:37 PM, Shehab Elsayed < [ >>>>>>>>>>> mailto:shehaby...@gmail.com | shehaby...@gmail.com ] > wrote: >>>>>>>>>>>> Hello All, >>>>>>>>>>>> I am trying to instrument a pthread application using the provided >>>>>>>>>>>> pthread >>>>>>>>>>>> wrapper, but I sometimes run into a "Double free or corruption >>>>>>>>>>>> error ( fasttop >>>>>>>>>>>> )" error. >>>>>>>>>>> Please provide more information about the version of lttng-ust you >>>>>>>>>>> are using, >>>>>>>>>>> and how you setup >>>>>>>>>>> your tracing session. >>>>>>>>>>> Thanks, >>>>>>>>>>> Mathieu >>>>>>>>>>>> Here is a description of what I have tried and noticed: >>>>>>>>>>>> 1- The problem isn't consistent. It sometimes happen and sometimes >>>>>>>>>>>> works as >>>>>>>>>>>> expected. >>>>>>>>>>>> 2- From my experiments, the problem happens (more frequently at >>>>>>>>>>>> least) when >>>>>>>>>>>> adding performance counter contexts (I tried cycles, cpu _cycles >>>>>>>>>>>> and >>>>>>>>>>>> instructions). >>>>>>>>>>>> 3- I am testing using lu _ ncb from splash3 benchmark suite after >>>>>>>>>>>> setting LD _ >>>>>>>>>>>> PRELOAD to use the pthread wrapper as described in the LTTng >>>>>>>>>>>> documents. >>>>>>>>>>>> 4- Here is the backtrace printed after exiting: >>>>>>>>>>>> ======= Backtrace : ========= >>>>>>>>>>>> /lib64/ libc .so.6([Thread 0x7ffff5611700 ( LWP 97229) exited] >>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng >>>>>>>>>>>> _destroy_context+0x35)[0x7ffff7471575] >>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng >>>>>>>>>>>> _session_destroy+0x21c)[0x7ffff747363c] >>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0(+0x1e906)[0x7ffff746d906] >>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng _ ust _ objd _ unref >>>>>>>>>>>> +0x9f)[0x7ffff746dccf] >>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng _ ust _ objd _ unref >>>>>>>>>>>> +0x9f)[0x7ffff746dccf] >>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng _ ust _ objd _ unref >>>>>>>>>>>> +0x9f)[0x7ffff746dccf] >>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng _ ust _ abi >>>>>>>>>>>> _exit+0x68)[0x7ffff746ead8] >>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0(+0x191d3)[0x7ffff74681d3] >>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng _ ust >>>>>>>>>>>> _exit+0x67)[0x7ffff745ed57] >>>>>>>>>>>> /lib64/ ld - linux -x86-64.so.2(+0xf85a)[0x7ffff7dec85a] >>>>>>>>>>>> /lib64/ libc .so.6(+0x38a49)[0x7ffff6ca6a49] >>>>>>>>>>>> /lib64/ libc .so.6(+0x38a95)[0x7ffff6ca6a95] >>>>>>>>>>>> / aenao -99/elsayed9/ LTTng /data/scripts/ tmp / lu _ ncb >>>>>>>>>>>> [0x401b51] >>>>>>>>>>>> /lib64/ libc .so.6(__ libc _start_main+0xf5)[0x7ffff6c8fb35] >>>>>>>>>>>> / aenao -99/elsayed9/ LTTng /data/scripts/ tmp / lu _ ncb >>>>>>>>>>>> [0x401c44] >>>>>>>>>>>> 5- Also, this is a backtrace I obtained from gdb : >>>>>>>>>>>> #0 0x00007ffff6eac1d7 in raise () from /lib64/ libc .so.6 >>>>>>>>>>>> #1 0x00007ffff6ead8c8 in abort () from /lib64/ libc .so.6 >>>>>>>>>>>> #2 0x00007ffff6eebf07 in __ libc _message () from /lib64/ libc >>>>>>>>>>>> .so.6 >>>>>>>>>>>> #3 0x00007ffff6ef3503 in _int_free () from /lib64/ libc .so.6 >>>>>>>>>>>> #4 0x00007ffff768ad25 in lttng _destroy_ perf _counter_field ( >>>>>>>>>>>> field=<optimized out>) at lttng -context- perf -counters.c:418 >>>>>>>>>>>> #5 0x00007ffff767a575 in lttng _destroy_context ( >>>>>>>>>>>> ctx =0x7ffff0011090) at lttng -context.c:278 >>>>>>>>>>>> #6 0x00007ffff767c63c in _ lttng _channel_ unmap ( >>>>>>>>>>>> lttng _ chan =0x7ffff0010f40) at lttng -events.c:172 >>>>>>>>>>>> #7 lttng _session_destroy (session=0x7ffff0000900) >>>>>>>>>>>> at lttng -events.c:247 >>>>>>>>>>>> #8 0x00007ffff7676906 in lttng _release_session ( >>>>>>>>>>>> objd =<optimized out>) at lttng - ust - abi .c:601 >>>>>>>>>>>> #9 0x00007ffff7676ccf in lttng _ ust _ objd _ unref (id=1, >>>>>>>>>>>> is_owner=<optimized out>) at lttng - ust - abi .c:216 >>>>>>>>>>>> #10 0x00007ffff7676ccf in lttng _ ust _ objd _ unref (id=2, >>>>>>>>>>>> is_owner=<optimized out>) at lttng - ust - abi .c:216 >>>>>>>>>>>> #11 0x00007ffff7676ccf in lttng _ ust _ objd _ unref >>>>>>>>>>>> (id=id@entry=18, >>>>>>>>>>>> is_owner=is_owner@entry=1) at lttng - ust - abi .c:216 >>>>>>>>>>>> #12 0x00007ffff7677ad8 in objd _table_destroy () >>>>>>>>>>>> at lttng - ust - abi .c:235 >>>>>>>>>>>> #13 lttng _ ust _ abi _exit () at lttng - ust - abi .c:1002 >>>>>>>>>>>> #14 0x00007ffff76711d3 in lttng _ ust _cleanup (exiting=1) >>>>>>>>>>>> at lttng - ust -comm.c:1807 >>>>>>>>>>>> #15 0x00007ffff7667d57 in lttng _ ust _exit () >>>>>>>>>>>> at lttng - ust -comm.c:1874 >>>>>>>>>>>> #16 0x00007ffff7dec85a in _ dl _ fini () >>>>>>>>>>>> from /lib64/ ld - linux -x86-64.so.2 >>>>>>>>>>>> #17 0x00007ffff6eafa49 in __run_exit_handlers () >>>>>>>>>>>> from /lib64/ libc .so.6 >>>>>>>>>>>> #18 0x00007ffff6eafa95 in exit () from /lib64/ libc .so.6 >>>>>>>>>>>> #19 0x0000000000401b51 in main ( argc =<optimized out>, >>>>>>>>>>>> argv =<optimized out>) at lu .c:368 >>>>>>>>>>>> Any ideas, why this happens and how to fix it? >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Shehab >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> lttng-dev mailing list >>>>>>>>>>>> [ mailto:lttng-dev@lists.lttng.org | lttng-dev@lists.lttng.org ] >>>>>>>>>>>> [ https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev | >>>>>>>>>>>> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ] >>>>>>>>>>> -- >>>>>>>>>>> Mathieu Desnoyers >>>>>>>>>>> EfficiOS Inc. >>>>>>>>>>> [ http://www.efficios.com/ | http://www.efficios.com ] >>>>>>>>> -- >>>>>>>>> Mathieu Desnoyers >>>>>>>>> EfficiOS Inc. >>>>>>>>> [ http://www.efficios.com/ | http://www.efficios.com ] >>>>>>> -- >>>>>>> Mathieu Desnoyers >>>>>>> EfficiOS Inc. >>>>>>> [ http://www.efficios.com/ | http://www.efficios.com ] >>>>>> -- >>>>>> Mathieu Desnoyers >>>>>> EfficiOS Inc. >>>>>> [ http://www.efficios.com/ | http://www.efficios.com ] >>>> -- >>>> Mathieu Desnoyers >>>> EfficiOS Inc. >>>> [ http://www.efficios.com/ | http://www.efficios.com ] >> -- >> Mathieu Desnoyers >> EfficiOS Inc. >> http://www.efficios.com > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
From 978e682321963f5b142dcb45e3d1dfe6b263c6ea Mon Sep 17 00:00:00 2001 From: Mathieu Desnoyers <mathieu.desnoy...@efficios.com> Date: Tue, 20 Mar 2018 17:32:36 -0400 Subject: [RFC PATCH] Fix: perf event mutex with pthread wrapper We do not want to recurse in the pthread mutex instrumentation when setting up the perf counters for a given thread. Introduce a "notrace" per-thread counter to inhibit tracing for the current thread. Signed-off-by: Mathieu Desnoyers <mathieu.desnoy...@efficios.com> --- include/lttng/ust-events.h | 3 +++ liblttng-ust/lttng-context-perf-counters.c | 8 ++++++-- liblttng-ust/lttng-ring-buffer-client.h | 3 +++ liblttng-ust/lttng-tracer-core.h | 3 +++ liblttng-ust/lttng-ust-comm.c | 23 +++++++++++++++++++++++ 5 files changed, 38 insertions(+), 2 deletions(-) diff --git a/include/lttng/ust-events.h b/include/lttng/ust-events.h index 86733503..b33a804b 100644 --- a/include/lttng/ust-events.h +++ b/include/lttng/ust-events.h @@ -738,6 +738,9 @@ struct lttng_enum *lttng_ust_enum_get_from_desc(struct lttng_session *session, void lttng_ust_dl_update(void *ip); void lttng_ust_fixup_fd_tracker_tls(void); +void lttng_ust_begin_notrace(void); +void lttng_ust_end_notrace(void); + /* For backward compatibility. Leave those exported symbols in place. */ extern struct lttng_ctx *lttng_static_ctx; void lttng_context_init(void); diff --git a/liblttng-ust/lttng-context-perf-counters.c b/liblttng-ust/lttng-context-perf-counters.c index a15417cc..0ce89245 100644 --- a/liblttng-ust/lttng-context-perf-counters.c +++ b/liblttng-ust/lttng-context-perf-counters.c @@ -328,16 +328,20 @@ struct lttng_perf_counter_thread_field * struct lttng_perf_counter_thread *perf_thread; struct lttng_perf_counter_thread_field *thread_field; + lttng_ust_begin_notrace(); perf_thread = pthread_getspecific(perf_counter_key); if (!perf_thread) perf_thread = alloc_perf_counter_thread(); cds_list_for_each_entry_rcu(thread_field, &perf_thread->rcu_field_list, rcu_field_node) { if (thread_field->field == field) - return thread_field; + goto end; } /* perf_counter_thread_field not found, need to add one */ - return add_thread_field(field, perf_thread); + thread_field = add_thread_field(field, perf_thread); +end: + lttng_ust_end_notrace(); + return thread_field; } static diff --git a/liblttng-ust/lttng-ring-buffer-client.h b/liblttng-ust/lttng-ring-buffer-client.h index 0fae8878..0960d4cb 100644 --- a/liblttng-ust/lttng-ring-buffer-client.h +++ b/liblttng-ust/lttng-ring-buffer-client.h @@ -724,6 +724,9 @@ int lttng_event_reserve(struct lttng_ust_lib_ring_buffer_ctx *ctx, struct lttng_client_ctx client_ctx; int ret, cpu; + if (caa_unlikely(URCU_TLS(lttng_ust_notrace_thread)) > 0) + return -EPERM; + /* Compute internal size of context structures. */ if (lttng_ctx) { diff --git a/liblttng-ust/lttng-tracer-core.h b/liblttng-ust/lttng-tracer-core.h index ba232f32..98d79be8 100644 --- a/liblttng-ust/lttng-tracer-core.h +++ b/liblttng-ust/lttng-tracer-core.h @@ -25,6 +25,7 @@ #include <stddef.h> #include <urcu/arch.h> #include <urcu/list.h> +#include <urcu/tls-compat.h> #include <lttng/ust-tracer.h> #include <lttng/bug.h> #include <lttng/ringbuffer-config.h> @@ -37,6 +38,8 @@ struct lttng_ctx_field; struct lttng_ust_lib_ring_buffer_ctx; struct lttng_ctx_value; +extern DECLARE_URCU_TLS(int, lttng_ust_notrace_thread); + int ust_lock(void) __attribute__ ((warn_unused_result)); void ust_lock_nocheck(void); void ust_unlock(void); diff --git a/liblttng-ust/lttng-ust-comm.c b/liblttng-ust/lttng-ust-comm.c index d4add1c0..9520f246 100644 --- a/liblttng-ust/lttng-ust-comm.c +++ b/liblttng-ust/lttng-ust-comm.c @@ -92,6 +92,9 @@ static pthread_mutex_t ust_mutex = PTHREAD_MUTEX_INITIALIZER; /* Allow nesting the ust_mutex within the same thread. */ static DEFINE_URCU_TLS(int, ust_mutex_nest); +/* Do not trace events for the current thread. */ +DEFINE_URCU_TLS(int, lttng_ust_notrace_thread) __attribute__((visibility("hidden"))); + /* * ust_exit_mutex protects thread_active variable wrt thread exit. It * cannot be done by ust_mutex because pthread_cancel(), which takes an @@ -121,6 +124,19 @@ static int lttng_ust_comm_should_quit; int lttng_ust_loaded __attribute__((weak)); /* + * Inhibit lttng-ust tracing for this thread. + */ +void lttng_ust_begin_notrace(void) +{ + URCU_TLS(lttng_ust_notrace_thread)++; +} + +void lttng_ust_end_notrace(void) +{ + --URCU_TLS(lttng_ust_notrace_thread); +} + +/* * Return 0 on success, -1 if should quit. * The lock is taken in both cases. * Signal-safe. @@ -392,6 +408,12 @@ void lttng_fixup_ust_mutex_nest_tls(void) asm volatile ("" : : "m" (URCU_TLS(ust_mutex_nest))); } +static +void lttng_fixup_lttng_ust_notrace_tls(void) +{ + asm volatile ("" : : "m" (URCU_TLS(lttng_ust_notrace_thread))); +} + /* * Fixup urcu bp TLS. */ @@ -410,6 +432,7 @@ void lttng_ust_fixup_tls(void) lttng_fixup_nest_count_tls(); lttng_fixup_procname_tls(); lttng_fixup_ust_mutex_nest_tls(); + lttng_fixup_lttng_ust_notrace_tls(); lttng_ust_fixup_fd_tracker_tls(); } -- 2.11.0
_______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev