Still running into the same problem. I have attached the debug trace I got after applying the two patches.
The benchmark I am running also includes some custom tracepoints. I am not adding the events being traced in the files I have provided. Do you think this might be causing a problem when I have tracepoints from two different packages?

Shehab Y. Elsayed, MSc.
PhD Student
The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
University of Toronto
E-mail: [email protected]

On Wed, Mar 21, 2018 at 4:22 PM, Mathieu Desnoyers <[email protected]> wrote:

> ----- On Mar 21, 2018, at 1:55 PM, Shehab Elsayed <[email protected]> wrote:
>
> > I am so sorry for the late replies.
> >
> > I have tried the first patch you sent and the problem still happens
> > (although seemingly less frequently, especially with debugging enabled!).
> > I have attached the output I got from one of the erroneous runs.
> >
> > I will try the updated patch and let you know how it goes.
> >
> > Also, I am not sure if these points make any difference:
> > 1- The error actually happens at the end of the application. It
> >    finishes execution, but then something goes wrong.
> > 2- I run into this problem only for some of the benchmarks (or at least
> >    the problem happens much less frequently for the others, where I have
> >    not run into it yet).
> >
> > Thank you very much, and sorry again for the late replies.
>
> No worries! Looking through your log, I notice that pthread_setcancelstate
> has problems when called from application threads. We do not restore the
> original thread's cancel state. Can you try with the attached patch
> applied on top of the previous one?
>
> Thanks,
>
> Mathieu
On Wed, Mar 21, 2018 at 1:27 PM, Mathieu Desnoyers <[email protected]> wrote:

>> ----- On Mar 20, 2018, at 5:42 PM, Mathieu Desnoyers <[email protected]> wrote:
>>
>> ----- On Mar 20, 2018, at 4:58 PM, Mathieu Desnoyers <[email protected]> wrote:
>>
>> ----- On Mar 20, 2018, at 12:07 PM, Mathieu Desnoyers <[email protected]> wrote:
>>
>> ----- On Mar 19, 2018, at 4:21 PM, Shehab Elsayed <[email protected]> wrote:
>>
>> > I did "echo "-1" > /proc/sys/kernel/perf_event_paranoid" and made sure
>> > the value was actually written with
>> > "cat /proc/sys/kernel/perf_event_paranoid".
>> >
>> > It executed normally 2 times but then got the same error.
>>
>> Can you provide the output when reproducing the issue with the
>> LTTNG_UST_DEBUG=1 environment variable set when starting
>> your test program?
>>
>> I just noticed something that might explain what goes wrong here:
>>
>> lttng-context-perf-counters.c: add_thread_field() grabs the ust_lock().
>> The pthread mutex in your case is instrumented with the pthread wrapper.
>> add_thread_field() is invoked the first time the perf counter is hit by
>> each given thread. When this happens, the instrumented pthread mutex
>> will try to call into the instrumentation tracepoint again, which will
>> call add_thread_field() (again), and so on until we reach the
>> libringbuffer "lib_ring_buffer_nesting" threshold of 4 calls deep.
>>
>> I suspect this situation where we recursively call add_thread_field()
>> is unexpected, which may trigger your double free here.
>>
>> Will investigate more.
>>
>> Can you try with the attached patch applied?
>>
>> Here is an updated v2 of the patch which tests the notrace TLS counter
>> sooner (before evaluating the filter). Please let me know how it goes.
>>
>> Thanks,
>>
>> Mathieu
>>
>> On Mon, Mar 19, 2018 at 4:01 PM, Mathieu Desnoyers <[email protected]> wrote:
>>
>>> ----- On Mar 19, 2018, at 3:53 PM, Shehab Elsayed <[email protected]> wrote:
>>>
>>> > cat /proc/sys/kernel/perf_event_paranoid ---> returns 1
>>> >
>>> > I run the program as a normal user.
>>> >
>>> > The glibc version I get by running "ldd --version" is 2.17.
>>>
>>> Can you reproduce the issue after you do this as root?
>>>
>>> echo "-1" > /proc/sys/kernel/perf_event_paranoid
>>>
>>> Based on this documentation of the Linux kernel
>>> (Documentation/sysctl/kernel.txt):
>>>
>>> perf_event_paranoid:
>>>
>>> Controls use of the performance events system by unprivileged
>>> users (without CAP_SYS_ADMIN). The default value is 2.
>>>
>>>  -1: Allow use of (almost) all events by all users.
>>>      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK.
>>> >=0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN.
>>>      Disallow raw tracepoint access by users without CAP_SYS_ADMIN.
>>> >=1: Disallow CPU event access by users without CAP_SYS_ADMIN.
>>> >=2: Disallow kernel profiling by users without CAP_SYS_ADMIN.
>>>
>>> Thanks,
>>>
>>> Mathieu
>>> On Mon, Mar 19, 2018 at 3:31 PM, Mathieu Desnoyers <[email protected]> wrote:
>>>
>>>> ----- On Mar 19, 2018, at 3:26 PM, Mathieu Desnoyers <[email protected]> wrote:
>>>>
>>>> ----- On Mar 19, 2018, at 2:26 PM, Shehab Elsayed <[email protected]> wrote:
>>>>
>>>> > Yes, I tried with only one of those contexts and I still ran into the
>>>> > same problem.
>>>>
>>>> What is the setting returned by
>>>>
>>>> cat /proc/sys/kernel/perf_event_paranoid
>>>>
>>>> on your system? And do you run your test program as root or as a
>>>> normal user?
>>>>
>>>> Please CC the mailing list on your reply.
>>>>
>>>> I will also need to know what glibc version you have on your system.
>>>>
>>>> Thanks,
>>>>
>>>> Mathieu
>>>>
>>>> On Mon, Mar 19, 2018 at 2:24 PM, Mathieu Desnoyers <[email protected]> wrote:
>>>>
>>>>> ----- On Mar 19, 2018, at 12:36 PM, Shehab Elsayed <[email protected]> wrote:
>>>>>
>>>>> > Hi Mathieu,
>>>>> >
>>>>> > Thank you very much for your reply.
>>>>> >
>>>>> > I manually built lttng-ust from source (commit #:
>>>>> > 8a208943e21700211beee3ea64180a5a534c7d2a).
>>>>> >
>>>>> > This is how I set up the tracing session:
>>>>> > 1- lttng create lu_ncb_8_native -o {path}
>>>>> > 2- lttng enable-event --userspace lttng_ust_pthread:pthread_mutex_lock_req
>>>>> >    lttng enable-event --userspace lttng_ust_pthread:pthread_mutex_lock_acq
>>>>> >    lttng enable-event --userspace lttng_ust_pthread:pthread_mutex_lock_trylock
>>>>> >    lttng enable-event --userspace lttng_ust_pthread:pthread_mutex_lock_unlock
>>>>> > 3- lttng add-context -u -t procname
>>>>> >    lttng add-context -u -t vpid
>>>>> >    lttng add-context -u -t pthread_id
>>>>> >    lttng add-context -u -t vtid
>>>>> >    lttng add-context -u -t ip
>>>>> >    lttng add-context -u -t perf:thread:cpu-cycles
>>>>> >    lttng add-context -u -t perf:thread:cycles
>>>>> >    lttng add-context -u -t perf:thread:instructions
>>>>> > 4- lttng start
>>>>> > 5- LD_PRELOAD=/usr/local/lib/liblttng-ust-pthread-wrapper.so ./lu_ncb -p8 -n8096 -b32
>>>>> > 6- lttng stop
>>>>> > 7- lttng destroy
>>>>>
>>>>> Can you reproduce if you remove the following contexts?
>>>>>
>>>>> lttng add-context -u -t perf:thread:cpu-cycles
>>>>> lttng add-context -u -t perf:thread:cycles
>>>>> lttng add-context -u -t perf:thread:instructions
>>>>>
>>>>> And if you keep only a single one of those contexts?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Mathieu
>>>>> On Mon, Mar 19, 2018 at 12:21 PM, Mathieu Desnoyers <[email protected]> wrote:
>>>>>
>>>>>> ----- On Mar 16, 2018, at 5:37 PM, Shehab Elsayed <[email protected]> wrote:
>>>>>>
>>>>>> > Hello All,
>>>>>> >
>>>>>> > I am trying to instrument a pthread application using the provided
>>>>>> > pthread wrapper, but I sometimes run into a "double free or
>>>>>> > corruption (fasttop)" error.
>>>>>>
>>>>>> Please provide more information about the version of lttng-ust you
>>>>>> are using, and how you set up your tracing session.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Mathieu
>>>>>>
>>>>>> > Here is a description of what I have tried and noticed:
>>>>>> > 1- The problem isn't consistent. It sometimes happens and sometimes
>>>>>> >    works as expected.
>>>>>> > 2- From my experiments, the problem happens (more frequently, at
>>>>>> >    least) when adding performance counter contexts (I tried cycles,
>>>>>> >    cpu_cycles and instructions).
>>>>>> > 3- I am testing using lu_ncb from the splash3 benchmark suite after
>>>>>> >    setting LD_PRELOAD to use the pthread wrapper as described in the
>>>>>> >    LTTng documentation.
>>>>>> > 4- Here is the backtrace printed after exiting:
>>>>>> >    ======= Backtrace: =========
>>>>>> >    /lib64/libc.so.6([Thread 0x7ffff5611700 (LWP 97229) exited]
>>>>>> >    /usr/local/lib/liblttng-ust.so.0(lttng_destroy_context+0x35)[0x7ffff7471575]
>>>>>> >    /usr/local/lib/liblttng-ust.so.0(lttng_session_destroy+0x21c)[0x7ffff747363c]
>>>>>> >    /usr/local/lib/liblttng-ust.so.0(+0x1e906)[0x7ffff746d906]
>>>>>> >    /usr/local/lib/liblttng-ust.so.0(lttng_ust_objd_unref+0x9f)[0x7ffff746dccf]
>>>>>> >    /usr/local/lib/liblttng-ust.so.0(lttng_ust_objd_unref+0x9f)[0x7ffff746dccf]
>>>>>> >    /usr/local/lib/liblttng-ust.so.0(lttng_ust_objd_unref+0x9f)[0x7ffff746dccf]
>>>>>> >    /usr/local/lib/liblttng-ust.so.0(lttng_ust_abi_exit+0x68)[0x7ffff746ead8]
>>>>>> >    /usr/local/lib/liblttng-ust.so.0(+0x191d3)[0x7ffff74681d3]
>>>>>> >    /usr/local/lib/liblttng-ust.so.0(lttng_ust_exit+0x67)[0x7ffff745ed57]
>>>>>> >    /lib64/ld-linux-x86-64.so.2(+0xf85a)[0x7ffff7dec85a]
>>>>>> >    /lib64/libc.so.6(+0x38a49)[0x7ffff6ca6a49]
>>>>>> >    /lib64/libc.so.6(+0x38a95)[0x7ffff6ca6a95]
>>>>>> >    /aenao-99/elsayed9/LTTng/data/scripts/tmp/lu_ncb[0x401b51]
>>>>>> >    /lib64/libc.so.6(__libc_start_main+0xf5)[0x7ffff6c8fb35]
>>>>>> >    /aenao-99/elsayed9/LTTng/data/scripts/tmp/lu_ncb[0x401c44]
>>>>>> > 5- Also, this is a backtrace I obtained from gdb:
>>>>>> >    #0  0x00007ffff6eac1d7 in raise () from /lib64/libc.so.6
>>>>>> >    #1  0x00007ffff6ead8c8 in abort () from /lib64/libc.so.6
>>>>>> >    #2  0x00007ffff6eebf07 in __libc_message () from /lib64/libc.so.6
>>>>>> >    #3  0x00007ffff6ef3503 in _int_free () from /lib64/libc.so.6
>>>>>> >    #4  0x00007ffff768ad25 in lttng_destroy_perf_counter_field (field=<optimized out>) at lttng-context-perf-counters.c:418
>>>>>> >    #5  0x00007ffff767a575 in lttng_destroy_context (ctx=0x7ffff0011090) at lttng-context.c:278
>>>>>> >    #6  0x00007ffff767c63c in _lttng_channel_unmap (lttng_chan=0x7ffff0010f40) at lttng-events.c:172
>>>>>> >    #7  lttng_session_destroy (session=0x7ffff0000900)
>>>>>> >        at lttng-events.c:247
>>>>>> >    #8  0x00007ffff7676906 in lttng_release_session (objd=<optimized out>) at lttng-ust-abi.c:601
>>>>>> >    #9  0x00007ffff7676ccf in lttng_ust_objd_unref (id=1, is_owner=<optimized out>) at lttng-ust-abi.c:216
>>>>>> >    #10 0x00007ffff7676ccf in lttng_ust_objd_unref (id=2, is_owner=<optimized out>) at lttng-ust-abi.c:216
>>>>>> >    #11 0x00007ffff7676ccf in lttng_ust_objd_unref (id=id@entry=18, is_owner=is_owner@entry=1) at lttng-ust-abi.c:216
>>>>>> >    #12 0x00007ffff7677ad8 in objd_table_destroy () at lttng-ust-abi.c:235
>>>>>> >    #13 lttng_ust_abi_exit () at lttng-ust-abi.c:1002
>>>>>> >    #14 0x00007ffff76711d3 in lttng_ust_cleanup (exiting=1) at lttng-ust-comm.c:1807
>>>>>> >    #15 0x00007ffff7667d57 in lttng_ust_exit () at lttng-ust-comm.c:1874
>>>>>> >    #16 0x00007ffff7dec85a in _dl_fini () from /lib64/ld-linux-x86-64.so.2
>>>>>> >    #17 0x00007ffff6eafa49 in __run_exit_handlers () from /lib64/libc.so.6
>>>>>> >    #18 0x00007ffff6eafa95 in exit () from /lib64/libc.so.6
>>>>>> >    #19 0x0000000000401b51 in main (argc=<optimized out>, argv=<optimized out>) at lu.c:368
>>>>>> >
>>>>>> > Any ideas why this happens and how to fix it?
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Shehab
>>>>>> >
>>>>>> > _______________________________________________
>>>>>> > lttng-dev mailing list
>>>>>> > [email protected]
>>>>>> > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>>>>>>
>>>>>> --
>>>>>> Mathieu Desnoyers
>>>>>> EfficiOS Inc.
>>>>>> http://www.efficios.com
Attachment: 2patches_out.dbg (binary data)
_______________________________________________
lttng-dev mailing list
[email protected]
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
