Actually, i am not sure if this would help. I have been trying to reproduce the problem on a different machine, but I can't. Even without the patches !!!!!
Shehab Y. Elsayed, MSc. PhD Student The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering University of Toronto E-mail: shehaby...@gmail.com <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#> On Wed, Mar 21, 2018 at 8:13 PM, Mathieu Desnoyers < mathieu.desnoy...@efficios.com> wrote: > ----- On Mar 21, 2018, at 8:01 PM, Shehab Elsayed <shehaby...@gmail.com> > wrote: > > Just to clarify more what I meant by custom events; I have new tracepoints > added in the source code of the benchmark. However, I don't enable the > corresponding events when I do the actual tracing. > > This is the sequence followed in building the benchmark: > gcc-7.2 -c -O2 -pthread -D_XOPEN_SOURCE=500 -D_POSIX_C_SOURCE=200112 > -std=c11 -g -fno-strict-aliasing -DLTTNG_INST lu.c > gcc-7.2 -O2 -pthread -D_XOPEN_SOURCE=500 -D_POSIX_C_SOURCE=200112 -std=c11 > -g -fno-strict-aliasing -DLTTNG_INST -o LU_NCB lu.o > ../../instrumentation/lttng_tp/tp.o -lm -llttng-ust -ldl > > LTTNG_INST is just a preprocessor flag I have and tp.o is my custom > tracepoints > > Could you share a repository with your custom instrumentation on github, > so I could > try it out ? > > My current problem is that I cannot reproduce your issue on my end. > > Thanks, > > Mathieu > > > > > Shehab Y. Elsayed, MSc. > PhD Student > The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering > University of Toronto > E-mail: shehaby...@gmail.com > <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#> > > On Wed, Mar 21, 2018 at 7:55 PM, Shehab Elsayed <shehaby...@gmail.com> > wrote: > >> Still running into same problem. I attached the debug trace I got after >> applying the 2 patches. >> >> The benchmark I am running also includes some custom created tracepoints. >> I am not adding the events being traced in the files I have provided. Do >> you think this might be causing a problem when I have tracpoints from 2 >> different packages? >> >> Shehab Y. Elsayed, MSc. >> PhD Student >> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering >> University of Toronto >> E-mail: shehaby...@gmail.com >> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#> >> >> On Wed, Mar 21, 2018 at 4:22 PM, Mathieu Desnoyers < >> mathieu.desnoy...@efficios.com> wrote: >> >>> ----- On Mar 21, 2018, at 1:55 PM, Shehab Elsayed <shehaby...@gmail.com> >>> wrote: >>> >>> I am so sorry for the late replies. >>> >>> I have tried the first patch you sent and the problem still happens >>> (although seemingly less frequently especially with debugging enabled!!!!). >>> I have attached the output I got from one of the erroneous runs. >>> >>> I will try the updated patch and let you know how it goes. >>> >>> Also, I am not sure if these points make any difference: >>> 1- The error actually happens at the end of the application. It actually >>> finishes execution, but then something goes wrong. >>> 2- I run into this problem only for some of the benchmarks (or at least >>> the problems happens much less frequently for others that I didn't run into >>> it, not yet) >>> >>> Thanks you very much, and sorry again for the late replies. >>> >>> >>> No worries! Looking through your log, I notice that pthread set cancel >>> state has problems when >>> called from application threads. We do not restore the original thread's >>> cancel state. Can you try >>> with the attached patch applied on top of the previous one ? >>> >>> Thanks, >>> >>> Mathieu >>> >>> >>> >>> >>> Shehab Y. Elsayed, MSc. >>> PhD Student >>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering >>> University of Toronto >>> E-mail: shehaby...@gmail.com >>> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#> >>> >>> On Wed, Mar 21, 2018 at 1:27 PM, Mathieu Desnoyers < >>> mathieu.desnoy...@efficios.com> wrote: >>> >>>> ----- On Mar 20, 2018, at 5:42 PM, Mathieu Desnoyers < >>>> mathieu.desnoy...@efficios.com> wrote: >>>> >>>> ----- On Mar 20, 2018, at 4:58 PM, Mathieu Desnoyers < >>>> mathieu.desnoy...@efficios.com> wrote: >>>> >>>> ----- On Mar 20, 2018, at 12:07 PM, Mathieu Desnoyers < >>>> mathieu.desnoy...@efficios.com> wrote: >>>> >>>> >>>> ----- On Mar 19, 2018, at 4:21 PM, Shehab Elsayed <shehaby...@gmail.com> >>>> wrote: >>>> >>>> I did "echo "-1" > /proc/sys/kernel/perf_event_paranoid" and made sure >>>> the value was actually written by "cat /proc/sys/kernel/perf_event_ >>>> paranoid" >>>> >>>> It executed normally 2 times but then got the same error. >>>> >>>> >>>> Can you provide the output when reproducing the issue with the >>>> LTTNG_UST_DEBUG=1 environment variable set when starting >>>> your test program ? >>>> >>>> I just noticed something that might explain what goes wrong here: >>>> >>>> lttng-context-perf-counters.c: add_thread_field() grabs the ust_lock(). >>>> Pthread mutex >>>> in your case is instrumented with the pthread wrapper. This >>>> "add_thread_field" is invoked >>>> the first time the perf counter is hit by each given thread. When this >>>> happens, the >>>> instrumented pthread mutex will try to call into the instrumentation >>>> tracepoint again, >>>> which will call add_thread_field() (again), and so on until we reach >>>> the libringbuffer >>>> "lib_ring_buffer_nesting" threshold of 4 calls deep. >>>> >>>> I suspect this situation where we recursively call add_thread_field is >>>> unexpected, >>>> which may trigger your double-free here. >>>> >>>> Will investigate more. >>>> >>>> Can you try with the attached patch applied ? >>>> >>>> Here is an updated v2 of the patch which tests the notrace tls counter >>>> sooner (before evaluating >>>> filter). Please let me know how it goes. >>>> >>>> Thanks, >>>> >>>> Mathieu >>>> >>>> >>>> >>>> >>>> >>>> Thanks, >>>> >>>> Mathieu >>>> >>>> >>>> >>>> >>>> >>>> Thanks, >>>> >>>> Mathieu >>>> >>>> >>>> >>>> >>>> Thanks, >>>> >>>> Mathieu >>>> >>>> >>>> >>>> >>>> Shehab Y. Elsayed, MSc. >>>> PhD Student >>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering >>>> University of Toronto >>>> E-mail: shehaby...@gmail.com >>>> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#> >>>> >>>> On Mon, Mar 19, 2018 at 4:01 PM, Mathieu Desnoyers < >>>> mathieu.desnoy...@efficios.com> wrote: >>>> >>>>> >>>>> >>>>> ----- On Mar 19, 2018, at 3:53 PM, Shehab Elsayed < >>>>> shehaby...@gmail.com> wrote: >>>>> >>>>> cat /proc/sys/kernel/perf_event_paranoid ---> returns 1 >>>>> >>>>> I run the program as a normal user >>>>> >>>>> The glibc version I get by running "ldd --version" is 2.17 >>>>> >>>>> >>>>> Can you reproduce the issue after you do this as root ? >>>>> >>>>> echo "-1" > /proc/sys/kernel/perf_event_paranoid >>>>> >>>>> Based on this documentation of the Linux kernel: >>>>> >>>>> Documentation/sysctl/kernel.txt: >>>>> >>>>> perf_event_paranoid: >>>>> >>>>> Controls use of the performance events system by unprivileged >>>>> users (without CAP_SYS_ADMIN). The default value is 2. >>>>> >>>>> -1: Allow use of (almost) all events by all users >>>>> Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK >>>>> >=0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN >>>>> Disallow raw tracepoint access by users without CAP_SYS_ADMIN >>>>> >=1: Disallow CPU event access by users without CAP_SYS_ADMIN >>>>> >=2: Disallow kernel profiling by users without CAP_SYS_ADMIN >>>>> >>>>> Thanks, >>>>> >>>>> Mathieu >>>>> >>>>> >>>>> >>>>> Shehab Y. Elsayed, MSc. >>>>> PhD Student >>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering >>>>> University of Toronto >>>>> E-mail: shehaby...@gmail.com >>>>> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#> >>>>> >>>>> On Mon, Mar 19, 2018 at 3:31 PM, Mathieu Desnoyers < >>>>> mathieu.desnoy...@efficios.com> wrote: >>>>> >>>>>> ---- On Mar 19, 2018, at 3:26 PM, Mathieu Desnoyers < >>>>>> mathieu.desnoy...@efficios.com> wrote: >>>>>> >>>>>> ----- On Mar 19, 2018, at 2:26 PM, Shehab Elsayed < >>>>>> shehaby...@gmail.com> wrote: >>>>>> >>>>>> Yes, I tried with only one of those contexts and I still ran into the >>>>>> same problem. >>>>>> >>>>>> What is the setting returned by >>>>>> >>>>>> cat /proc/sys/kernel/perf_event_paranoid >>>>>> >>>>>> on your system ? And do you run your test program as root or normal >>>>>> user ? >>>>>> >>>>>> Please CC the mailing list on your reply. >>>>>> >>>>>> >>>>>> I will also need to know what glibc version you have on your system. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Mathieu >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Mathieu >>>>>> >>>>>> >>>>>> >>>>>> Shehab Y. Elsayed, MSc. >>>>>> PhD Student >>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering >>>>>> University of Toronto >>>>>> E-mail: shehaby...@gmail.com >>>>>> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#> >>>>>> >>>>>> On Mon, Mar 19, 2018 at 2:24 PM, Mathieu Desnoyers < >>>>>> mathieu.desnoy...@efficios.com> wrote: >>>>>> >>>>>>> ----- On Mar 19, 2018, at 12:36 PM, Shehab Elsayed < >>>>>>> shehaby...@gmail.com> wrote: >>>>>>> >>>>>>> Hi Mathieu, >>>>>>> >>>>>>> Thank you very much for your reply. >>>>>>> >>>>>>> I manually built lttng-ust from source (commit #: >>>>>>> 8a208943e21700211beee3ea64180a5a534c7d2a). >>>>>>> >>>>>>> This is how I set up the tracing session: >>>>>>> 1- lttng create lu_ncb_8_native -o {path} >>>>>>> 2- lttng enable-event --userspace lttng_ust_pthread:pthread_ >>>>>>> mutex_lock_req >>>>>>> lttng enable-event --userspace lttng_ust_pthread:pthread_ >>>>>>> mutex_lock_acq >>>>>>> lttng enable-event --userspace lttng_ust_pthread:pthread_ >>>>>>> mutex_lock_trylock >>>>>>> lttng enable-event --userspace lttng_ust_pthread:pthread_ >>>>>>> mutex_lock_unlock >>>>>>> 3- lttng add-context -u -t procname >>>>>>> lttng add-context -u -t vpid >>>>>>> lttng add-context -u -t pthread_id >>>>>>> lttng add-context -u -t vtid >>>>>>> lttng add-context -u -t ip >>>>>>> lttng add-context -u -t perf:thread:cpu-cycles >>>>>>> lttng add-context -u -t perf:thread:cycles >>>>>>> lttng add-context -u -t perf:thread:instructions >>>>>>> 4- lttng start >>>>>>> 5- LD_PRELOAD=/usr/local/lib/liblttng-ust-pthread-wrapper.so >>>>>>> ./lu_ncb -p8 -n8096 -b32 >>>>>>> 6- lttng stop >>>>>>> 7- lttng destroy >>>>>>> >>>>>>> >>>>>>> Can you reproduce if you remove the following contexts ? >>>>>>> >>>>>>> lttng add-context -u -t perf:thread:cpu-cycles >>>>>>> lttng add-context -u -t perf:thread:cycles >>>>>>> lttng add-context -u -t perf:thread:instructions >>>>>>> >>>>>>> And if you only keep a single of those contexts ? >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Mathieu >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Shehab Y. Elsayed, MSc. >>>>>>> PhD Student >>>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer >>>>>>> Engineering >>>>>>> University of Toronto >>>>>>> E-mail: shehaby...@gmail.com >>>>>>> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#> >>>>>>> >>>>>>> On Mon, Mar 19, 2018 at 12:21 PM, Mathieu Desnoyers < >>>>>>> mathieu.desnoy...@efficios.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> ----- On Mar 16, 2018, at 5:37 PM, Shehab Elsayed < >>>>>>>> shehaby...@gmail.com> wrote: >>>>>>>> >>>>>>>> Hello All, >>>>>>>> >>>>>>>> I am trying to instrument a pthread application using the provided >>>>>>>> pthread wrapper, but I sometimes run into a "Double free or >>>>>>>> corruption error (fasttop)" error. >>>>>>>> >>>>>>>> >>>>>>>> Please provide more information about the version of lttng-ust you >>>>>>>> are using, and how you setup >>>>>>>> your tracing session. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Mathieu >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Here is a description of what I have tried and noticed: >>>>>>>> 1- The problem isn't consistent. It sometimes happen and sometimes >>>>>>>> works as expected. >>>>>>>> 2- From my experiments, the problem happens (more frequently at >>>>>>>> least) when adding performance counter contexts (I tried cycles, >>>>>>>> cpu_cycles and instructions). >>>>>>>> 3- I am testing using lu_ncb from splash3 benchmark suite after >>>>>>>> setting LD_PRELOAD to use the pthread wrapper as described in the >>>>>>>> LTTng documents. >>>>>>>> 4- Here is the backtrace printed after exiting: >>>>>>>> ======= Backtrace: ========= >>>>>>>> /lib64/libc.so.6([Thread 0x7ffff5611700 (LWP 97229) exited] >>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_destroy_context+ >>>>>>>> 0x35)[0x7ffff7471575] >>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_session_destroy+ >>>>>>>> 0x21c)[0x7ffff747363c] >>>>>>>> /usr/local/lib/liblttng-ust.so.0(+0x1e906)[0x7ffff746d906] >>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_objd_unref+ >>>>>>>> 0x9f)[0x7ffff746dccf] >>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_objd_unref+ >>>>>>>> 0x9f)[0x7ffff746dccf] >>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_objd_unref+ >>>>>>>> 0x9f)[0x7ffff746dccf] >>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_abi_exit+0x68)[ >>>>>>>> 0x7ffff746ead8] >>>>>>>> /usr/local/lib/liblttng-ust.so.0(+0x191d3)[0x7ffff74681d3] >>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_exit+0x67)[ >>>>>>>> 0x7ffff745ed57] >>>>>>>> /lib64/ld-linux-x86-64.so.2(+0xf85a)[0x7ffff7dec85a] >>>>>>>> /lib64/libc.so.6(+0x38a49)[0x7ffff6ca6a49] >>>>>>>> /lib64/libc.so.6(+0x38a95)[0x7ffff6ca6a95] >>>>>>>> /aenao-99/elsayed9/LTTng/data/scripts/tmp/lu_ncb[0x401b51] >>>>>>>> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7ffff6c8fb35] >>>>>>>> /aenao-99/elsayed9/LTTng/data/scripts/tmp/lu_ncb[0x401c44] >>>>>>>> 5- Also, this is a backtrace I obtained from gdb: >>>>>>>> #0 0x00007ffff6eac1d7 in raise () from /lib64/libc.so.6 >>>>>>>> #1 0x00007ffff6ead8c8 in abort () from /lib64/libc.so.6 >>>>>>>> #2 0x00007ffff6eebf07 in __libc_message () from /lib64/libc.so.6 >>>>>>>> #3 0x00007ffff6ef3503 in _int_free () from /lib64/libc.so.6 >>>>>>>> #4 0x00007ffff768ad25 in lttng_destroy_perf_counter_field ( >>>>>>>> field=<optimized out>) at lttng-context-perf-counters.c:418 >>>>>>>> #5 0x00007ffff767a575 in lttng_destroy_context ( >>>>>>>> ctx=0x7ffff0011090) at lttng-context.c:278 >>>>>>>> #6 0x00007ffff767c63c in _lttng_channel_unmap ( >>>>>>>> lttng_chan=0x7ffff0010f40) at lttng-events.c:172 >>>>>>>> #7 lttng_session_destroy (session=0x7ffff0000900) >>>>>>>> at lttng-events.c:247 >>>>>>>> #8 0x00007ffff7676906 in lttng_release_session ( >>>>>>>> objd=<optimized out>) at lttng-ust-abi.c:601 >>>>>>>> #9 0x00007ffff7676ccf in lttng_ust_objd_unref (id=1, >>>>>>>> is_owner=<optimized out>) at lttng-ust-abi.c:216 >>>>>>>> #10 0x00007ffff7676ccf in lttng_ust_objd_unref (id=2, >>>>>>>> is_owner=<optimized out>) at lttng-ust-abi.c:216 >>>>>>>> #11 0x00007ffff7676ccf in lttng_ust_objd_unref (id=id@entry=18, >>>>>>>> is_owner=is_owner@entry=1) at lttng-ust-abi.c:216 >>>>>>>> #12 0x00007ffff7677ad8 in objd_table_destroy () >>>>>>>> at lttng-ust-abi.c:235 >>>>>>>> #13 lttng_ust_abi_exit () at lttng-ust-abi.c:1002 >>>>>>>> #14 0x00007ffff76711d3 in lttng_ust_cleanup (exiting=1) >>>>>>>> at lttng-ust-comm.c:1807 >>>>>>>> #15 0x00007ffff7667d57 in lttng_ust_exit () >>>>>>>> at lttng-ust-comm.c:1874 >>>>>>>> #16 0x00007ffff7dec85a in _dl_fini () >>>>>>>> from /lib64/ld-linux-x86-64.so.2 >>>>>>>> #17 0x00007ffff6eafa49 in __run_exit_handlers () >>>>>>>> from /lib64/libc.so.6 >>>>>>>> #18 0x00007ffff6eafa95 in exit () from /lib64/libc.so.6 >>>>>>>> #19 0x0000000000401b51 in main (argc=<optimized out>, >>>>>>>> argv=<optimized out>) at lu.c:368 >>>>>>>> >>>>>>>> Any ideas, why this happens and how to fix it? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Shehab >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> lttng-dev mailing list >>>>>>>> lttng-dev@lists.lttng.org >>>>>>>> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Mathieu Desnoyers >>>>>>>> EfficiOS Inc. >>>>>>>> http://www.efficios.com >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Mathieu Desnoyers >>>>>>> EfficiOS Inc. >>>>>>> http://www.efficios.com >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Mathieu Desnoyers >>>>>> EfficiOS Inc. >>>>>> http://www.efficios.com >>>>>> >>>>>> >>>>>> -- >>>>>> Mathieu Desnoyers >>>>>> EfficiOS Inc. >>>>>> http://www.efficios.com >>>>>> >>>>> >>>>> >>>>> -- >>>>> Mathieu Desnoyers >>>>> EfficiOS Inc. >>>>> http://www.efficios.com >>>>> >>>> >>>> >>>> -- >>>> Mathieu Desnoyers >>>> EfficiOS Inc. >>>> http://www.efficios.com >>>> >>>> >>>> -- >>>> Mathieu Desnoyers >>>> EfficiOS Inc. >>>> http://www.efficios.com >>>> >>>> >>>> -- >>>> Mathieu Desnoyers >>>> EfficiOS Inc. >>>> http://www.efficios.com >>>> >>>> >>>> -- >>>> Mathieu Desnoyers >>>> EfficiOS Inc. >>>> http://www.efficios.com >>>> >>> >>> >>> -- >>> Mathieu Desnoyers >>> EfficiOS Inc. >>> http://www.efficios.com >>> >> >> > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com >
_______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev