It is v2.17 (for the problematic system) and 2.23 for the other Shehab Y. Elsayed, MSc. PhD Student The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering University of Toronto E-mail: shehaby...@gmail.com <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#>
On Thu, Mar 22, 2018 at 1:00 PM, Mathieu Desnoyers < mathieu.desnoy...@efficios.com> wrote: > ----- On Mar 22, 2018, at 12:24 PM, Shehab Elsayed <shehaby...@gmail.com> > wrote: > > Actually, i am not sure if this would help. I have been trying to > reproduce the problem on a different machine, but I can't. Even without the > patches !!!!! > > > Does it have the same glibc version ? > > Thanks, > > Mathieu > > > > > Shehab Y. Elsayed, MSc. > PhD Student > The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering > University of Toronto > E-mail: shehaby...@gmail.com > <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#> > > On Wed, Mar 21, 2018 at 8:13 PM, Mathieu Desnoyers < > mathieu.desnoy...@efficios.com> wrote: > >> ----- On Mar 21, 2018, at 8:01 PM, Shehab Elsayed <shehaby...@gmail.com> >> wrote: >> >> Just to clarify more what I meant by custom events; I have new >> tracepoints added in the source code of the benchmark. However, I don't >> enable the corresponding events when I do the actual tracing. >> >> This is the sequence followed in building the benchmark: >> gcc-7.2 -c -O2 -pthread -D_XOPEN_SOURCE=500 -D_POSIX_C_SOURCE=200112 >> -std=c11 -g -fno-strict-aliasing -DLTTNG_INST lu.c >> gcc-7.2 -O2 -pthread -D_XOPEN_SOURCE=500 -D_POSIX_C_SOURCE=200112 >> -std=c11 -g -fno-strict-aliasing -DLTTNG_INST -o LU_NCB lu.o >> ../../instrumentation/lttng_tp/tp.o -lm -llttng-ust -ldl >> >> LTTNG_INST is just a preprocessor flag I have and tp.o is my custom >> tracepoints >> >> Could you share a repository with your custom instrumentation on github, >> so I could >> try it out ? >> >> My current problem is that I cannot reproduce your issue on my end. >> >> Thanks, >> >> Mathieu >> >> >> >> >> Shehab Y. Elsayed, MSc. >> PhD Student >> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering >> University of Toronto >> E-mail: shehaby...@gmail.com >> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#> >> >> On Wed, Mar 21, 2018 at 7:55 PM, Shehab Elsayed <shehaby...@gmail.com> >> wrote: >> >>> Still running into same problem. I attached the debug trace I got after >>> applying the 2 patches. >>> >>> The benchmark I am running also includes some custom created >>> tracepoints. I am not adding the events being traced in the files I have >>> provided. Do you think this might be causing a problem when I have >>> tracpoints from 2 different packages? >>> >>> Shehab Y. Elsayed, MSc. >>> PhD Student >>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering >>> University of Toronto >>> E-mail: shehaby...@gmail.com >>> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#> >>> >>> On Wed, Mar 21, 2018 at 4:22 PM, Mathieu Desnoyers < >>> mathieu.desnoy...@efficios.com> wrote: >>> >>>> ----- On Mar 21, 2018, at 1:55 PM, Shehab Elsayed <shehaby...@gmail.com> >>>> wrote: >>>> >>>> I am so sorry for the late replies. >>>> >>>> I have tried the first patch you sent and the problem still happens >>>> (although seemingly less frequently especially with debugging enabled!!!!). >>>> I have attached the output I got from one of the erroneous runs. >>>> >>>> I will try the updated patch and let you know how it goes. >>>> >>>> Also, I am not sure if these points make any difference: >>>> 1- The error actually happens at the end of the application. It >>>> actually finishes execution, but then something goes wrong. >>>> 2- I run into this problem only for some of the benchmarks (or at least >>>> the problems happens much less frequently for others that I didn't run into >>>> it, not yet) >>>> >>>> Thanks you very much, and sorry again for the late replies. >>>> >>>> >>>> No worries! Looking through your log, I notice that pthread set cancel >>>> state has problems when >>>> called from application threads. We do not restore the original >>>> thread's cancel state. Can you try >>>> with the attached patch applied on top of the previous one ? >>>> >>>> Thanks, >>>> >>>> Mathieu >>>> >>>> >>>> >>>> >>>> Shehab Y. Elsayed, MSc. >>>> PhD Student >>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering >>>> University of Toronto >>>> E-mail: shehaby...@gmail.com >>>> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#> >>>> >>>> On Wed, Mar 21, 2018 at 1:27 PM, Mathieu Desnoyers < >>>> mathieu.desnoy...@efficios.com> wrote: >>>> >>>>> ----- On Mar 20, 2018, at 5:42 PM, Mathieu Desnoyers < >>>>> mathieu.desnoy...@efficios.com> wrote: >>>>> >>>>> ----- On Mar 20, 2018, at 4:58 PM, Mathieu Desnoyers < >>>>> mathieu.desnoy...@efficios.com> wrote: >>>>> >>>>> ----- On Mar 20, 2018, at 12:07 PM, Mathieu Desnoyers < >>>>> mathieu.desnoy...@efficios.com> wrote: >>>>> >>>>> >>>>> ----- On Mar 19, 2018, at 4:21 PM, Shehab Elsayed < >>>>> shehaby...@gmail.com> wrote: >>>>> >>>>> I did "echo "-1" > /proc/sys/kernel/perf_event_paranoid" and made >>>>> sure the value was actually written by "cat /proc/sys/kernel/perf_event_ >>>>> paranoid" >>>>> >>>>> It executed normally 2 times but then got the same error. >>>>> >>>>> >>>>> Can you provide the output when reproducing the issue with the >>>>> LTTNG_UST_DEBUG=1 environment variable set when starting >>>>> your test program ? >>>>> >>>>> I just noticed something that might explain what goes wrong here: >>>>> >>>>> lttng-context-perf-counters.c: add_thread_field() grabs the >>>>> ust_lock(). Pthread mutex >>>>> in your case is instrumented with the pthread wrapper. This >>>>> "add_thread_field" is invoked >>>>> the first time the perf counter is hit by each given thread. When this >>>>> happens, the >>>>> instrumented pthread mutex will try to call into the instrumentation >>>>> tracepoint again, >>>>> which will call add_thread_field() (again), and so on until we reach >>>>> the libringbuffer >>>>> "lib_ring_buffer_nesting" threshold of 4 calls deep. >>>>> >>>>> I suspect this situation where we recursively call add_thread_field is >>>>> unexpected, >>>>> which may trigger your double-free here. >>>>> >>>>> Will investigate more. >>>>> >>>>> Can you try with the attached patch applied ? >>>>> >>>>> Here is an updated v2 of the patch which tests the notrace tls counter >>>>> sooner (before evaluating >>>>> filter). Please let me know how it goes. >>>>> >>>>> Thanks, >>>>> >>>>> Mathieu >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Mathieu >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Mathieu >>>>> >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Mathieu >>>>> >>>>> >>>>> >>>>> >>>>> Shehab Y. Elsayed, MSc. >>>>> PhD Student >>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering >>>>> University of Toronto >>>>> E-mail: shehaby...@gmail.com >>>>> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#> >>>>> >>>>> On Mon, Mar 19, 2018 at 4:01 PM, Mathieu Desnoyers < >>>>> mathieu.desnoy...@efficios.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> ----- On Mar 19, 2018, at 3:53 PM, Shehab Elsayed < >>>>>> shehaby...@gmail.com> wrote: >>>>>> >>>>>> cat /proc/sys/kernel/perf_event_paranoid ---> returns 1 >>>>>> >>>>>> I run the program as a normal user >>>>>> >>>>>> The glibc version I get by running "ldd --version" is 2.17 >>>>>> >>>>>> >>>>>> Can you reproduce the issue after you do this as root ? >>>>>> >>>>>> echo "-1" > /proc/sys/kernel/perf_event_paranoid >>>>>> >>>>>> Based on this documentation of the Linux kernel: >>>>>> >>>>>> Documentation/sysctl/kernel.txt: >>>>>> >>>>>> perf_event_paranoid: >>>>>> >>>>>> Controls use of the performance events system by unprivileged >>>>>> users (without CAP_SYS_ADMIN). The default value is 2. >>>>>> >>>>>> -1: Allow use of (almost) all events by all users >>>>>> Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK >>>>>> >=0: Disallow ftrace function tracepoint by users without >>>>>> CAP_SYS_ADMIN >>>>>> Disallow raw tracepoint access by users without CAP_SYS_ADMIN >>>>>> >=1: Disallow CPU event access by users without CAP_SYS_ADMIN >>>>>> >=2: Disallow kernel profiling by users without CAP_SYS_ADMIN >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Mathieu >>>>>> >>>>>> >>>>>> >>>>>> Shehab Y. Elsayed, MSc. >>>>>> PhD Student >>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering >>>>>> University of Toronto >>>>>> E-mail: shehaby...@gmail.com >>>>>> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#> >>>>>> >>>>>> On Mon, Mar 19, 2018 at 3:31 PM, Mathieu Desnoyers < >>>>>> mathieu.desnoy...@efficios.com> wrote: >>>>>> >>>>>>> ---- On Mar 19, 2018, at 3:26 PM, Mathieu Desnoyers < >>>>>>> mathieu.desnoy...@efficios.com> wrote: >>>>>>> >>>>>>> ----- On Mar 19, 2018, at 2:26 PM, Shehab Elsayed < >>>>>>> shehaby...@gmail.com> wrote: >>>>>>> >>>>>>> Yes, I tried with only one of those contexts and I still ran into >>>>>>> the same problem. >>>>>>> >>>>>>> What is the setting returned by >>>>>>> >>>>>>> cat /proc/sys/kernel/perf_event_paranoid >>>>>>> >>>>>>> on your system ? And do you run your test program as root or normal >>>>>>> user ? >>>>>>> >>>>>>> Please CC the mailing list on your reply. >>>>>>> >>>>>>> >>>>>>> I will also need to know what glibc version you have on your system. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Mathieu >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Mathieu >>>>>>> >>>>>>> >>>>>>> >>>>>>> Shehab Y. Elsayed, MSc. >>>>>>> PhD Student >>>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer >>>>>>> Engineering >>>>>>> University of Toronto >>>>>>> E-mail: shehaby...@gmail.com >>>>>>> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#> >>>>>>> >>>>>>> On Mon, Mar 19, 2018 at 2:24 PM, Mathieu Desnoyers < >>>>>>> mathieu.desnoy...@efficios.com> wrote: >>>>>>> >>>>>>>> ----- On Mar 19, 2018, at 12:36 PM, Shehab Elsayed < >>>>>>>> shehaby...@gmail.com> wrote: >>>>>>>> >>>>>>>> Hi Mathieu, >>>>>>>> >>>>>>>> Thank you very much for your reply. >>>>>>>> >>>>>>>> I manually built lttng-ust from source (commit #: >>>>>>>> 8a208943e21700211beee3ea64180a5a534c7d2a). >>>>>>>> >>>>>>>> This is how I set up the tracing session: >>>>>>>> 1- lttng create lu_ncb_8_native -o {path} >>>>>>>> 2- lttng enable-event --userspace lttng_ust_pthread:pthread_ >>>>>>>> mutex_lock_req >>>>>>>> lttng enable-event --userspace lttng_ust_pthread:pthread_ >>>>>>>> mutex_lock_acq >>>>>>>> lttng enable-event --userspace lttng_ust_pthread:pthread_ >>>>>>>> mutex_lock_trylock >>>>>>>> lttng enable-event --userspace lttng_ust_pthread:pthread_ >>>>>>>> mutex_lock_unlock >>>>>>>> 3- lttng add-context -u -t procname >>>>>>>> lttng add-context -u -t vpid >>>>>>>> lttng add-context -u -t pthread_id >>>>>>>> lttng add-context -u -t vtid >>>>>>>> lttng add-context -u -t ip >>>>>>>> lttng add-context -u -t perf:thread:cpu-cycles >>>>>>>> lttng add-context -u -t perf:thread:cycles >>>>>>>> lttng add-context -u -t perf:thread:instructions >>>>>>>> 4- lttng start >>>>>>>> 5- LD_PRELOAD=/usr/local/lib/liblttng-ust-pthread-wrapper.so >>>>>>>> ./lu_ncb -p8 -n8096 -b32 >>>>>>>> 6- lttng stop >>>>>>>> 7- lttng destroy >>>>>>>> >>>>>>>> >>>>>>>> Can you reproduce if you remove the following contexts ? >>>>>>>> >>>>>>>> lttng add-context -u -t perf:thread:cpu-cycles >>>>>>>> lttng add-context -u -t perf:thread:cycles >>>>>>>> lttng add-context -u -t perf:thread:instructions >>>>>>>> >>>>>>>> And if you only keep a single of those contexts ? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Mathieu >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Shehab Y. Elsayed, MSc. >>>>>>>> PhD Student >>>>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer >>>>>>>> Engineering >>>>>>>> University of Toronto >>>>>>>> E-mail: shehaby...@gmail.com >>>>>>>> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#> >>>>>>>> >>>>>>>> On Mon, Mar 19, 2018 at 12:21 PM, Mathieu Desnoyers < >>>>>>>> mathieu.desnoy...@efficios.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> ----- On Mar 16, 2018, at 5:37 PM, Shehab Elsayed < >>>>>>>>> shehaby...@gmail.com> wrote: >>>>>>>>> >>>>>>>>> Hello All, >>>>>>>>> >>>>>>>>> I am trying to instrument a pthread application using the >>>>>>>>> provided pthread wrapper, but I sometimes run into a "Double free >>>>>>>>> or corruption error (fasttop)" error. >>>>>>>>> >>>>>>>>> >>>>>>>>> Please provide more information about the version of lttng-ust you >>>>>>>>> are using, and how you setup >>>>>>>>> your tracing session. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Mathieu >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Here is a description of what I have tried and noticed: >>>>>>>>> 1- The problem isn't consistent. It sometimes happen and sometimes >>>>>>>>> works as expected. >>>>>>>>> 2- From my experiments, the problem happens (more frequently at >>>>>>>>> least) when adding performance counter contexts (I tried cycles, >>>>>>>>> cpu_cycles and instructions). >>>>>>>>> 3- I am testing using lu_ncb from splash3 benchmark suite after >>>>>>>>> setting LD_PRELOAD to use the pthread wrapper as described in the >>>>>>>>> LTTng documents. >>>>>>>>> 4- Here is the backtrace printed after exiting: >>>>>>>>> ======= Backtrace: ========= >>>>>>>>> /lib64/libc.so.6([Thread 0x7ffff5611700 (LWP 97229) exited] >>>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_destroy_context+ >>>>>>>>> 0x35)[0x7ffff7471575] >>>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_session_destroy+ >>>>>>>>> 0x21c)[0x7ffff747363c] >>>>>>>>> /usr/local/lib/liblttng-ust.so.0(+0x1e906)[0x7ffff746d906] >>>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_objd_unref+ >>>>>>>>> 0x9f)[0x7ffff746dccf] >>>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_objd_unref+ >>>>>>>>> 0x9f)[0x7ffff746dccf] >>>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_objd_unref+ >>>>>>>>> 0x9f)[0x7ffff746dccf] >>>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_abi_exit+0x68)[ >>>>>>>>> 0x7ffff746ead8] >>>>>>>>> /usr/local/lib/liblttng-ust.so.0(+0x191d3)[0x7ffff74681d3] >>>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_exit+0x67)[ >>>>>>>>> 0x7ffff745ed57] >>>>>>>>> /lib64/ld-linux-x86-64.so.2(+0xf85a)[0x7ffff7dec85a] >>>>>>>>> /lib64/libc.so.6(+0x38a49)[0x7ffff6ca6a49] >>>>>>>>> /lib64/libc.so.6(+0x38a95)[0x7ffff6ca6a95] >>>>>>>>> /aenao-99/elsayed9/LTTng/data/scripts/tmp/lu_ncb[0x401b51] >>>>>>>>> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7ffff6c8fb35] >>>>>>>>> /aenao-99/elsayed9/LTTng/data/scripts/tmp/lu_ncb[0x401c44] >>>>>>>>> 5- Also, this is a backtrace I obtained from gdb: >>>>>>>>> #0 0x00007ffff6eac1d7 in raise () from /lib64/libc.so.6 >>>>>>>>> #1 0x00007ffff6ead8c8 in abort () from /lib64/libc.so.6 >>>>>>>>> #2 0x00007ffff6eebf07 in __libc_message () from /lib64/libc.so.6 >>>>>>>>> #3 0x00007ffff6ef3503 in _int_free () from /lib64/libc.so.6 >>>>>>>>> #4 0x00007ffff768ad25 in lttng_destroy_perf_counter_field ( >>>>>>>>> field=<optimized out>) at lttng-context-perf-counters.c:418 >>>>>>>>> #5 0x00007ffff767a575 in lttng_destroy_context ( >>>>>>>>> ctx=0x7ffff0011090) at lttng-context.c:278 >>>>>>>>> #6 0x00007ffff767c63c in _lttng_channel_unmap ( >>>>>>>>> lttng_chan=0x7ffff0010f40) at lttng-events.c:172 >>>>>>>>> #7 lttng_session_destroy (session=0x7ffff0000900) >>>>>>>>> at lttng-events.c:247 >>>>>>>>> #8 0x00007ffff7676906 in lttng_release_session ( >>>>>>>>> objd=<optimized out>) at lttng-ust-abi.c:601 >>>>>>>>> #9 0x00007ffff7676ccf in lttng_ust_objd_unref (id=1, >>>>>>>>> is_owner=<optimized out>) at lttng-ust-abi.c:216 >>>>>>>>> #10 0x00007ffff7676ccf in lttng_ust_objd_unref (id=2, >>>>>>>>> is_owner=<optimized out>) at lttng-ust-abi.c:216 >>>>>>>>> #11 0x00007ffff7676ccf in lttng_ust_objd_unref (id=id@entry=18, >>>>>>>>> is_owner=is_owner@entry=1) at lttng-ust-abi.c:216 >>>>>>>>> #12 0x00007ffff7677ad8 in objd_table_destroy () >>>>>>>>> at lttng-ust-abi.c:235 >>>>>>>>> #13 lttng_ust_abi_exit () at lttng-ust-abi.c:1002 >>>>>>>>> #14 0x00007ffff76711d3 in lttng_ust_cleanup (exiting=1) >>>>>>>>> at lttng-ust-comm.c:1807 >>>>>>>>> #15 0x00007ffff7667d57 in lttng_ust_exit () >>>>>>>>> at lttng-ust-comm.c:1874 >>>>>>>>> #16 0x00007ffff7dec85a in _dl_fini () >>>>>>>>> from /lib64/ld-linux-x86-64.so.2 >>>>>>>>> #17 0x00007ffff6eafa49 in __run_exit_handlers () >>>>>>>>> from /lib64/libc.so.6 >>>>>>>>> #18 0x00007ffff6eafa95 in exit () from /lib64/libc.so.6 >>>>>>>>> #19 0x0000000000401b51 in main (argc=<optimized out>, >>>>>>>>> argv=<optimized out>) at lu.c:368 >>>>>>>>> >>>>>>>>> Any ideas, why this happens and how to fix it? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Shehab >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> lttng-dev mailing list >>>>>>>>> lttng-dev@lists.lttng.org >>>>>>>>> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Mathieu Desnoyers >>>>>>>>> EfficiOS Inc. >>>>>>>>> http://www.efficios.com >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Mathieu Desnoyers >>>>>>>> EfficiOS Inc. >>>>>>>> http://www.efficios.com >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Mathieu Desnoyers >>>>>>> EfficiOS Inc. >>>>>>> http://www.efficios.com >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Mathieu Desnoyers >>>>>>> EfficiOS Inc. >>>>>>> http://www.efficios.com >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Mathieu Desnoyers >>>>>> EfficiOS Inc. >>>>>> http://www.efficios.com >>>>>> >>>>> >>>>> >>>>> -- >>>>> Mathieu Desnoyers >>>>> EfficiOS Inc. >>>>> http://www.efficios.com >>>>> >>>>> >>>>> -- >>>>> Mathieu Desnoyers >>>>> EfficiOS Inc. >>>>> http://www.efficios.com >>>>> >>>>> >>>>> -- >>>>> Mathieu Desnoyers >>>>> EfficiOS Inc. >>>>> http://www.efficios.com >>>>> >>>>> >>>>> -- >>>>> Mathieu Desnoyers >>>>> EfficiOS Inc. >>>>> http://www.efficios.com >>>>> >>>> >>>> >>>> -- >>>> Mathieu Desnoyers >>>> EfficiOS Inc. >>>> http://www.efficios.com >>>> >>> >>> >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. >> http://www.efficios.com >> > > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com >
_______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev