Hi,

On 29.08.2018 11:40, Alexey Budankov wrote:
> 
> Hi Kim,
> 
> On 28.08.2018 22:43, Kim Phillips wrote:
>> On Tue, 28 Aug 2018 18:44:57 +0300
>> Alexey Budankov <alexey.budan...@linux.intel.com> wrote:
>> 
>>> Experiment with profiling matrix multiplication code executing 128
>>> threads on Intel Xeon Phi (KNM) with 272 cores, like below,
>>> demonstrates data loss metrics value of 98%:
>> 
>> So I took these two patches for a quick test-drive on intel-pt.
> 
> Thanks for testing that out in this scenario! It hasn't been tested yet.
> 
>> 
>> BEFORE (acme's today's perf/core branch):
>> 
>> $ sudo perf version
>> perf version 4.18.rc7.g55fc647
>> $ sudo perf record -e intel_pt// dd if=/dev/zero of=/dev/null count=100000
>> 100000+0 records in
>> 100000+0 records out
>> 51200000 bytes (51 MB, 49 MiB) copied, 0.0868081 s, 590 MB/s
>> [ perf record: Woken up 21 times to write data ]
>> [ perf record: Captured and wrote 41.302 MB perf.data ]
>> $
>> 
>> AFTER (== BEFORE + these two patches):
>> 
>> $ sudo ./perf version
>> perf version 4.18.rc7.gbc1c99
>> $ sudo perf record -e intel_pt// dd if=/dev/zero of=/dev/null count=100000
>> 100000+0 records in
>> 100000+0 records out
>> 51200000 bytes (51 MB, 49 MiB) copied, 0.0931142 s, 550 MB/s
>> 
>> ...and it's still running, minutes afterwards.
>> Before I kill it, here's some strace output:
>> 
>> nanosleep({tv_sec=0, tv_nsec=500000}, NULL) = 0
>> lseek(3, 332556518, SEEK_SET) = 332556518
>> write(3, "D\0\0\0\0\0\10\0", 8) = 8
>> lseek(3, 0, SEEK_CUR) = 332556526
>> futex(0x7f221e7252c8, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e725200, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e7252cc, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e725200, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e7252c8, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e725200, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e7252cc, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e725200, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e7252c8, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e725200, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e7252cc, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e725200, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e7252c8, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e725200, FUTEX_WAKE_PRIVATE, 1) = 1
>> nanosleep({tv_sec=0, tv_nsec=500000}, NULL) = 0
>> lseek(3, 332578462, SEEK_SET) = 332578462
>> write(3, "D\0\0\0\0\0\10\0", 8) = 8
>> lseek(3, 0, SEEK_CUR) = 332578470
>> futex(0x7f221e7252cc, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e725200, FUTEX_WAKE_PRIVATE, 1) = 1
>> nanosleep({tv_sec=0, tv_nsec=500000}, NULL) = 0
>> lseek(3, 332598822, SEEK_SET) = 332598822
>> write(3, "D\0\0\0\0\0\10\0", 8) = 8
>> lseek(3, 0, SEEK_CUR) = 332598830
>> futex(0x7f221e7252c8, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e725200, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e7252cc, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e725200, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e7252c8, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e725200, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e7252cc, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e725200, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e7252c8, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e725200, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7f221e7252cc, FUTEX_WAKE_PRIVATE, 1) = 1
>> ^Cstrace: Process 3597
>> detached
>> 
>> I can't prove that it's these two patches that create the hang, but
>> this does look like a livelock situation...hm, hitting ^C doesn't stop
>> it...had to kill -9 it...erm, does 'perf record -e intel_pt// dd...'
>> work for you on a more standard machine?:
>> 
>> $ dmesg | grep Perf
>> [    0.044226] Performance Events: PEBS fmt3+, Skylake events, 32-deep LBR,
>> full-width counters, Intel PMU driver.
> 
> The strace patterns look similar to the ones implemented in the patches.
> Let me reproduce and investigate the hang locally.
The stack:

Thread 1 (Thread 0x7f170ef4a7c0 (LWP 11182)):
#0  0x00007f170e2dc350 in nanosleep () from /lib64/libpthread.so.0
#1  0x00000000004436e4 in record__mmap_read_sync (trace_fd=3, cblocks=0x275c120,
    cblocks_size=6, rec=0xabe1c0 <record>) at builtin-record.c:561
#2  0x0000000000443c71 in record__mmap_read_evlist (rec=0xabe1c0 <record>,
    evlist=0x274f4f0, overwrite=false) at builtin-record.c:671
#3  0x0000000000443d21 in record__mmap_read_all (rec=0xabe1c0 <record>)
    at builtin-record.c:692
#4  0x0000000000444feb in __cmd_record (rec=0xabe1c0 <record>, argc=4,
    argv=0x7fffda7606f0) at builtin-record.c:1183
#5  0x00000000004464dc in cmd_record (argc=4, argv=0x7fffda7606f0)
    at builtin-record.c:2000
#6  0x00000000004cee9e in run_builtin (p=0xace5f8 <commands+216>, argc=8,
    argv=0x7fffda7606f0) at perf.c:302
#7  0x00000000004cf10b in handle_internal_command (argc=8, argv=0x7fffda7606f0)
    at perf.c:354
#8  0x00000000004cf25a in run_argv (argcp=0x7fffda76054c, argv=0x7fffda760540)
    at perf.c:398
#9  0x00000000004cf5c6 in main (argc=8, argv=0x7fffda7606f0) at perf.c:520

The main thread continues consuming kernel buffers even after the profiled
application finishes:

Using CPUID GenuineIntel-6-5E
mmap size 528384B
sync: sched: cblock 0x7fd1d62f0088 size 192
sync: sched: cblock 0x7fd1d63102c8 size 192
sync: sched: cblock 0x7fd1d63203e8 size 288
sync: sched: cblock 0x7fd1d6360868 size 1024
sync: done: cblock 0x7fd1d62f0088
sync: done: cblock 0x7fd1d63102c8
sync: done: cblock 0x7fd1d63203e8
sync: done: cblock 0x7fd1d6360868
...
sync: done: cblock 0x7fd1d62f0088
sync: done: cblock 0x7fd1d63102c8
sync: sched: cblock 0x7fd1d62f0088 size 192
sync: sched: cblock 0x7fd1d63102c8 size 192
sync: done: cblock 0x7fd1d62f0088
sync: done: cblock 0x7fd1d63102c8
sync: sched: cblock 0x7fd1d62f0088 size 192
sync: sched: cblock 0x7fd1d63102c8 size 192
sync: done: cblock 0x7fd1d62f0088
sync: done: cblock 0x7fd1d63102c8
sync: sched: cblock 0x7fd1d62f0088 size 192
sync: sched: cblock 0x7fd1d63102c8 size 192
sync: sched: cblock 0x7fd1d6360868 size 128
100000+0 records in
100000+0 records out
51200000 bytes (51 MB, 49 MiB) copied, 0.0622905 s, 822 MB/s
sig: 17
sync: done: cblock 0x7fd1d62f0088
sync: done: cblock 0x7fd1d63102c8
sync: sched: cblock 0x7fd1d62f0088 size 192
sync: sched: cblock 0x7fd1d63001a8 size 192
sync: sched: cblock 0x7fd1d63102c8 size 192
sync: sched: cblock 0x7fd1d6360868 size 416
sync: done: cblock 0x7fd1d62f0088
sync: done: cblock 0x7fd1d63001a8
sync: done: cblock 0x7fd1d63102c8
sync: done: cblock 0x7fd1d6360868
sync: sched: cblock 0x7fd1d62f0088 size 192
sync: sched: cblock 0x7fd1d63001a8 size 192
sync: sched: cblock 0x7fd1d63102c8 size 192
...

> 
> Thanks,
> Alexey
> 
>> 
>> Thanks,
>> 
>> Kim
>> 