Hi,

On 28.08.2018 14:58, Alexey Budankov wrote:
> Hi Andi,
>
> On 28.08.2018 11:59, Jiri Olsa wrote:
>> On Mon, Aug 27, 2018 at 08:03:21PM +0300, Alexey Budankov wrote:
>>>
>>> Currently in record mode the tool implements trace writing serially.
>>> The algorithm loops over mapped per-cpu data buffers and stores ready
>>> data chunks into a trace file using write() system call.
>>>
>>> In some circumstances the kernel may lack free space in a buffer
>>> because the other buffer's half is not yet written to disk due to
>>> some other buffer's data writing by the tool at the moment.
>>>
>>> Thus serial trace writing implementation may cause the kernel
>>> to lose profiling data and that is what is observed when profiling
>>> highly parallel CPU bound workloads on machines with a large number
>>> of cores.
>>>
>>> Experiment with profiling matrix multiplication code executing 128
>>> threads on Intel Xeon Phi (KNM) with 272 cores, like below,
>>> demonstrates a data loss metric value of 98%:
>>>
>>> /usr/bin/time perf record -o /tmp/perf-ser.data -a -N -B -T -R -g \
>>> --call-graph dwarf,1024 --user-regs=IP,SP,BP \
>>> --switch-events -e \
>>> cycles,instructions,ref-cycles,software/period=1,name=cs,config=0x3/Duk -- \
>>> matrix.gcc
>>>
>>> The data loss metric is the ratio lost_time/elapsed_time where
>>> lost_time is the sum of time intervals containing PERF_RECORD_LOST
>>> records and elapsed_time is the elapsed application run time
>>> under profiling.
>>
>> I like the idea and I think it's good direction to go, but could
>> you please share some from perf stat or whatever you used to measure
>> the new performance?
>
> Is it ok to share VTune GUI screenshots I sent you the last time
> to demonstrate the advantage of AIO trace streaming?

Never mind: the VTune release manager has permitted sharing them.
Sorry to bother you.

>
> Thanks,
> Alexey
>
>
>>
>> thanks,
>> jirka
>>
>
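For reference, a minimal sketch of how the data loss metric (lost_time/elapsed_time) from the cover letter could be computed. It assumes the run [start, end) is split into fixed-width bins and that lost_time is the total duration of bins holding at least one PERF_RECORD_LOST timestamp; the bin width and the helper name data_loss_metric() are hypothetical and not part of perf.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not a perf API): estimate lost_time / elapsed_time.
 * Assumption: [start, end) is split into bins of bin_ns nanoseconds, and
 * lost_time is the summed duration of bins that contain at least one
 * PERF_RECORD_LOST timestamp. Returns -1.0 if allocation fails.
 */
double data_loss_metric(const uint64_t *lost_ts, size_t n,
                        uint64_t start, uint64_t end, uint64_t bin_ns)
{
    if (end <= start || bin_ns == 0)
        return 0.0;

    uint64_t elapsed = end - start;
    size_t nbins = (size_t)((elapsed + bin_ns - 1) / bin_ns);
    unsigned char *hit = calloc(nbins, 1);
    if (!hit)
        return -1.0;

    /* Mark every bin that contains a lost-record timestamp. */
    for (size_t i = 0; i < n; i++) {
        if (lost_ts[i] < start || lost_ts[i] >= end)
            continue;
        hit[(lost_ts[i] - start) / bin_ns] = 1;
    }

    /* Sum the durations of marked bins; the last bin may be partial. */
    uint64_t lost_time = 0;
    for (size_t b = 0; b < nbins; b++) {
        if (!hit[b])
            continue;
        uint64_t b_start = start + (uint64_t)b * bin_ns;
        uint64_t b_end = b_start + bin_ns < end ? b_start + bin_ns : end;
        lost_time += b_end - b_start;
    }

    free(hit);
    return (double)lost_time / (double)elapsed;
}
```

With 10 ns bins over a 40 ns run and LOST records at 5, 15 and 17 ns, two of four bins are marked, giving 0.5. The result depends on the chosen bin width, so values are only comparable when measured with the same width.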