Re: [RFCv2 00/48] perf tools: Add threads to record command

Alexey Budankov Mon, 24 Sep 2018 12:12:30 -0700

Hi,

On 24.09.2018 21:32, Alexey Budankov wrote:
> Hi,
> 
> On 24.09.2018 17:29, Jiri Olsa wrote:
>> On Mon, Sep 24, 2018 at 04:09:09PM +0300, Alexey Budankov wrote:
>>> Hi,
>>>
>>> On 24.09.2018 10:02, Alexey Budankov wrote:
>>>> Hi,
>>>>
>>>> On 23.09.2018 22:30, Jiri Olsa wrote:
>>>>> On Fri, Sep 21, 2018 at 09:13:08AM +0300, Alexey Budankov wrote:
>>>>>
>>>>> SNIP
>>>>>
>>>>>> Events:
>>>>>> cpu/period=P,event=0x3c/Duk;CPU_CLK_UNHALTED.THREAD
>>>>>> cpu/period=P,umask=0x3/Duk;CPU_CLK_UNHALTED.REF_TSC
>>>>>> cpu/period=P,event=0xc0/Duk;INST_RETIRED.ANY
>>>>>> cpu/period=0xaae61,event=0xc2,umask=0x10/uk;UOPS_RETIRED.ALL
>>>>>> cpu/period=0x11171,event=0xc2,umask=0x20/uk;UOPS_RETIRED.SCALAR_SIMD
>>>>>> cpu/period=0x11171,event=0xc2,umask=0x40/uk;UOPS_RETIRED.PACKED_SIMD
>>>>>>
>>>>>> =================================================
>>>>>>
>>>>>> Command:
>>>>>> /usr/bin/time /tmp/vtune_amplifier_2019.574715/bin64/perf.thr record 
>>>>>> --threads=T \
>>>>>>  -a -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
>>>>>>         -e cpu/period=P,event=0x3c/Duk,\
>>>>>>            cpu/period=P,umask=0x3/Duk,\
>>>>>>            cpu/period=P,event=0xc0/Duk,\
>>>>>>            cpu/period=0x30d40,event=0xc2,umask=0x10/uk,\
>>>>>>            cpu/period=0x4e20,event=0xc2,umask=0x20/uk,\
>>>>>>            cpu/period=0x4e20,event=0xc2,umask=0x40/uk \
>>>>>>          --clockid=monotonic_raw -- ./matrix.(icc|gcc)
>>>>>
>>>>> hum, so I guess the results suck because of the -a option,
>>>>> getting extra samples for all the perf record threads
>>>>>
>>>>> could you try without the -a? you monitor only user events,
>>>>> so you're interested only in ./matrix.* samples, right?
>>>>
>>>> Ok, trying without -a, in per-process mode. 
>>>
>>> Command:
>>>
>>> /usr/bin/time ./perf.thr record --threads=T \
>>>     -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
>>>     -e cpu/period=P,event=0x3c/Duk,\
>>>        cpu/period=P,umask=0x3/Duk,\
>>>        cpu/period=P,event=0xc0/Duk,\
>>>        cpu/period=0xaae61,event=0xc2,umask=0x10/uk,\
>>>        cpu/period=0x11171,event=0xc2,umask=0x20/uk,\
>>>        cpu/period=0x11171,event=0xc2,umask=0x40/uk \
>>>     --clockid=monotonic_raw -- ./matrix.gcc
>>>
>>> Workload: matrix multiplication in 128 threads
>>>
>>> T : 272
>>>     P (period, ms)       : 0.35 
>>>     runtime overhead (%) : 13x ~ 87.73 / 6.81
>>
>> how do you meassure this?
> 
> This is the ratio of elapsed times:
> runtime overhead (%) : elapsed_time_under_profiling / elapsed_time
> i.e.
> 
> /usr/bin/time ./matrix.gcc
> ...
> 767.03user 11.17system 0:06.81elapsed 11424%CPU (0avgtext+0avgdata 
> 100756maxresident)k
> 88inputs+0outputs (0major+139898minor)pagefaults 0swaps
> 
> so elapsed_time = 6.81 sec
> 
> elapsed_time_uder_profiling is elapsed value from output of 
> 
> /usr/bin/time ./perf.thr record --threads=T ...
> 
>>
>>>     data loss (%)        : 0
>>>     LOST events          : 36
>>>     SAMPLE events        : 8048542
>>>         perf.data size (GiB) : 10
>>
>> any idea why does it have some much more samples?
> 
> Presumably, this is because period is 350us and this is the smallest 
> one that perf.thr manages to capture data without data loss (=0) when T=272.
> However, during collection, I get message that max sampling frequency 
> is lowered to 3KHz.


Lowering default frequency rate to 3000.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.

Thanks,
Alexey

> 
> Thanks,
> Alexey
> 
>>
>> thanks,
>> jirka
>>
>

Re: [RFCv2 00/48] perf tools: Add threads to record command

Reply via email to