On 11/27/2018 8:25 PM, Stephane Eranian wrote:
On Tue, Nov 27, 2018 at 3:36 PM Andi Kleen <a...@linux.intel.com> wrote:

It does seem that FREEZE_PERFMON_ON_PMI (misnamed as it is) is of
rather limited use (or even negative, in our case) to a counter that's
already restricted to ring 3.

It's much faster. The PMI cost goes down dramatically.

I still think the right fix is to add a perf event opt-out and let it be
used by rr.

     V3 is without counter freezing.
     V4 is with counter freezing.
     The value is the average cost of the PMI handler.
     (lower is better)

     perf options                V3(ns) V4(ns)  delta
     -c 100000                   1088   894     -18%
     -g -c 100000                1862   1646    -12%
     --call-graph lbr -c 100000  3649   3367    -8%
     --c.g. dwarf -c 100000      2248   1982    -12%
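
For anyone who wants to double-check the delta column, it can be re-derived from the two cost columns as (V4 - V3) / V3, rounded to the nearest percent. A minimal sketch; the dictionary and the function name `delta_percent` are illustrative only and not part of perf or the kernel:

```python
# Average PMI handler cost in ns, copied from the table above: (V3, V4).
costs = {
    "-c 100000": (1088, 894),
    "-g -c 100000": (1862, 1646),
    "--call-graph lbr -c 100000": (3649, 3367),
    "--c.g. dwarf -c 100000": (2248, 1982),
}


def delta_percent(v3_ns, v4_ns):
    """Relative change of V4 against V3, rounded to a whole percent."""
    return round((v4_ns - v3_ns) / v3_ns * 100)


# Print each row with its recomputed delta.
for opts, (v3, v4) in costs.items():
    print(f"{opts:<28}{v3:>6}{v4:>6}{delta_percent(v3, v4):>6}%")
```

The recomputed values agree with the quoted column, so the deltas are plain relative differences against V3.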

Is that measured on the same machine, i.e., do you force V3 on Skylake?

Yes, it's measured on the same Kaby Lake machine with the counter_freezing option disabled/enabled.


All it does, I think, is save one wrmsr(GLOBAL_CTRL) on entry to the
PMU interrupt handler, or am I missing something?
Or does it save two, counting the wrmsr(GLOBAL_CTRL) at the end to reactivate?

__intel_pmu_disable_all() and __intel_pmu_enable_all() are not called in the V4 handler, so it saves at least two wrmsrl calls.
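
As a rough illustration of that accounting, here is a toy model that only counts simulated GLOBAL_CTRL writes per PMI. It is not the kernel code: `ToyPMU`, `handler_v3`, `handler_v4` and `service_overflow` are hypothetical names, and the model assumes that everything other than the disable/enable pair costs the same in both handlers.

```python
class ToyPMU:
    """Hypothetical stand-in for the PMU; it only counts simulated MSR writes."""

    def __init__(self):
        self.global_ctrl_writes = 0

    def wrmsr_global_ctrl(self):
        # Models one wrmsr(GLOBAL_CTRL); no real hardware is touched.
        self.global_ctrl_writes += 1


def service_overflow(pmu):
    """Placeholder for the work both handlers share (assumed identical)."""
    pass


def handler_v3(pmu):
    # Without counter freezing: software stops and restarts the counters.
    pmu.wrmsr_global_ctrl()   # disable all counters on entry
    service_overflow(pmu)
    pmu.wrmsr_global_ctrl()   # re-enable all counters on exit


def handler_v4(pmu):
    # With counter freezing: the disable/enable pair is skipped.
    service_overflow(pmu)


v3, v4 = ToyPMU(), ToyPMU()
handler_v3(v3)
handler_v4(v4)
print("GLOBAL_CTRL writes saved per PMI:",
      v3.global_ctrl_writes - v4.global_ctrl_writes)
```

In this model the difference is exactly the two GLOBAL_CTRL writes; the "at least" above leaves room for further savings that the model does not attempt to capture.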

Thanks,
Kan

