* Marty McFadden <mcfadd...@llnl.gov> wrote:

> This patch addresses the following two problems:
>   1. The current msr module grants all-or-nothing access to MSRs,
>      thus making user-level runtime performance adjustments
>      problematic, particularly for power-constrained HPC systems.
>
>   2. The current msr module requires a separate system call and the
>      acquisition of the preemption lock for each individual MSR access.
>      This overhead degrades performance of runtime tools that would
>      ideally sample multiple MSRs at high frequencies.

No, we really don't want to touch the old MSR code - it's a very opaque API
with various deep limitations.

What I'd like to see instead is to use a modern system monitoring interface -
and in fact that already happened in the last kernel release, we added the perf
MSR access methods via:

  commit b7b7c7821d932ba18ef6c8eafc8536066b4c2ef4
  Author: Andy Lutomirski <l...@kernel.org>
  Date:   Mon Jul 20 11:49:06 2015 -0400

    perf/x86: Add an MSR PMU driver

    This patch adds an MSR PMU to support free running MSR counters. Such
    as time and freq related counters includes TSC, IA32_APERF, IA32_MPERF
    and IA32_PPERF, but also SMI_COUNT.

    The events are exposed in sysfs for use by perf stat and other tools.
    The files are under /sys/devices/msr/events/

see arch/x86/cpu/perf/msr.c, or arch/x86/events/msr.c in the latest perf tree:

  git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core

For example with the perf ABIs 'batch access' of a group of MSRs is easy: a
group of events can be read or sampled at once. It can be done in a system-wide,
per task or per task hierarchy fashion, with cgroup management as well - it's a
modern API.

Right now the MSR PMU code is only at its first version, with only these few
MSRs exposed:

  enum perf_msr_id {
          PERF_MSR_TSC                    = 0,
          PERF_MSR_APERF                  = 1,
          PERF_MSR_MPERF                  = 2,
          PERF_MSR_PPERF                  = 3,
          PERF_MSR_SMI                    = 4,

          PERF_MSR_EVENT_MAX,
  };

but that can (and should) be expanded and more features can be added.

Thanks,

	Ingo
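
To make the 'batch access' point above concrete, here is a minimal user-space
sketch, not taken from the patch or from the kernel tree. It assumes the MSR PMU
is present, that /sys/devices/msr/type holds the PMU type number, and that the
files under /sys/devices/msr/events/ contain strings of the form "event=0x01".
The function names (parse_event_config, read_sysfs_line, open_msr_event,
read_aperf_mperf) are hypothetical helpers invented for this illustration; only
perf_event_open(2) and PERF_FORMAT_GROUP are real interfaces.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#define MSR_PMU_DIR "/sys/devices/msr"  /* directory named in the mail above */

/* Parse a sysfs event string such as "event=0x01" into the value for
 * perf_event_attr.config. Returns -1 if the string is malformed.
 * The "event=" format is an assumption about the sysfs file contents. */
long long parse_event_config(const char *desc)
{
	static const char prefix[] = "event=";
	char *end;
	unsigned long long v;

	if (strncmp(desc, prefix, sizeof(prefix) - 1) != 0)
		return -1;
	desc += sizeof(prefix) - 1;
	v = strtoull(desc, &end, 0);
	if (end == desc)	/* no digits after the prefix */
		return -1;
	return (long long)v;
}

/* Read the first line of a sysfs file into buf. Returns 0 on success. */
int read_sysfs_line(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fgets(buf, (int)len, f) != NULL;
	fclose(f);
	return ok ? 0 : -1;
}

/* Open one counting event for the calling task. Pass group_fd = -1 to
 * create a group leader, or the leader's fd to join its group. */
int open_msr_event(uint32_t pmu_type, const char *name, int group_fd)
{
	struct perf_event_attr attr;
	char path[128], line[64];
	long long config;

	snprintf(path, sizeof(path), MSR_PMU_DIR "/events/%s", name);
	if (read_sysfs_line(path, line, sizeof(line)) != 0)
		return -1;
	config = parse_event_config(line);
	if (config < 0)
		return -1;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = pmu_type;
	attr.config = (uint64_t)config;
	attr.read_format = PERF_FORMAT_GROUP;	/* one read() returns the group */

	/* pid = 0, cpu = -1: count for the calling task on any CPU */
	return (int)syscall(SYS_perf_event_open, &attr, 0, -1, group_fd, 0UL);
}

/* Read APERF and MPERF with a single read() on the group leader.
 * Returns 0 and fills out[0] (aperf), out[1] (mperf) on success. */
int read_aperf_mperf(uint64_t out[2])
{
	char line[64];
	uint64_t buf[3];	/* layout: { nr, value[0], value[1] } */
	uint32_t type;
	int leader, member, ret = -1;

	assert(out != NULL);
	if (read_sysfs_line(MSR_PMU_DIR "/type", line, sizeof(line)) != 0)
		return -1;
	type = (uint32_t)strtoul(line, NULL, 10);

	leader = open_msr_event(type, "aperf", -1);
	if (leader < 0)
		return -1;
	member = open_msr_event(type, "mperf", leader);
	if (member >= 0) {
		if (read(leader, buf, sizeof(buf)) == (ssize_t)sizeof(buf) &&
		    buf[0] == 2) {
			out[0] = buf[1];
			out[1] = buf[2];
			ret = 0;
		}
		close(member);
	}
	close(leader);
	return ret;
}
```

A real tool would presumably keep the group open and read it repeatedly rather
than reopening it for every sample; the delta of aperf divided by the delta of
mperf between two reads then gives the effective frequency relative to the base
frequency. Whether perf_event_open() succeeds for an unprivileged caller depends
on the system's perf_event_paranoid setting, so read_aperf_mperf() may well
return -1 on a locked-down machine.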