On 05/18/2015 08:16 PM, Don Zickus wrote: > I stumbled upon an AMD box that had the BIOS using a hardware counter. > Instead > of printing out a warning and continuing, it failed and blocked further perf > counter usage. > > Looking through the history, I found commit a5ebe0ba3dff had tweaked the rules > for a xen guest on an almost identical box and now changed the behaviour. > > Unfortunately the rules were tweaked incorrectly and will always lead to msr > failures even though the msrs are completely fine. > > What happens now is in arch/x86/kernel/cpu/perf_event.c::check_hw_exists: > > <snip> > for (i = 0; i < x86_pmu.num_counters; i++) { > reg = x86_pmu_config_addr(i); > ret = rdmsrl_safe(reg, &val); > if (ret) > goto msr_fail; > if (val & ARCH_PERFMON_EVENTSEL_ENABLE) { > bios_fail = 1; > val_fail = val; > reg_fail = reg; > } > } > > <snip> > /* > * Read the current value, change it and read it back to see if it > * matches, this is needed to detect certain hardware emulators > * (qemu/kvm) that don't trap on the MSR access and always return 0s. > */ > reg = x86_pmu_event_addr(0); > ^^^^ > > if the first perf counter is enabled, then this routine will always fail > because the counter is running. :-( > > if (rdmsrl_safe(reg, &val)) > goto msr_fail; > val ^= 0xffffUL; > ret = wrmsrl_safe(reg, val); > ret |= rdmsrl_safe(reg, &val_new); > if (ret || val != val_new) > goto msr_fail; > > The above bios_fail used to be a 'goto' which is why it worked in the past. > > Further, most vendors have migrated to using fixed counters to hide their > evilness hence this problem rarely shows up now days except on a few old > boxes. > > I fixed my problem and kept the spirit of the original Xen fix, by recording a > safe non-enable register to be used safely for the reading/writing check. > Because it is not enabled, this passes on bare metal boxes (like metal), but > should continue to throw an msr_fail on Xen guests because the register isn't > emulated yet. > > Now I get a proper bios_fail error message and Xen should still see their > msr_fail message (untested). > > Signed-off-by: Don Zickus <dzic...@redhat.com>
Right -- so what was actually broken was the "does this register work" check, which needs a non-enabled register. Would it make sense to add a comment somewhere in the code saying that you need a disabled event counter for the MSR check to work properly? It's sort of implied but it's not explicit. Other than that, this looks good to me. I'm not positive I have access to the box I needed this for anymore -- I'll take a look for it next week. In the mean time: Acked-by: George Dunlap <george.dun...@eu.citrix.com> > --- > arch/x86/kernel/cpu/perf_event.c | 16 +++++++++++++++- > 1 files changed, 15 insertions(+), 1 deletions(-) > > diff --git a/arch/x86/kernel/cpu/perf_event.c > b/arch/x86/kernel/cpu/perf_event.c > index 1ee7b19..e4dc4e5 100644 > --- a/arch/x86/kernel/cpu/perf_event.c > +++ b/arch/x86/kernel/cpu/perf_event.c > @@ -186,6 +186,7 @@ static bool check_hw_exists(void) > u64 val, val_fail, val_new= ~0; > int i, reg, reg_fail, ret = 0; > int bios_fail = 0; > + int reg_safe = -1; > > /* > * Check to see if the BIOS enabled any of the counters, if so > @@ -200,6 +201,8 @@ static bool check_hw_exists(void) > bios_fail = 1; > val_fail = val; > reg_fail = reg; > + } else { > + reg_safe = i; > } > } > > @@ -218,11 +221,22 @@ static bool check_hw_exists(void) > } > > /* > + * If all the counters are enabled, the below test will always > + * fail. The tools will also become useless in this scenario. > + * Just fail and disable the hardware counters. > + */ > + > + if (reg_safe == -1) { > + reg = reg_safe; > + goto msr_fail; > + } > + > + /* > * Read the current value, change it and read it back to see if it > * matches, this is needed to detect certain hardware emulators > * (qemu/kvm) that don't trap on the MSR access and always return 0s. > */ > - reg = x86_pmu_event_addr(0); > + reg = x86_pmu_event_addr(reg_safe); > if (rdmsrl_safe(reg, &val)) > goto msr_fail; > val ^= 0xffffUL; > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/