On 05/21/2015 06:57 PM, George Dunlap wrote: > On 05/18/2015 08:16 PM, Don Zickus wrote: >> I stumbled upon an AMD box that had the BIOS using a hardware counter. >> Instead >> of printing out a warning and continuing, it failed and blocked further perf >> counter usage. >> >> Looking through the history, I found commit a5ebe0ba3dff had tweaked the >> rules >> for a xen guest on an almost identical box and now changed the behaviour. >> >> Unfortunately the rules were tweaked incorrectly and will always lead to msr >> failures even though the msrs are completely fine. >> >> What happens now is in arch/x86/kernel/cpu/perf_event.c::check_hw_exists: >> >> <snip> >> for (i = 0; i < x86_pmu.num_counters; i++) { >> reg = x86_pmu_config_addr(i); >> ret = rdmsrl_safe(reg, &val); >> if (ret) >> goto msr_fail; >> if (val & ARCH_PERFMON_EVENTSEL_ENABLE) { >> bios_fail = 1; >> val_fail = val; >> reg_fail = reg; >> } >> } >> >> <snip> >> /* >> * Read the current value, change it and read it back to see if it >> * matches, this is needed to detect certain hardware emulators >> * (qemu/kvm) that don't trap on the MSR access and always return 0s. >> */ >> reg = x86_pmu_event_addr(0); >> ^^^^ >> >> if the first perf counter is enabled, then this routine will always fail >> because the counter is running. :-( >> >> if (rdmsrl_safe(reg, &val)) >> goto msr_fail; >> val ^= 0xffffUL; >> ret = wrmsrl_safe(reg, val); >> ret |= rdmsrl_safe(reg, &val_new); >> if (ret || val != val_new) >> goto msr_fail; >> >> The above bios_fail used to be a 'goto' which is why it worked in the past. >> >> Further, most vendors have migrated to using fixed counters to hide their >> evilness hence this problem rarely shows up now days except on a few old >> boxes. >> >> I fixed my problem and kept the spirit of the original Xen fix, by recording >> a >> safe non-enable register to be used safely for the reading/writing check. >> Because it is not enabled, this passes on bare metal boxes (like metal), but >> should continue to throw an msr_fail on Xen guests because the register isn't >> emulated yet. >> >> Now I get a proper bios_fail error message and Xen should still see their >> msr_fail message (untested). >> >> Signed-off-by: Don Zickus <dzic...@redhat.com> > > Right -- so what was actually broken was the "does this register work" > check, which needs a non-enabled register. > > Would it make sense to add a comment somewhere in the code saying that > you need a disabled event counter for the MSR check to work properly? > It's sort of implied but it's not explicit. > > Other than that, this looks good to me. I'm not positive I have access > to the box I needed this for anymore -- I'll take a look for it next week. > > In the mean time: > > Acked-by: George Dunlap <george.dun...@eu.citrix.com>
I managed to track down the machine that had the problem and verify that things still work for me after this patch. So now you can add: Tested-by: George Dunlap <george.dun...@eu.citrix.com> Thanks, -George -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/