On Tue, Jun 25, 2019 at 11:27 AM Thomas Gleixner <t...@linutronix.de> wrote:
>
> On Tue, 25 Jun 2019, Thomas Gleixner wrote:
>
> > On Tue, 25 Jun 2019, Vincenzo Frascino wrote:
> >
> > CC+ Andy
> >
> > > do_hres() in the vDSO generic library masks the hw counter value
> > > immediately after reading it.
> > >
> > > Postpone the mask application after checking if the syscall fallback is
> > > enabled, in order to be able to detect a possible fallback for the
> > > architectures that have masks smaller than ULLONG_MAX.
> >
> > Right. This only worked on x86 because the mask is there ULLONG_MAX for all
> > VDSO capable clocksources, i.e. that ever worked just by chance.
> >
> > As we talked about that already yesterday, I tested this on a couple of
> > machines and as expected the outcome is uarch dependent. Minimal deviations
> > to both sides and some machines do not show any change at all. I doubt it's
> > possible to come up with a solution which makes all uarchs go faster
> > magically.
> >
> > Though, thinking about it, we could remove the mask operation completely on
> > X86. /me runs tests
>
> Unsurprisingly the results vary. Two uarchs do not care, but they did not
> care about moving the mask either. The other two gain performance and the
> last one falls back to the state before moving the mask. So in general it
> looks like a worthwhile optimization.
>
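To illustrate why the mask has to be applied after the fallback check, here is a minimal sketch. It assumes (hypothetically) that the counter-read routine signals "use the syscall fallback" by returning a value with the top bit set, and that the architecture's counter mask is 56 bits wide. FALLBACK_SENTINEL, COUNTER_MASK and the helper functions are illustrative names, not the actual lib/vdso code.

```c
#include <stdint.h>

/* Hypothetical sentinel: the counter-read routine returns this when the
 * clocksource cannot be used from the vDSO (assumption for illustration). */
#define FALLBACK_SENTINEL UINT64_MAX

/* Hypothetical 56-bit counter mask, i.e. narrower than ULLONG_MAX. */
#define COUNTER_MASK ((UINT64_C(1) << 56) - 1)

/* Mask first, then check: the mask clears the sentinel's top bit,
 * so the fallback is never detected. Returns 1 if fallback detected. */
static int fallback_detected_mask_first(uint64_t raw)
{
    uint64_t cycles = raw & COUNTER_MASK;
    return (int64_t)cycles < 0;
}

/* Check the raw value first; the mask is applied later, in the delta. */
static int fallback_detected_check_first(uint64_t raw)
{
    return (int64_t)raw < 0;
}

/* Delta computation: masking the difference also handles a counter
 * that wrapped within its 56-bit range since cycle_last was recorded. */
static uint64_t calc_delta(uint64_t cycles, uint64_t last, uint64_t mask)
{
    return (cycles - last) & mask;
}
```

With a 56-bit mask, fallback_detected_mask_first(FALLBACK_SENTINEL) returns 0 while fallback_detected_check_first(FALLBACK_SENTINEL) returns 1. With a ULLONG_MAX mask both orderings behave the same, which matches the point above that it only worked on x86 by chance.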
At one point, I contemplated a different approach: have the "get the counter" routine return 0 when the counter can't be used and then do

    if (unlikely(cycles <= last))
        goto fallback;

This would remove one branch from the hot path. I got dubious results when I tried benchmarking it, probably because the branch in question was always correctly predicted.
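A minimal sketch of that idea, assuming the counter-read routine returns 0 when the counter is unusable. read_counter() and fast_path_delta() are hypothetical helpers, not existing kernel functions; __builtin_expect is the GCC/Clang builtin that unlikely() wraps. Because cycles and last are unsigned, a returned 0 always satisfies cycles <= last, so one comparison covers both "counter unusable" and "counter behind cycle_last".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the arch counter read: returns 0 when the
 * counter cannot be used from the vDSO (assumption from the proposal). */
static uint64_t read_counter(bool usable, uint64_t raw)
{
    return usable ? raw : 0;
}

/* Returns true and stores the delta in *delta if the fast path can be
 * used; returns false if the caller must take the syscall fallback. */
static bool fast_path_delta(uint64_t cycles, uint64_t last, uint64_t mask,
                            uint64_t *delta)
{
    /* Single branch: cycles == 0 (unusable) is always <= last, and a
     * counter at or behind cycle_last also takes the fallback. */
    if (__builtin_expect(cycles <= last, 0))
        return false;
    *delta = (cycles - last) & mask;
    return true;
}
```

One design consequence: a counter that wraps below cycle_last is also sent to the fallback, so this only suits counters that are effectively non-wrapping, as on x86 where the mask is ULLONG_MAX.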