It's great to see improvements to our calibration of the TSC (and I tend to agree that cpu_counter should be serializing, so that, e.g., cpu_counter(); ...; cpu_counter() reliably measures time taken in the ellipsis).
At the same time, I wonder whether we should _also_:

1. Modify the tsc timecounter so that it uses a global atomic to ensure that there is a global view of time as counted by the tsc. This is what the timecounter(9) API per se expects of timecounters, and right now tsc (along with various other per-CPU cycle counters) fails to guarantee that.

2. Introduce an API for a local timecounter -- a per-CPU timecounter that never goes backwards on a single CPU, but:
   (a) measures units of wall clock time, unlike cpu_counter();
   (b) need not be synchronized between CPUs; and
   (c) may be cheaper to read than a global timecounter.

   We could then use that for, e.g., rusage calculations.
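
To make the two proposals concrete, here is a rough userland sketch in C11 of the difference between them. It is not the existing tsc code and not a proposed patch: the names global_last, global_monotonic, struct local_clock, and local_monotonic are all made up for illustration, the raw reading is passed in as a parameter rather than read from hardware, and a plain uint64_t stands in for whatever counter type and mask the real timecounter would use. The global version clamps every reading against one shared atomic, so no caller on any CPU ever sees a value lower than one already returned; the local version clamps only against per-CPU state, so it touches no shared cache line but promises nothing across CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: highest counter value returned so far on any CPU. */
static _Atomic uint64_t global_last;

/*
 * Global view: clamp a raw per-CPU reading against a shared atomic so
 * that returned values never decrease, whichever CPU produced them.
 */
static uint64_t
global_monotonic(uint64_t raw)
{
	uint64_t last = atomic_load(&global_last);

	do {
		if (raw <= last)
			return last;	/* some reader is already ahead of us */
		/* On failure, last is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&global_last, &last, raw));

	return raw;
}

/* Hypothetical per-CPU state for the local variant. */
struct local_clock {
	uint64_t last;	/* highest value returned on this CPU */
};

/*
 * Local view: never goes backwards on one CPU, but makes no promise
 * about ordering relative to other CPUs.  Assumes the caller cannot be
 * preempted or migrated between the read and the update.
 */
static uint64_t
local_monotonic(struct local_clock *lc, uint64_t raw)
{
	assert(lc != NULL);

	if (raw > lc->last)
		lc->last = raw;
	return lc->last;
}
```

The cost difference in (c) shows up here as the compare-and-swap loop on a shared variable versus a plain load and store on CPU-private state. A real implementation would also have to convert cycles to units of wall clock time and cope with counter wraparound, neither of which this sketch attempts.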
