CPU timestamps correlation for relating OA samples with system events

Sagar Arun Kamble Mon, 25 Dec 2017 21:33:01 -0800


On 12/22/2017 3:46 PM, Lionel Landwerlin wrote:

On 22/12/17 09:30, Sagar Arun Kamble wrote:
On 12/21/2017 6:29 PM, Lionel Landwerlin wrote:
Some more findings I made while playing with this series & GPUTop.
Turns out the 2ms drift per second is due to timecounter. Adding the delta this way:
https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R607

Eliminates the drift.
I see two important changes: 1. approximation of start time during init_timecounter, 2. overflow handling in delta accumulation. With these incorporated, I guess timecounter should also work in the same fashion.
I think the arithmetic in timecounter is inherently lossy and that's why we're seeing a drift.

Could you share details about the platform and scenario in which the 2ms drift per second is being seen with timecounter?

I did not observe this on SKL.

Could we be using it wrong?

If we use the two changes highlighted above with timecounter, maybe we will get the same results as your current implementation.

In the patch above, I think there is still a drift because of the potential fractional part loss at every delta we add. But it should only be a fraction of a nanosecond multiplied by the number of reports over a period of time. With a report every 1us, that should still be much less than 1ms of drift over 1s.

timecounter interface takes care of fractional parts so that should help us.
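
To illustrate the point about fractional parts, here is a minimal user-space sketch that compares a conversion dropping the sub-nanosecond remainder on every delta with one carrying it forward. The struct, function names and the mult/shift values are hypothetical, chosen for illustration only; they are not taken from the patch under discussion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical clock parameters: ns = (cycles * mult) >> shift. */
struct cyc_conv {
    uint32_t mult;
    uint32_t shift;
    uint64_t frac;  /* sub-nanosecond remainder carried between calls */
};

/* Lossy: the fractional part of every converted delta is dropped. */
static uint64_t cyc2ns_truncating(const struct cyc_conv *c, uint64_t delta)
{
    assert(c && c->shift < 64);
    return (delta * c->mult) >> c->shift;
}

/*
 * Carries the remainder into the next conversion, so the sum of the
 * returned values never falls more than 1 ns behind the exact result.
 * Assumes delta * mult does not overflow 64 bits.
 */
static uint64_t cyc2ns_carry(struct cyc_conv *c, uint64_t delta)
{
    uint64_t mask, ns;

    assert(c && c->shift < 64);
    mask = (1ULL << c->shift) - 1;
    ns = delta * c->mult + c->frac;
    c->frac = ns & mask;
    return ns >> c->shift;
}
```

With mult = 85333 and shift = 10 (roughly 83.33 ns per tick), converting 1024 single-tick deltas with the truncating version loses about a third of a nanosecond each time, while the carrying version stays exact.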

We can either go with timecounter or our own implementation, provided the conversions are precise.

We can probably do better by always computing the clock using the entire delta rather than the accumulated delta.

The issue is that the clock cycles reported in the OA report are the 32 LSBs of the GPU TS, whereas the counter is 36 bits. Hence we will need to accumulate the delta. Of course there is the assumption that two reports can't be spaced a count value of 0xffffffff apart.
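
The accumulation described above could be sketched as follows. This is only an illustration with hypothetical names, assuming each report carries the low 32 bits of the timestamp and that consecutive reports are less than 2^32 ticks apart.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator for 32-bit report timestamps. */
struct ts_accum {
    uint32_t last_ts;  /* low 32 bits seen in the previous report */
    uint64_t total;    /* ticks accumulated since the first report */
};

static void ts_accum_init(struct ts_accum *a, uint32_t first_ts)
{
    assert(a);
    a->last_ts = first_ts;
    a->total = 0;
}

/*
 * Unsigned 32-bit subtraction is performed modulo 2^32, so the delta
 * stays correct across a single wrap of the truncated timestamp.
 */
static uint64_t ts_accum_update(struct ts_accum *a, uint32_t report_ts)
{
    uint32_t delta;

    assert(a);
    delta = report_ts - a->last_ts;
    a->last_ts = report_ts;
    a->total += delta;
    return a->total;
}
```

For example, going from 0xfffffff0 to 0x00000010 yields a delta of 0x20 ticks rather than a large negative jump.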

Timelines of perf i915 tracepoints & OA reports now make a lot more sense.
There is still the issue that reading the CPU clock & the RCS timestamp is inherently not atomic. So there is a delta there. I think we should add a new i915 perf record type to express the delta that we measure this way:
https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R2475
So that userspace knows there might be a global offset between the 2 times and is able to present it.
agree on this. Delta ns1-ns0 can be interpreted as max drift.
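
A rough user-space sketch of the bracketing idea: read the CPU clock before and after reading the GPU timestamp, and report ns1 - ns0 as the upper bound on the offset. All names here are hypothetical, the GPU read is a stand-in stub, and pairing the GPU timestamp with ns0 (rather than ns1 or the midpoint) is an assumption of this sketch, not something taken from the linked commit.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical correlation record handed to userspace. */
struct correlation {
    uint64_t cpu_ns;       /* CPU time sampled just before the GPU read */
    uint64_t gpu_ts;       /* raw GPU timestamp */
    uint64_t max_skew_ns;  /* ns1 - ns0: bound on the CPU/GPU offset */
};

static uint64_t monotonic_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Stand-in for the real register read; returns a fixed value. */
static uint64_t read_gpu_ts_stub(void)
{
    return 0x123456789ULL;
}

static struct correlation correlate(uint64_t (*read_gpu_ts)(void))
{
    struct correlation c;
    uint64_t ns0, ns1;

    assert(read_gpu_ts);
    ns0 = monotonic_ns();
    c.gpu_ts = read_gpu_ts();
    ns1 = monotonic_ns();

    c.cpu_ns = ns0;
    c.max_skew_ns = ns1 - ns0;
    return c;
}
```

A consumer could then draw the GPU timeline with an uncertainty band of max_skew_ns around each correlation point.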
Measurements on my KBL system were on the order of a few microseconds (~30us). I guess we might be able to set up the correlation point better (masking interrupts?) to reduce the delta.
already using spin_lock. Do you mean NMI?
I don't actually know much on this point.
if spin_lock is the best we can do, then that's it :)
Thanks,

-
Lionel


On 07/12/17 00:57, Robert Bragg wrote:
On Thu, Dec 7, 2017 at 12:48 AM, Robert Bragg <rob...@sixbynine.org> wrote:
    at least from what I wrote back then it looks like I was seeing
    a drift of a few milliseconds per second on SKL. I vaguely
    recall it being much worse given the frequency constants we had
    for Haswell.
Sorry I didn't actually re-read my own message properly before referencing it :) Apparently the 2ms per second drift was for Haswell, so presumably not quite so bad for SKL.
- Robert



_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
