On Fri, 2026-03-06 at 16:15 +0100, Tomas Glozar wrote:
> Ășt 3. 3. 2026 v 4:21 odesĂ­latel Crystal Wood <[email protected]> napsal:
> > > 1. The wake-up timers are set to absolute time, and are incremented by
> > > "period" (once or multiple times, if the timer is significantly
> > > delayed) each cycle. What can be done as an alternative to what v1
> > > does is this: record the current time when starting the timerlat
> > > tracer (I need to reset align_next to zero anyway even with the v1
> > > design, that is a bug in the patch), and increment from that.
> > 
> > I was only talking about doing this for the initial expiration, not on
> > increment.
> > 
> 
> Ah, I misread "period" as "relative period", since the initial
> absolute period is determined from the current time in the first cycle
> of each thread.

I did mean relative period ("absolute period" isn't a phrase that makes
sense to me; that's just an expiration); I just forgot to make the
"now +" part explicit.

> > > 2. "cpu" makes a poor thread ID here. If my period is 1000us, and I
> > > run on CPUs 0 and 100 with alignment 10, suddenly, the space between
> > > the threads becomes 1000us, which is equivalent to 0us. I would need
> > > to go through the cpuset and assign numbers from 0 to n to each CPU.
> > > That would guarantee a fixed spacing of the threads independent of
> > > when the threads wake up in the first cycle (unlike the v1 design),
> > > but it would make the implementation more complex, since I would have
> > > to store the numbers.
> > 
> > Right, I was thinking of just a few CPUs missing not being a big deal,
> > but on big systems with only a few CPUs running the test it does
> > matter.
> > 
> 
> My point is mostly that the spacing of the threads shouldn't depend on
> the CPU numbers. One has to make sure the alignment doesn't overflow
> the period anyway if they want to have completely time-isolated
> wake-ups.

Yeah, I underestimated how easy it would be for that to be a problem.  I
think this is a good enough reason to use the atomic approach.

> The while loop is designed only to handle "small" time differences,
> with respect to the relative period. When using timerlat manually with
> a user workload, it might take the user a few seconds/seconds/hours
> before they start the user process (typing the command line, or if the
> user e.g. has a snack or coffee in between), which has to be corrected
> by the while loop. This does not interact well with a low period.
> Consider the following scenario, assuming the initial absolute period
> is set on timerlat tracer enablement:
> 
> 1. The user enables the timerlat tracer with NO_OSNOISE_WORKLOAD and
> 100us period.
> 2. The user steps away for 1 hour.
> 3. After 10 seconds, tlat->abs_period is 3 600 000 000us in the past.
> The while loop starts incrementing tlat->abs_period by 100us, taking
> 36 000 000 loops. If one iteration takes 10 CPU cycles on a 1GHz CPU,
> the while loop itself will take 360us (which is >100us).
> 4. The timerlat thread never wakes up, since the wake-up time even
> after the correction is in the past.

(assuming you meant "after one hour" instead of 10 seconds)

Oh, I see it doesn't update "now" inside the loop.  Arming a timer for
the past should cause it to fire immediately though; otherwise there
would be all sorts of nasty races.

In any case, regardless of what we do with alignment, we could simplify
by replacing the while loop with hrtimer_forward_now() which does
division (if necessary) instead of a loop, and get rid of
tlat->abs_period (the handler can call hrtimer_get_expires()).

> This is much more reasonable than the user stopping the thread inside
> the while loop. Actually, this scenario can already happen without
> alignment, since the scheduler might preempt the thread inside the
> while loop for more than the period - but that is a separate issue.
> 
> Also, the current implementation is relatively simple (and hopefully
> also easy to understand with the comments in v2), so my idea is that
> we can use it for now, and if we want deterministing alignment in the
> future, we can always improve it.

The comments do help, thanks.  v1 took me a few tries to figure out,
given how different it is from usual cmpxchg usage.  v2 looks good to
me.

> > > I used it in cyclictest to measure the overhead of a large number of
> > > threads waking up at the same time. Similarly, a non-zero alignment
> > > will get rid of most of that overhead. Without alignment set, the
> > > thread wake-ups offsets are semi-random, depending on how the threads
> > > wake up, which might lead to inconsistent results where one run has
> > > good numbers and another run bad numbers, since the alignment is
> > > determined in the first cycle.
> > 
> > OK, I was viewing the staggering as the main point, but I see how the
> > alignment itself helps too.
> > 
> > Is there a use case for not always doing the alignment?
> > Other than people asking why their numbers suddenly got worse...
> > 
> 
> Yes - to simulate the default behavior of cyclictest without
> -A/--aligned, and of multi-thread cyclic workloads that do not align
> their threads respectively. Even if we wanted to always use the
> alignment, it should not default to 0 IMHO, so that users don't see a
> degradation like you mention.

It just feels a bit weird to preserve "maybe they're bunched up, maybe
not, who knows?" as a feature (much less the default), especially for a
tool meant to help with determinism.

> > Why can't we just make tlat->abs_period and every other time variable
> > in this file be ktime_t?  Other than atomic stuff if we do go that
> > route.
> > 
> > Not saying that that should hold up this patch,  just an idea to simplify 
> > things.
> > 
> 
> The reason for that is that the code does arithmetic directly on the
> ns unit form of the time, without the need to use ktime_add(). I don't
> see anything on top of that.  I see ktime_add(x, y) is just x + y
> nowadays, so that would work.

We should still use ktime_add() etc. in order to be nice users of the
interface.  I just think the conversions are worse clutter than a
couple ktime add/sub calls.

-Crystal


Reply via email to