On Mon, 2026-03-02 at 09:48 +0100, Tomas Glozar wrote:
> so 28. 2. 2026 v 0:50 odesÃlatel Crystal Wood <[email protected]> napsal:
> > > Add an option called TIMERLAT_ALIGN to osnoise/options, together with a
> > > corresponding setting osnoise/timerlat_align_us.
> > >
> > > This option sets the alignment of wakeup times between different
> > > timerlat threads, similarly to cyclictest's -A/--aligned option. If
> > > TIMERLAT_ALIGN is set, the first thread that reaches the first cycle
> > > records its first wake-up time. Each following thread sets its first
> > > wake-up time to a fixed offset from the recorded time, and incremenets
> > > it by the same offset.
> >
> > Why not just set the initial timer expiration to be
> > "period + cpu * align_us"? Then you wouldn't need any interaction
> > between CPUs.
>
> "period + cpu * align_us" wouldn't quite do it, for two reasons:
>
> 1. The wake-up timers are set to absolute time, and are incremented by
> "period" (once or multiple times, if the timer is significantly
> delayed) each cycle. What can be done as an alternative to what v1
> does is this: record the current time when starting the timerlat
> tracer (I need to reset align_next to zero anyway even with the v1
> design, that is a bug in the patch), and increment from that.
I was only talking about doing this for the initial expiration, not on
increment.
> 2. "cpu" makes a poor thread ID here. If my period is 1000us, and I
> run on CPUs 0 and 100 with alignment 10, suddenly, the space between
> the threads becomes 1000us, which is equivalent to 0us. I would need
> to go through the cpuset and assign numbers from 0 to n to each CPU.
> That would guarantee a fixed spacing of the threads independent of
> when the threads wake up in the first cycle (unlike the v1 design),
> but it would make the implementation more complex, since I would have
> to store the numbers.
Right, I was thinking of just a few CPUs missing not being a big deal,
but on big systems with only a few CPUs running the test it does
matter.
Instead of assigning numbers, could you just loop over each CPU's
tlat->abs_period and set the initial expiration with appropriate
offset (prior to starting any of the threads)? Then the thread would
not need to care about anything other than the usual increment.
> If I implemented both of those ideas, the interaction between the CPUs
> can indeed be gotten rid of. I'm not sure if it is a better solution,
> though. Another motivation of recording the first thread wake-up was
> that when using user threads, the first thread might be created some
> time after the tracer is enabled, and I did not want to have a large
> gap that would have to be corrected by the while loop at the end of
> wait_next_period().
What is the actual concerning impact here?
If we want to be really paranoid we could check for pending signals
during the loop, in case userspace delayed so long (with a very short
period) that the user has a hard time with ctrl+c and such... but that
could happen already if userspace does something silly (e.g. stopped
by a debugger?) between loops.
> > > kernel/trace/trace_osnoise.c | 34 +++++++++++++++++++++++++++++++++-
> > > 1 file changed, 33 insertions(+), 1 deletion(-)
> >
> > Documentation needs to be updated as well.
> >
> > Should mention that updating align_us while the timer is running won't
> > take effect immediately (unlike period, which does).
> >
>
> Good idea, thanks! In general, I'm not expecting the user to change
> timerlat parameters during a measurement - but it is supported, and
> should be documented.
Maybe better to phrase it as "not guaranteed to take effect
immediately" :-)
> > > diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> > > index dee610e465b9..df1d4529d226 100644
> > > --- a/kernel/trace/trace_osnoise.c
> > > +++ b/kernel/trace/trace_osnoise.c
> > > @@ -58,6 +58,7 @@ enum osnoise_options_index {
> > > OSN_PANIC_ON_STOP,
> > > OSN_PREEMPT_DISABLE,
> > > OSN_IRQ_DISABLE,
> > > + OSN_TIMERLAT_ALIGN,
> > > OSN_MAX
> > > };
> > >
> > > @@ -66,7 +67,8 @@ static const char * const osnoise_options_str[OSN_MAX]
> > > = {
> > > "OSNOISE_WORKLOAD",
> > > "PANIC_ON_STOP",
> > >
> > > "OSNOISE_PREEMPT_DISABLE",
> > > -
> > > "OSNOISE_IRQ_DISABLE" };
> > > +
> > > "OSNOISE_IRQ_DISABLE",
> > > + "TIMERLAT_ALIGN" };
> >
> > Do we really need a flag for this, or can we just interpret a non-zero
> > align_us value as enabling the feature?
> >
>
> Yes, we need a flag for this, because a zero alignment is a common use case.
>
> I used it in cyclictest to measure the overhead of a large number of
> threads waking up at the same time. Similarly, a non-zero alignment
> will get rid of most of that overhead. Without alignment set, the
> thread wake-ups offsets are semi-random, depending on how the threads
> wake up, which might lead to inconsistent results where one run has
> good numbers and another run bad numbers, since the alignment is
> determined in the first cycle.
OK, I was viewing the staggering as the main point, but I see how the
alignment itself helps too.
Is there a use case for not always doing the alignment?
Other than people asking why their numbers suddenly got worse...
> > I'm already unclear about the existing purpose of next_abs_period, but
> > if it has any use at all shouldn't it be to avoid writing intermediate
> > values like this back to tlat?
> >
>
> next_abs_period is basically just the ktime_t variant of
> tlat->abs_period for local calculations of the next period inside
> wait_next_period(). Its only purpose is the ktime_compare() call that
> increments tlat->abs_period by the period until it lands into the
> future, if it happens to be in the past. This is necessary to do for
> both a regular cycle (which might take long due to noise) and the
> first cycle with alignment (because the other thread's first wake up
> might be late), so it has to be set in the new code as well,
> otherwise, the while loop won't see the time is in the past.
Oh, I missed the unit difference (though it's basically just a typedef
at this point).
> I agree that this part of the code is confusing. There is also a
> field, timerlat_variables.rel_period (tlat->rel_period), that is not
> used anywhere, since the relative period is pulled out of
> osnoise_variables. Something like this would be easier to read and
> comprehend, IMHO:
Yeah, I noticed that as well... we should remove it if we're not going
to use it.
> /*
> * wait_next_period - Wait for the next period for timerlat
> */
> static int wait_next_period(struct timerlat_variables *tlat)
> {
> ktime_t now;
> u64 rel_period = osnoise_data.timerlat_period * 1000;
>
> now = hrtimer_cb_get_time(&tlat->timer);
>
> /*
> * Set the next abs_period.
> */
> tlat->abs_period += rel_period;
>
> /*
> * If the new abs_period is in the past, skip the activation.
> */
> while (ktime_compare(now, ns_to_ktime(tlat->abs_period) > 0) {
> next_abs_period = ns_to_ktime(tlat->abs_period + rel_period);
> tlat->abs_period = (u64) ktime_to_ns(next_abs_period);
> }
>
> set_current_state(TASK_INTERRUPTIBLE);
>
> hrtimer_start(&tlat->timer, next_abs_period,
> HRTIMER_MODE_ABS_PINNED_HARD);
> schedule();
> return 1;
> }
>
> (Excluding the changes from this patch.) What do you think?
Why can't we just make tlat->abs_period and every other time variable
in this file be ktime_t? Other than atomic stuff if we do go that
route.
Not saying that that should hold up this patch, just an idea to simplify
things.
-Crystal