On Wed, Apr 21, 2021 at 12:06:34PM +0200, Jan Beulich wrote:
> On 20.04.2021 18:12, Roger Pau Monné wrote:
> > On Thu, Apr 01, 2021 at 11:55:10AM +0200, Jan Beulich wrote:
> >> Reading the platform timer isn't cheap, so we'd better avoid it when the
> >> resulting value is of no interest to anyone.
> >>
> >> The consumer of master_stime, obtained by
> >> time_calibration_{std,tsc}_rendezvous() and propagated through
> >> this_cpu(cpu_calibration), is local_time_calibration(). With
> >> CONSTANT_TSC the latter function uses an early exit path, which doesn't
> >> explicitly use the field. While this_cpu(cpu_calibration) (including the
> >> master_stime field) gets propagated to this_cpu(cpu_time).stamp on that
> >> path, both structures' fields get consumed only by the !CONSTANT_TSC
> >> logic of the function.
> >>
> >> Signed-off-by: Jan Beulich <jbeul...@suse.com>
> >> ---
> >> v4: New.
> >> ---
> >> I realize there's some risk associated with potential new uses of the
> >> field down the road. What would people think about compiling time.c a
> >> 2nd time into a dummy object file, with a conditional enabled to force
> >> assuming CONSTANT_TSC, and with that conditional used to suppress
> >> presence of the field as well as all audited uses of it (i.e. in
> >> particular that large part of local_time_calibration())? Unexpected new
> >> users of the field would then cause build time errors.
> > 
> > Wouldn't that add quite a lot of churn to the file itself in the form
> > of pre-processor conditionals?
> 
> Possibly - I didn't try yet, simply because of fearing this might
> not be liked even without presenting it in patch form.
> 
> > Could we instead set master_stime to an invalid value that would make
> > the consumers explode somehow?
> 
> No idea whether there is any such "reliable" value.
> 
> > I know there might be new consumers, but those should be able to
> > figure whether the value is sane by looking at the existing ones.
> 
> This could be the hope, yes. But the effort of auditing the code to
> confirm the potential of optimizing this (after vaguely getting the
> impression there might be room) was non-negligible (in fact I did
> three runs just to be really certain). This in particular means
> that I'm in no way certain that looking at existing consumers would
> point out the possible pitfall.
> 
> > Also, since this is only done on the BSP on the last iteration I
> > wonder if it really makes such a difference performance-wise to
> > warrant all this trouble.
> 
> By "all this trouble", do you mean the outlined further steps or
> the patch itself?

Yes, either the further steps or the fact that we would have to be
careful to not introduce new users of master_stime that expect it to
be set when CONSTANT_TSC is true.
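
To make the concern concrete, here is a rough standalone sketch of the
idea. It is not Xen code: read_platform_timer(), rendezvous_master(),
struct calibration and STIME_POISON are all made-up stand-ins, and, as
noted above, it is not known whether any reliably "invalid" value
exists for the real field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t s_time_t;

/* Hypothetical sentinel, an assumption for this sketch only. */
#define STIME_POISON INT64_MIN

/* Counts how often the (expensive) platform timer gets read. */
static unsigned int platform_reads;

/* Stand-in for the costly platform timer read. */
static s_time_t read_platform_timer(void)
{
    ++platform_reads;
    return 123456789;
}

/* Simplified stand-in for the per-CPU calibration data. */
struct calibration {
    s_time_t master_stime;
};

/*
 * What the rendezvous master would do: skip the read when the TSC is
 * constant and poison the field instead of leaving it stale.
 */
static void rendezvous_master(struct calibration *c, bool constant_tsc)
{
    c->master_stime = constant_tsc ? STIME_POISON : read_platform_timer();
}

static bool master_stime_valid(const struct calibration *c)
{
    return c->master_stime != STIME_POISON;
}

/* A consumer that "explodes" if it is handed the poisoned value. */
static s_time_t consume_master_stime(const struct calibration *c)
{
    assert(master_stime_valid(c));
    return c->master_stime;
}
```

With constant_tsc set, the platform timer is never read, and any new
consumer going through consume_master_stime() would trip the assertion
instead of silently using a stale value.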

> In the latter case, while it's only the BSP to
> read the value, all other CPUs are waiting for the BSP to get its
> part done. So the extra time it takes to read the platform clock
> affects the overall duration of the rendezvous, and hence the time
> not "usefully" spent by _all_ of the CPUs.

Right, but that's only during the time rendezvous, which doesn't
happen that often. And I guess that just the rendezvous of all CPUs is
the biggest hit in terms of performance.

While I don't think I would have done the work myself, I guess there's
no reason to block it.

In any case I would prefer if such performance-related changes come
with some proof that they do indeed make a difference, or else we
might just be making the code more complicated for no concrete
performance benefit.

Thanks, Roger.
