ioapic: Drop function pointers from __ioapic_{read,write}_entry()

Andrew Cooper Wed, 17 Nov 2021 16:32:27 -0800

On 12/11/2021 10:43, Jan Beulich wrote:

> On 11.11.2021 18:57, Andrew Cooper wrote:

>> Function pointers are expensive, and the raw parameter is a constant from all
>> callers, meaning that it predicts very well with local branch history.

> The code change is fine, but I'm having trouble with "all" here: Both
> functions aren't even static, so while callers in io_apic.c may
> benefit (perhaps with the exception of ioapic_{read,write}_entry(),
> depending on whether the compiler views inlining them as warranted),
> I'm in no way convinced this extends to the callers in VT-d code.
>
> Further ISTR clang being quite a bit less aggressive about inlining,
> so the effects might not be quite as good there even for the call
> sites in io_apic.c.
>
> Can you clarify this for me please?

The way the compiler lays out the code is unrelated to why this form is an improvement.

Branch history is a function of "the $N most recently taken branches". This is because "how you got here" is typically relevant to "where you should go next".

Trivial schemes maintain a shift register of taken / not-taken results. Less trivial schemes maintain a rolling hash of (src addr, dst addr) tuples of all taken branches (direct and indirect). In both cases, the instantaneous branch history is an input into the final prediction, and is commonly used to select which saturating counter (or bank of counters) is used.
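As a rough illustration of the "trivial scheme", here is a minimal sketch of a shift-register history selecting a 2-bit saturating counter. The names (struct predictor, HIST_BITS, predictor_*) and the XOR indexing are assumptions made for the example, not a description of any real CPU's predictor.

```c
#include <stdbool.h>
#include <stdint.h>

#define HIST_BITS  8                 /* assumed history length */
#define TABLE_SIZE (1u << HIST_BITS)

struct predictor {
    uint8_t history;                 /* shift register of taken/not-taken */
    uint8_t counters[TABLE_SIZE];    /* 2-bit saturating counters, 0..3 */
};

static void predictor_init(struct predictor *p)
{
    p->history = 0;
    for (unsigned int i = 0; i < TABLE_SIZE; i++)
        p->counters[i] = 1;          /* start at "weakly not-taken" */
}

/* The history is an input into which counter is consulted. */
static unsigned int predictor_index(const struct predictor *p, uint32_t pc)
{
    return (pc ^ p->history) & (TABLE_SIZE - 1);
}

static bool predictor_predict(const struct predictor *p, uint32_t pc)
{
    return p->counters[predictor_index(p, pc)] >= 2;
}

static void predictor_update(struct predictor *p, uint32_t pc, bool taken)
{
    uint8_t *c = &p->counters[predictor_index(p, pc)];

    if (taken && *c < 3)
        (*c)++;
    else if (!taken && *c > 0)
        (*c)--;

    /* Shift the newest outcome into the history register. */
    p->history = (uint8_t)((p->history << 1) | taken);
}
```

In this toy model, a single branch which strictly alternates taken / not-taken ends up predicted correctly once the history register has filled, because the two outcomes are reached with different histories and therefore train different counters.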


Consider something like

while ( cond )
{
    memcpy(dst1, src1, 64);
    memcpy(dst2, src2, 7);
}

Here, the conditional jump inside memcpy() that copes with the tail of the copy flips its result 50% of the time, which is fiendish to predict.

However, because the branch history differs (by memcpy()'s return address, which was accumulated by the call instruction), the predictor can actually use two different taken/not-taken counters for the two different "instances" of the tail jump. After a few iterations to warm up, the predictor will get every jump perfect despite the fact that memcpy() is a library call and the branches would otherwise alias.
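The memcpy() example can be played out in a toy simulation. This is only a sketch under stated assumptions: the addresses, the fold() hash and the table size are all made up for illustration, and only the two calls and the taken tail jump are folded into the history (returns and the loop back-edge are left out for brevity).

```c
#include <stdbool.h>
#include <stdint.h>

#define TABLE_SIZE   256u
/* Hypothetical addresses, chosen only for illustration. */
#define CALL_SITE_1  0x1000u   /* call memcpy(dst1, src1, 64) */
#define CALL_SITE_2  0x1010u   /* call memcpy(dst2, src2, 7)  */
#define MEMCPY_ENTRY 0x8000u
#define TAIL_BRANCH  0x8020u   /* "is there a tail to copy?" jump */
#define TAIL_TARGET  0x8040u

/* Rolling hash of (src, dst) for taken branches; old entries shift out. */
static uint8_t fold(uint8_t hist, uint32_t src, uint32_t dst)
{
    return (uint8_t)((hist << 4) ^ ((src ^ dst) >> 4));
}

/* Count mispredictions of the tail jump after 'warmup' loop iterations. */
static unsigned int count_mispredicts(bool use_history, unsigned int iters,
                                      unsigned int warmup)
{
    uint8_t counters[TABLE_SIZE];
    uint8_t hist = 0;
    unsigned int wrong = 0;

    for (unsigned int i = 0; i < TABLE_SIZE; i++)
        counters[i] = 1;                      /* weakly not-taken */

    for (unsigned int i = 0; i < iters; i++) {
        for (unsigned int call = 0; call < 2; call++) {
            uint32_t site = call ? CALL_SITE_2 : CALL_SITE_1;
            bool has_tail = (call == 1);      /* 64 bytes: no; 7 bytes: yes */

            /* The call itself is a taken branch. */
            hist = fold(hist, site, MEMCPY_ENTRY);

            unsigned int idx =
                (TAIL_BRANCH ^ (use_history ? hist : 0u)) % TABLE_SIZE;
            bool predicted = counters[idx] >= 2;

            if (i >= warmup && predicted != has_tail)
                wrong++;

            if (has_tail && counters[idx] < 3)
                counters[idx]++;
            else if (!has_tail && counters[idx] > 0)
                counters[idx]--;

            if (has_tail)                     /* taken tail jump */
                hist = fold(hist, TAIL_BRANCH, TAIL_TARGET);
        }
    }
    return wrong;
}
```

With these particular made-up values, count_mispredicts(false, ...) keeps mispredicting the tail jump on every loop iteration, because both "instances" share one counter, while count_mispredicts(true, ...) stops mispredicting after the first couple of iterations, because the call site folded into the history selects a different counter for each instance.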

Bringing it back to the code in question: the "raw" parameter is an explicit true or false at the top of all call paths leading into these functions. Therefore, an individual branch history has a high correlation with said true or false, irrespective of the absolute code layout. As a consequence, the correct result of the prediction is highly correlated with the branch history, and it will predict perfectly[1] after a few times the path has been used.
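For readers without the patch to hand, the two shapes being compared look roughly like the following. This is a hypothetical stand-in, not the Xen code: read_direct(), read_remapped() and the entry_read_*() wrappers are invented names with dummy bodies, used only to contrast an indirect call through a function pointer with a conditional branch on "raw" followed by a direct call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Dummy stand-ins for the two underlying accessors (hypothetical). */
static uint32_t read_direct(unsigned int reg)   { return reg + 1; }
static uint32_t read_remapped(unsigned int reg) { return reg + 100; }

/* Function pointer form: selects a callee, then makes an indirect call. */
static uint32_t entry_read_fnptr(unsigned int reg, bool raw)
{
    uint32_t (*read)(unsigned int) = raw ? read_direct : read_remapped;

    return read(reg);
}

/*
 * Conditional form: a conditional jump on 'raw', then a direct call.
 * Since 'raw' is a constant at the top of each call path, the direction
 * of this jump is strongly correlated with the branch history.
 */
static uint32_t entry_read_branch(unsigned int reg, bool raw)
{
    if (raw)
        return read_direct(reg);

    return read_remapped(reg);
}
```

Both forms return the same values; the difference is only in which kind of branch the CPU has to predict.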


~Andrew

[1] Obviously, it's not actually perfect outside of a synthetic example. Aliasing in the predictor is a necessary property of keeping the logic small enough to provide an answer fast, but the less accidental aliasing there is, the better the CPU performs in benchmarks, so incentives are in our favour here.

Re: [PATCH 5/5] x86/ioapic: Drop function pointers from __ioapic_{read,write}_entry()
