Hi,
On 11/05/2021 at 12:51, Segher Boessenkool wrote:
Hi!
On Tue, May 11, 2021 at 06:08:06AM +0000, Christophe Leroy wrote:
Commit 328e7e487a46 ("powerpc: force inlining of csum_partial() to
avoid multiple csum_partial() with GCC10") inlined csum_partial().
Now that csum_partial() is inlined, GCC outlines csum_add() when
called by csum_partial().
c064fb28 <csum_add>:
c064fb28: 7c 63 20 14 addc r3,r3,r4
c064fb2c: 7c 63 01 94 addze r3,r3
c064fb30: 4e 80 00 20 blr
Could you build this with -fdump-tree-einline-all and send me the
results? Or open a GCC PR yourself :-)
Ok, I'll forward it to you in a minute.
Something seems to have decided this asm is more expensive than it is.
That isn't always avoidable -- the compiler cannot look inside asms --
but it seems it could be improved here.
Do you have (or can make) a self-contained testcase?
I have not tried, and I fear it might be difficult, because on a kernel build with dozens of calls
to csum_add(), only ip6_tunnel.o exhibits such an issue.
The sum with 0 is useless, should have been skipped.
That isn't something the compiler can do anything about (not sure if you
were suggesting that); it has to be done in the user code (and it tries
to already, see below).
I was not suggesting that, only that when properly inlined the sum with 0 is skipped (because we put
the necessary stuff in csum_add() of course).
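To illustrate what "the necessary stuff" looks like, here is a minimal portable sketch, not the actual kernel code. It assumes plain uint32_t in place of __wsum, a hypothetical name csum_add_sketch(), and a 64-bit add with carry folding in place of the addc/addze asm. When the call is inlined and one argument is a compile-time 0, __builtin_constant_p() lets the compiler drop the add entirely:

```c
#include <stdint.h>

/*
 * Sketch only: hypothetical stand-in for csum_add(), using uint32_t
 * instead of __wsum and C arithmetic instead of the powerpc asm.
 */
static inline uint32_t csum_add_sketch(uint32_t csum, uint32_t addend)
{
	/* Adding a known-constant 0 is a no-op: skip it at compile time. */
	if (__builtin_constant_p(csum) && csum == 0)
		return addend;
	if (__builtin_constant_p(addend) && addend == 0)
		return csum;

	/* 32-bit add with end-around carry (what addc + addze compute). */
	uint64_t res = (uint64_t)csum + addend;
	return (uint32_t)res + (uint32_t)(res >> 32);
}
```

If the function is emitted out of line instead, the arguments are no longer known constants inside it, so both shortcuts evaluate to false and the add is always performed.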
And there is even one completely unused instance of csum_add().
That is strange, that should never happen.
It seems that several .o files include unused copies of csum_add(). After the final link, one remains
(in addition to the used one) in vmlinux.
./arch/powerpc/include/asm/checksum.h: In function '__ip6_tnl_rcv':
./arch/powerpc/include/asm/checksum.h:94:22: warning: inlining failed in call to 'csum_add': call is unlikely and code size would grow [-Winline]
   94 | static inline __wsum csum_add(__wsum csum, __wsum addend)
      |                      ^~~~~~~~
./arch/powerpc/include/asm/checksum.h:172:31: note: called from here
  172 |                         sum = csum_add(sum, (__force __wsum)*(const u32 *)buff);
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
At least we say what happened. Progress! :-)
Lol. I've been seeing this warning for a long time, so I guess it is nothing new.
In the non-inlined version, the first sum with 0 was performed.
Here it is skipped.
That is because of how __builtin_constant_p works, most likely. As we
discussed elsewhere it is evaluated before all forms of loop unrolling.
But we are not talking about loop unrolling here, are we?
It seems that the reason here is that __builtin_constant_p() is evaluated long after GCC has decided
not to inline that call to csum_add().
Christophe