Hi!

On Tue, May 11, 2021 at 06:08:06AM +0000, Christophe Leroy wrote:
> Commit 328e7e487a46 ("powerpc: force inlining of csum_partial() to
> avoid multiple csum_partial() with GCC10") inlined csum_partial().
>
> Now that csum_partial() is inlined, GCC outlines csum_add() when
> called by csum_partial().
> c064fb28 <csum_add>:
> c064fb28:	7c 63 20 14 	addc    r3,r3,r4
> c064fb2c:	7c 63 01 94 	addze   r3,r3
> c064fb30:	4e 80 00 20 	blr

Could you build this with -fdump-tree-einline-all and send me the
results?  Or open a GCC PR yourself :-)

Something seems to have decided this asm is more expensive than it is.
That isn't always avoidable -- the compiler cannot look inside asms --
but it seems it could be improved here.

Do you have (or can make) a self-contained testcase?

> The sum with 0 is useless, should have been skipped.

That isn't something the compiler can do anything about (not sure if you
were suggesting that); it has to be done in the user code (and it tries
to already, see below).

> And there is even one completely unused instance of csum_add().

That is strange, that should never happen.

> ./arch/powerpc/include/asm/checksum.h: In function '__ip6_tnl_rcv':
> ./arch/powerpc/include/asm/checksum.h:94:22: warning: inlining failed in call to 'csum_add': call is unlikely and code size would grow [-Winline]
>    94 | static inline __wsum csum_add(__wsum csum, __wsum addend)
>       |                      ^~~~~~~~
> ./arch/powerpc/include/asm/checksum.h:172:31: note: called from here
>   172 |                         sum = csum_add(sum, (__force __wsum)*(const u32 *)buff);
>       |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

At least we say what happened.  Progress!  :-)

> In the non-inlined version, the first sum with 0 was performed.
> Here it is skipped.

That is because of how __builtin_constant_p works, most likely.  As we
discussed elsewhere it is evaluated before all forms of loop unrolling.
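
To make the "has to be done in the user code" point concrete, here is a
minimal sketch, assuming GCC or Clang (for __builtin_constant_p).  The
names csum_add_sketch and csum_add_skip_zero are hypothetical, and the
64-bit carry fold is only a portable stand-in for the addc/addze pair;
this is not the kernel's implementation.

```c
#include <stdint.h>

/* Portable stand-in for addc/addze: 32-bit ones'-complement add,
 * folding the carry out of bit 31 back into the result. */
static inline uint32_t csum_add_sketch(uint32_t csum, uint32_t addend)
{
	uint64_t res = (uint64_t)csum + addend;

	return (uint32_t)res + (uint32_t)(res >> 32);
}

/* Hypothetical wrapper: skip the add only when the compiler already
 * knows, at the point __builtin_constant_p is evaluated, that an
 * operand is the constant 0.  Otherwise fall through to the real add. */
static inline uint32_t csum_add_skip_zero(uint32_t csum, uint32_t addend)
{
	if (__builtin_constant_p(csum) && csum == 0)
		return addend;
	if (__builtin_constant_p(addend) && addend == 0)
		return csum;
	return csum_add_sketch(csum, addend);
}
```

The result is the same whether or not a guard fires, since adding 0
never carries; the guard only changes the generated code.  Whether it
fires depends on what the compiler knows about the operand at the time
the builtin is evaluated.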

The patch looks perfect of course :-)

Reviewed-by: Segher Boessenkool <seg...@kernel.crashing.org>


Segher


> --- a/arch/powerpc/include/asm/checksum.h
> +++ b/arch/powerpc/include/asm/checksum.h
> @@ -91,7 +91,7 @@ static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr, __u32 len,
>  }
>
>  #define HAVE_ARCH_CSUM_ADD
> -static inline __wsum csum_add(__wsum csum, __wsum addend)
> +static __always_inline __wsum csum_add(__wsum csum, __wsum addend)
>  {
>  #ifdef __powerpc64__
>  	u64 res = (__force u64)csum;