On 2024-03-29 14:42, Morten Brørup wrote:
+CC techboard
From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
Sent: Friday, 29 March 2024 14.05
Hi Stephen,
On 3/29/24 03:53, Stephen Hemminger wrote:
On Thu, 28 Mar 2024 17:10:42 -0700
Andrey Ignatov <r...@apple.com> wrote:
You don't need always inline, the compiler will do it anyway.
I can remove it in v2, but it's not completely obvious to me how one
decides when to specify it explicitly and when not.
I see plenty of __rte_always_inline in this file:
% git grep -c '^static __rte_always_inline' lib/vhost/virtio_net.c
lib/vhost/virtio_net.c:66
Cargo cult really.
Cargo cult... really?
Well, I just did a quick test by comparing IO forwarding with testpmd
between main branch and with adding a patch that removes all the
inline/noinline in lib/vhost/virtio_net.c [0].
main branch: 14.63Mpps
main branch - inline/noinline: 10.24Mpps
Thank you for testing this, Maxime. Very interesting!
It is sometimes suggested at techboard meetings that we should convert more
inline functions to non-inline for improved API/ABI stability, with the
argument that the performance of inlining is negligible.
I think you are mixing two different (but related) things here.
1) marking functions with the inline family of keywords/attributes
2) keeping function definitions in header files
1) does not affect the ABI, while 2) does. Neither 1) nor 2) affects the
API (i.e., source-level compatibility).
2) *allows* for function inlining even in non-LTO builds, but doesn't
force it.
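
To make the distinction between 1) and 2) concrete, here is a minimal sketch. It is not taken from lib/vhost: the macro and function names are hypothetical stand-ins, and GCC/Clang attribute syntax is assumed.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the DPDK macros; assumes GCC/Clang attributes. */
#define my_always_inline inline __attribute__((always_inline))
#define my_noinline __attribute__((noinline))

/* 1) Explicit attribute: the compiler is told to inline at every call site. */
static my_always_inline uint32_t
len_forced(uint32_t flags, uint32_t len)
{
	return (flags & 1u) ? len : 0;
}

/*
 * 2) Definition visible to the caller (as it would be in a header):
 * the compiler is allowed to inline, but makes its own decision.
 */
static inline uint32_t
len_hinted(uint32_t flags, uint32_t len)
{
	return (flags & 1u) ? len : 0;
}

/* Opposite of 1): the compiler is told to always emit a real call. */
static my_noinline uint32_t
len_called(uint32_t flags, uint32_t len)
{
	return (flags & 1u) ? len : 0;
}

/* Hypothetical hot loop that calls all three variants per element. */
static uint32_t
total_len(const uint32_t *flags, const uint32_t *lens, unsigned int n)
{
	uint32_t total = 0;

	for (unsigned int i = 0; i < n; i++) {
		total += len_forced(flags[i], lens[i]);
		total += len_hinted(flags[i], lens[i]);
		total += len_called(flags[i], lens[i]);
	}
	return total;
}
```

All three variants return the same values; only the generated code differs. Compiling with something like gcc -O2 -S and looking for call instructions inside total_len() is one way to see which calls survived.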
If you don't believe 2) makes a difference performance-wise, it follows
that you also don't believe LTO makes much of a difference. Both have
the same effect: allowing the compiler to reason over a larger chunk of
your program.
Allowing the compiler to inline small, often-called functions is crucial
for performance, in my experience. If the target symbol tends to be in a
shared object, the difference is even larger. It's also quite common
that you see no effect of LTO (other than a reduction of code footprint).
As LTO becomes more practical to use, 2) loses much of its appeal.
If PGO ever becomes practical to use, maybe 1) will as well.
I think this test proves that the sum of many small (negligible) performance
differences is not negligible!
Andrey, thanks for the patch, I'll have a look at it next week.
Maxime
[0]: https://pastebin.com/72P2npZ0