On 2024-03-29 14:42, Morten Brørup wrote:
+CC techboard

From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
Sent: Friday, 29 March 2024 14.05

Hi Stephen,

On 3/29/24 03:53, Stephen Hemminger wrote:
On Thu, 28 Mar 2024 17:10:42 -0700
Andrey Ignatov <r...@apple.com> wrote:


You don't need always inline, the compiler will do it anyway.

I can remove it in v2, but it's not completely obvious to me how it is
decided when to specify it explicitly and when not.

I see plenty of __rte_always_inline in this file:

% git grep -c '^static __rte_always_inline' lib/vhost/virtio_net.c
lib/vhost/virtio_net.c:66


Cargo cult really.


Cargo cult... really?

Well, I just did a quick test comparing IO forwarding with testpmd
on the main branch against the main branch plus a patch that removes all
the inline/noinline annotations in lib/vhost/virtio_net.c [0].

main branch: 14.63Mpps
main branch - inline/noinline: 10.24Mpps

Thank you for testing this, Maxime. Very interesting!

It is sometimes suggested at techboard meetings that we should convert more
inline functions to non-inline for improved API/ABI stability, with the
argument that the performance benefit of inlining is negligible.


I think you are mixing two different (but related) things here.
1) marking functions with the inline family of keywords/attributes
2) keeping function definitions in header files

1) does not affect the ABI, while 2) does. Neither 1) nor 2) affects the API (i.e., source-level compatibility).

2) *allows* for function inlining even in non-LTO builds, but doesn't force it.

If you don't believe 2) makes a difference performance-wise, it follows that you also don't believe LTO makes much of a difference. Both have the same effect: allowing the compiler to reason over a larger chunk of your program.
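
To make the distinction between 1) and 2) concrete, here is a minimal sketch. All names in it (the demo_* macros, desc_avail, struct demo_ring) are hypothetical and not taken from DPDK; the macros only imitate the style of __rte_always_inline and __rte_noinline, and GCC/Clang attribute syntax is assumed.

```c
#include <stdint.h>

/* Hypothetical stand-ins for inline/noinline helper macros.
 * Assumes a GCC- or Clang-compatible compiler. */
#define demo_always_inline inline __attribute__((always_inline))
#define demo_noinline __attribute__((noinline))

/* Case 1): attributes on functions private to one .c file.
 * Both functions are static, so neither is an exported symbol.
 * The attribute only changes code generation, not the ABI. */
static demo_always_inline uint32_t
desc_avail(uint16_t avail_idx, uint16_t last_idx)
{
	/* Free-running 16-bit indices: the cast handles wraparound. */
	return (uint16_t)(avail_idx - last_idx);
}

static demo_noinline uint32_t
desc_avail_outlined(uint16_t avail_idx, uint16_t last_idx)
{
	return (uint16_t)(avail_idx - last_idx);
}

/* Case 2): what a definition in a public header would look like.
 * The struct layout and the function body are compiled into every
 * application that includes the header, so changing either one
 * later breaks already-built binaries: both are part of the ABI.
 * Plain "inline" permits inlining across the library boundary
 * without LTO, but does not force it. */
struct demo_ring {
	uint32_t head;
	uint32_t tail;
	uint32_t mask; /* ring size minus one, size being a power of two */
};

static inline uint32_t
demo_ring_count(const struct demo_ring *r)
{
	return (r->head - r->tail) & r->mask;
}
```

In case 1) the compiler would be free to make the same decision on its own, since it sees both caller and callee in one translation unit; the attribute merely overrides its heuristics. In case 2) the caller lives in a different binary than the library, so without the header definition (or LTO against a static library) there is nothing to inline.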

Allowing the compiler to inline small, often-called functions is crucial for performance, in my experience. If the target symbol tends to be in a shared object, the difference is even larger. It's also quite common to see no effect from LTO (other than a reduction of code footprint).

As LTO becomes more practical to use, 2) loses much of its appeal.

If PGO ever becomes practical to use, maybe 1) will as well.

I think this test proves that the sum of many small (negligible) performance
differences is not negligible!


Andrey, thanks for the patch, I'll have a look at it next week.

Maxime

[0]: https://pastebin.com/72P2npZ0
