On Fri, May 06, 2022 at 09:33:41AM -0700, Stephen Hemminger wrote: > On Fri, 6 May 2022 16:28:41 +0100 > Bruce Richardson <bruce.richard...@intel.com> wrote: > > > On Fri, May 06, 2022 at 03:12:32PM +0000, Honnappa Nagarahalli wrote: > > > <snip> > > > > > > > > On Thu, May 05, 2022 at 10:59:32PM +0000, Honnappa Nagarahalli wrote: > > > > > Thanks Stephen. Do you see any performance difference with this > > > > > change? > > > > > > > > as a matter of due diligence i think a comparison should be made just > > > > to be > > > > confident nothing is regressing. > > > > > > > > i support this change in principal since it is generally accepted best > > > > practice to > > > > not force inlining since it can remove more valuable optimizations that > > > > the > > > > compiler may make that the human can't see. > > > > the optimizations may vary depending on compiler implementation. > > > > > > > > force inlining should be used as a targeted measure rather than blanket > > > > on > > > > every function and when in use probably needs to be periodically > > > > reviewed and > > > > potentially removed as the code / compiler evolves. > > > > > > > > also one other consideration is the impact of a particular compiler's > > > > force > > > > inlining intrinsic/builtin is that it may permit inlining of functions > > > > when not > > > > declared in a header. i.e. a function from one library may be able to > > > > be inlined > > > > to another binary as a link time optimization. although everything here > > > > is in a > > > > header so it's a bit moot. > > > > > > > > i'd like to see this change go in if possible. > > > Like Stephen mentions below, I am sure we will have a for and against > > > discussion here. > > > As a DPDK community we have put performance front and center, I would > > > prefer to go down that route first. > > > > > > > I ran some initial numbers with this patch, and the very quick summary of > > what I've seen so far: > > > > * Unit tests show no major differences, and while it depends on what > > specific number you are interested in, most seem within margin of error. > > * Within unit tests, the one number I mostly look at when considering > > inlining is the "empty poll" cost, since I believe we should look to keep > > that as close to zero as possible. In the past I've seen that number jump > > from 3 cycles to 12 cycles due to missed inlining. In this case, it seem > > fine. > > * Ran a quick test with the eventdev_pipeline example app using SW eventdev, > > as a test of an actual app which is fairly ring-heavy [used 8 workers > > with 1000 cycles per packet hop]. (Thanks to Harry vH for this suggestion > > of a workload) > > * GCC 8 build - no difference observed > > * GCC 11 build - approx 2% perf reduction observed > > > > As I said, these are just some quick rough numbers, and I'll try and get > > some more numbers on a couple of different platforms, see if the small > > reduction seen is consistent or not. I may also test a few differnet > > combinations/options in the eventdev test. It would be good if others also > > tested on a few platforms available to them. > > > > /Bruce > > I wonder if a mixed approach might help where some key bits were marked > as more important to inline? Or setting compiler flags in build infra?
Yep, could be a number of options.