[Bug binutils/32238] Performance issues found from binutils version '2.36' version while usage of function 'lang_output_section_statement_lookup'

mmalcomson at nvidia dot com Thu, 02 Jan 2025 07:42:17 -0800

https://sourceware.org/bugzilla/show_bug.cgi?id=32238


Matthew Malcomson <mmalcomson at nvidia dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mmalcomson at nvidia dot com

--- Comment #14 from Matthew Malcomson <mmalcomson at nvidia dot com> ---
I had a look into this and have a few approaches to speed things up.

As mentioned above, the reason for this slowdown is that commit
21401fc7bf67dbf73f (which made multiple output statements in a linker script
generate multiple output sections in the final link) removed an optimisation on
SPECIAL sections (that was added in commit 8a99a385a725).

That optimisation avoided a particular pathological case with bad performance
where a large number of SHT_GROUP sections with the same name create a very
large linked list in the section table.
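
To make the cost concrete, here is a minimal sketch of the pattern.  It is not
the ld code: `struct entry`, `append_by_walking` and `total_steps` are
hypothetical names, and the chain is assumed to hold only same-named entries.
Each append walks the whole chain from the head, so adding N entries follows
N*(N-1)/2 links in total.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one same-named entry in a chain.  */
struct entry
{
  struct entry *next;
};

/* Append E by walking from the head of the chain.
   Returns the number of links followed to reach the end.  */
static size_t
append_by_walking (struct entry **head, struct entry *e)
{
  size_t steps = 0;
  struct entry **p = head;

  while (*p != NULL)
    {
      p = &(*p)->next;
      steps++;
    }
  e->next = NULL;
  *p = e;
  return steps;
}

/* Total links followed when appending N same-named entries one by one.  */
static size_t
total_steps (size_t n)
{
  struct entry *head = NULL;
  struct entry *pool = calloc (n, sizeof *pool);
  size_t total = 0;

  assert (pool != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    total += append_by_walking (&head, &pool[i]);
  free (pool);
  return total;
}
```

With this model, doubling the number of same-named sections roughly
quadruples the work spent walking the chain.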

I guess the optimisation was removed because it would otherwise have had to be
re-worked when the tristate semantics of `create` were added to
`lang_output_section_statement_lookup`.

The three patches I'm attaching after this comment are three different
approaches to fix this performance problem.

The approaches are:
1) Cache the start and end of the last linked list we traversed.  When we know
   for certain we want to create this section and this is the same linked list
   as we traversed last time, jump to the end of the linked list and append
   there.

2) Re-introduce a very similar optimisation to the one removed.  Since lookup
   only cares about returning the first added section, the order of all the
   remaining sections doesn't matter.  Hence we can add
   second/third/... sections with the same name to *just after* the first
   matching section.

3) This problem seems to be happening for SHT_GROUP sections.  These are all
   given the same name of ".group".  I don't believe this name is functionally
   important.  We could adjust the assembler to emit names with incrementing
   counters (".group1", ".group2", ...).  This means the length of this linked
   list would grow with the number of linked objects rather than the number of
   group sections.  The number of group sections can be very high when
   compiling with `-ffunction-sections`.
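
As a rough illustration of approaches (1) and (2), here is a minimal sketch on
a simplified chain.  This is not the code from the attached patches: `struct
stmt`, `struct chain_cache`, `append_cached`, `insert_after_first` and
`lookup_first` are all hypothetical names, and the chain is assumed to contain
only entries with the same name, so its head is the first matching section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified entry; not the real ld data structure.  */
struct stmt
{
  const char *name;
  int id;
  struct stmt *next;
};

/* Approach (1): remember the head and tail of the chain last appended to.  */
struct chain_cache
{
  struct stmt *head;
  struct stmt *tail;
};

static void
append_cached (struct stmt **head, struct chain_cache *cache, struct stmt *s)
{
  s->next = NULL;
  if (*head == NULL)
    *head = s;
  else
    {
      struct stmt *tail;

      if (cache->head == *head && cache->tail != NULL)
        {
          /* Same chain as last time: jump straight to its end.  */
          tail = cache->tail;
        }
      else
        {
          /* Different chain: walk it once to find the end.  */
          for (tail = *head; tail->next != NULL; tail = tail->next)
            ;
        }
      tail->next = s;
    }
  cache->head = *head;
  cache->tail = s;
}

/* Approach (2): link S just after the first entry.  Lookup only returns
   the first entry, so the order of the rest is assumed not to matter.  */
static void
insert_after_first (struct stmt **head, struct stmt *s)
{
  if (*head == NULL)
    {
      s->next = NULL;
      *head = s;
    }
  else
    {
      s->next = (*head)->next;
      (*head)->next = s;
    }
}

/* Return the first entry named NAME, or NULL if there is none.  */
static struct stmt *
lookup_first (struct stmt *head, const char *name)
{
  for (; head != NULL; head = head->next)
    if (strcmp (head->name, name) == 0)
      return head;
  return NULL;
}
```

In this sketch both variants add an entry without walking the chain in the
repeated case.  The visible difference is ordering: (1) keeps entries in
insertion order, while (2) keeps only the first entry in place.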

I'd appreciate feedback on which of these seems like the best fix.
I suspect option (1) would be best.

I'd also appreciate any suggestions about how to add performance tests.  I
didn't see any existing performance tests in the testsuite.  Given that output
binaries are supposed to be identical before and after this change there wasn't
any obvious testsuite change I could add.

The best approach I thought of was a `-am-testing-XXX` flag, only available in
debug builds, that prints out some information about how long the linked list
for sections named ".group" is, and then checking that output.  That seemed
like a bit too much framework.

N.b. a third, somewhat unimportant point (included for completeness) is that
the open-source testcase on LLVM also had a good amount of slowdown for the
same reason as seen in https://sourceware.org/bugzilla/show_bug.cgi?id=29259 .

