https://sourceware.org/bugzilla/show_bug.cgi?id=32238
Matthew Malcomson <mmalcomson at nvidia dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mmalcomson at nvidia dot com

--- Comment #14 from Matthew Malcomson <mmalcomson at nvidia dot com> ---
I had a look into this and have a few approaches to speed things up.

As mentioned above, the reason for this slowdown is that commit
21401fc7bf67dbf73f (which made multiple output statements in a linker script
generate multiple output sections in the final link) removed an optimisation
on SPECIAL sections (that was added in commit 8a99a385a725).

That optimisation avoided a particular pathological case with bad performance
where a large number of SHT_GROUP sections with the same name create a very
large linked list in the section table.

I guess the optimisation was removed because it would otherwise have had to be
re-worked when the tristate semantics of `create` were added to
`lang_output_section_statement_lookup`.

The three patches I'm attaching after this comment are three different
approaches to fix this performance problem. The approaches are:

1) Cache the start and end of the last linked list we traversed. When we know
   for certain we want to create this section and this is the same linked list
   as we traversed last time, jump to the end of the linked list and append
   there.
2) Re-introduce a very similar optimisation to the one removed. Since lookup
   only cares about returning the first added section, the order of all the
   remaining sections doesn't matter. Hence we can add second/third/...
   sections with the same name to *just after* the first matching section.
3) This problem seems to be happening for SHT_GROUP sections. These are all
   given the same name of ".group". I don't believe this name is functionally
   important. We could adjust the assembler to emit names with incrementing
   counters (".group1", ".group2", ...). The length of this linked list would
   then grow with the number of linked objects rather than the number of group
   sections. The number of group sections can be very high when compiling with
   `-ffunction-sections`.

I'd appreciate feedback on which of these seems like the best fix. I suspect
option (1) would be best.

I'd also appreciate any suggestions about how to add performance tests. I
didn't see any existing performance tests in the testsuite. Given that output
binaries are supposed to be identical before and after this change, there
wasn't any obvious testsuite change I could add. The best approach I thought
of was a `-am-testing-XXX` flag, only available in debug builds, that prints
out some information about how long the linked list for sections named
".group" is, and then checking that output. That seemed like a bit too much
framework.

N.b. a third, somewhat unimportant point (included for completeness) is that
the open-source testcase on LLVM also had a good amount of slowdown due to the
same reason as seen in https://sourceware.org/bugzilla/show_bug.cgi?id=29259 .
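
For readers unfamiliar with the cost being discussed, approaches (1) and (2)
can be illustrated on a toy singly linked chain. This is only a minimal sketch
and not the ld code or any of the attached patches: `struct entry`, `steps`,
`add_at_tail`, `add_with_tail_cache` and `add_after_first` are all hypothetical
names made up for the illustration, and the chain stands in for the list of
same-named entries in the section table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one entry in a same-name chain; not ld's type. */
struct entry {
    const char *name;
    int id;
    struct entry *next;
};

static unsigned long steps;                     /* list nodes visited so far */
static struct entry *cached_head, *cached_tail; /* state for approach (1) */

static struct entry *new_entry(const char *name, int id)
{
    struct entry *n = malloc(sizeof *n);
    assert(n != NULL);
    n->name = name;
    n->id = id;
    n->next = NULL;
    return n;
}

/* Lookup only ever returns the first entry with a matching name. */
static struct entry *lookup_first(struct entry *head, const char *name)
{
    for (struct entry *e = head; e != NULL; e = e->next) {
        steps++;
        if (strcmp(e->name, name) == 0)
            return e;
    }
    return NULL;
}

/* Baseline: walk the whole chain on every insert (quadratic in total). */
static void add_at_tail(struct entry **head, const char *name, int id)
{
    struct entry **p = head;
    while (*p != NULL) {
        steps++;
        p = &(*p)->next;
    }
    *p = new_entry(name, id);
}

/* Approach (1): remember the head and tail of the chain used last time. */
static void add_with_tail_cache(struct entry **head, const char *name, int id)
{
    struct entry *n = new_entry(name, id);
    if (*head == NULL) {
        *head = n;
    } else if (cached_head == *head) {
        cached_tail->next = n;      /* same chain as last time: no walk */
    } else {
        struct entry *t = *head;    /* different chain: walk it once */
        for (steps++; t->next != NULL; t = t->next)
            steps++;
        t->next = n;
    }
    cached_head = *head;
    cached_tail = n;
}

/* Approach (2): link the new entry just after the first matching one. */
static void add_after_first(struct entry **head, const char *name, int id)
{
    struct entry *n = new_entry(name, id);
    struct entry *first = lookup_first(*head, name);
    if (first == NULL) {            /* no match yet: simply prepend */
        n->next = *head;
        *head = n;
    } else {
        n->next = first->next;
        first->next = n;
    }
}
```

Under these assumptions, adding 1000 entries named ".group" visits 499500
nodes with `add_at_tail`, 999 with `add_after_first`, and none with
`add_with_tail_cache`. In all three cases `lookup_first` still returns the
first entry added. Only the tail cache keeps the remaining entries in
insertion order, and in real code the cache would have to be invalidated
whenever the chain is modified by any other path.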