[Bug target/104271] [12 Regression] 538.imagick_r run-time at -Ofast -march=native regressed by 26% on Intel Cascade Lake server CPU

rguenth at gcc dot gnu.org via Gcc-bugs Tue, 29 Mar 2022 03:39:10 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104271


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |jamborm at gcc dot gnu.org

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to cuilili from comment #7)
> Created attachment 52706
> Add a heuristic for eliminate redundant load and store in inline pass.
> 
> Hi Richard,
> 
> Could you help take a look? This is my first time adding code to the
> middle end, so I hope you can give me some advice. Thank you!
> 
> I added an INLINE_HINT_eliminate_load_and_store hint to the inline pass. When
> the callee's memory accesses are to the caller's local memory passed in as a
> parameter and the access size is greater than the target threshold, we
> enable the hint. With the hint, inlining_insns_auto enlarges the bound. The
> target hook is only enabled for x86 for now.
> 
> With the patch applied:
> Ice Lake server: 538.imagick_r gets a 15.18% improvement for multicopy and a
> 40.78% improvement for single copy, with no measurable changes for other
> benchmarks.
> 
> Cascade Lake: 538.imagick_r gets a 12.4% improvement for multicopy, and code
> size increases by 0.4%, with no measurable changes for other benchmarks.
> 
> Znver3 server: 538.imagick_r gets a 9.6% improvement for multicopy, and code
> size increases by 0.5%, with no measurable changes for other benchmarks.
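For readers unfamiliar with the pattern, here is a hypothetical, much simplified C sketch of the caller-local store/load traffic the hint is aimed at. It is not taken from 538.imagick_r, and the names pixel_t, pixel_intensity and average_intensity are made up. The caller writes a local aggregate, passes its address, and the callee only reads it back. A callee this small would be inlined anyway; the hint is meant for callees that would otherwise exceed the inlining_insns_auto bound.

```c
#include <stddef.h>

/* Hypothetical aggregate built in the caller's stack frame. */
typedef struct {
  double red, green, blue, alpha;
} pixel_t;

/* Callee: loads every field back through the pointer parameter. */
static double pixel_intensity(const pixel_t *p)
{
  return 0.25 * (p->red + p->green + p->blue + p->alpha);
}

/* Caller: stores into a local aggregate, then passes its address.
   If pixel_intensity is not inlined, the four stores must reach memory
   and the callee must load them again.  After inlining, the optimizers
   can forward the stored values straight to their uses, so the local
   need not live in memory at all. */
double average_intensity(const double *rgba, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++) {
    pixel_t px;
    px.red = rgba[4 * i + 0];
    px.green = rgba[4 * i + 1];
    px.blue = rgba[4 * i + 2];
    px.alpha = rgba[4 * i + 3];
    sum += pixel_intensity(&px);
  }
  return n ? sum / (double)n : 0.0;
}
```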

It's an interesting idea; note that Honza knows IPA modref and inlining
better than I do.  What I doubt is that you can directly use the IPA modref
info to determine whether inlining will likely elide a store/load pair,
since IIRC the modref info is for the whole function.

IPA SRA might perform the kind of analysis that is confined to the call
context, and that might already be available here (and possibly IPA SRA
even considers passing the stored/loaded values by value?)
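As a source-level analogy only, consider what passing the values by value would mean. IPA SRA actually works on GIMPLE and creates modified clones of the callee, and the names vec2, dot_ref, dot_val, caller_ref and caller_val are hypothetical. The idea is that a pointer parameter that is only read gets replaced by scalar parameters carrying the loaded values, so the caller no longer has to materialize the aggregate in memory.

```c
/* Hypothetical small aggregate. */
typedef struct {
  double x, y;
} vec2;

/* Before: the callee reads both aggregates through pointers. */
static double dot_ref(const vec2 *a, const vec2 *b)
{
  return a->x * b->x + a->y * b->y;
}

/* After (hand-written analogue of a by-value clone): the fields that
   are read are passed as scalars instead. */
static double dot_val(double ax, double ay, double bx, double by)
{
  return ax * bx + ay * by;
}

/* Caller of the original: must store a and b to memory to take their
   addresses (unless the call is inlined). */
double caller_ref(double ax, double ay, double bx, double by)
{
  vec2 a = {ax, ay};
  vec2 b = {bx, by};
  return dot_ref(&a, &b);
}

/* Caller of the by-value version: no stores needed. */
double caller_val(double ax, double ay, double bx, double by)
{
  return dot_val(ax, ay, bx, by);
}
```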

But yes, having a stream of up to N (independent?) stores before each call
plus a stream of up to M (independent, hoistable to the function start?)
loads at each function start would make such an analysis possible.
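A toy model of such a summary-based check follows, under stated assumptions. The structs and functions are hypothetical and bear no relation to GCC's internal data structures. The summaries are assumed to already contain only independent accesses, and only exact matches (same parameter, offset and size) are counted. The threshold plays the role of the target threshold from comment #7.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limits: the "N" stores and "M" loads kept per summary. */
#define MAX_STORES 8
#define MAX_LOADS 8

/* One memory access relative to a pointer argument (hypothetical). */
typedef struct {
  int param_index; /* which call argument / callee parameter it uses */
  long offset;     /* byte offset from that pointer */
  long size;       /* access size in bytes */
} mem_access;

/* Stores to caller-local memory whose address is passed, recorded just
   before the call; assumed independent of each other. */
typedef struct {
  size_t n_stores;
  mem_access stores[MAX_STORES];
} call_site_summary;

/* Loads through pointer parameters, assumed hoistable to function start. */
typedef struct {
  size_t n_loads;
  mem_access loads[MAX_LOADS];
} function_summary;

/* Bytes of callee loads that exactly match a caller store, i.e. the
   store/load pairs that inlining would likely let the optimizers elide. */
long elidable_bytes(const call_site_summary *cs, const function_summary *fn)
{
  long bytes = 0;
  for (size_t i = 0; i < fn->n_loads; i++) {
    const mem_access *ld = &fn->loads[i];
    for (size_t j = 0; j < cs->n_stores; j++) {
      const mem_access *st = &cs->stores[j];
      if (st->param_index == ld->param_index
          && st->offset == ld->offset
          && st->size == ld->size) {
        bytes += ld->size;
        break; /* count each load at most once */
      }
    }
  }
  return bytes;
}

/* Enable the (hypothetical) hint when enough traffic would disappear. */
bool want_elide_hint(const call_site_summary *cs, const function_summary *fn,
                     long threshold)
{
  return elidable_bytes(cs, fn) > threshold;
}
```

For example, two 8-byte stores matched by two 8-byte loads through the same parameter give 16 elidable bytes, so the hint would fire for any threshold below 16.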
