[Bug target/77308] surprisingly large stack usage for sha512 on arm

wilco at gcc dot gnu.org Wed, 26 Oct 2016 05:06:07 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308


wilco at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #17 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #16)
> Wow.
>
> Maybe I am dreaming, or something is completely wrong now...

Well, I can reproduce it, so it is real - thanks for the insight! Basically
you're forcing Thumb-2 to split early like Thumb-1. This allows the top and
bottom halves to be independently optimized (unused halves or zeroes are very
common in DI mode operations), resulting in much lower register pressure.
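
To illustrate why independent halves help, here is a minimal sketch in plain C
(not GCC RTL). The type di_halves and both function names are hypothetical,
made up for this example. It ORs a zero-extended 32-bit value into a 64-bit
one: once the operation is written per half, the high half is an OR with zero
and disappears, so no register is needed for the zero-extended high word.

```c
#include <stdint.h>

/* Hypothetical helper type: a 64-bit (DI mode) value as two 32-bit halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} di_halves;

/* The operation as a single 64-bit unit: y is zero-extended first,
   so its high half is known to be zero. */
uint64_t or_di(uint64_t x, uint32_t y)
{
    return x | (uint64_t)y;
}

/* The same operation split by hand into 32-bit halves. */
di_halves or_split(di_halves x, uint32_t y)
{
    di_halves r;
    r.lo = x.lo | y; /* low half: a real 32-bit OR */
    r.hi = x.hi;     /* high half: x.hi | 0 folds to x.hi, no instruction needed */
    return r;
}
```

Both forms compute the same result; the split one just makes the trivial high
half visible as a plain copy.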

Interestingly, the subreg removal phase works fine if you split *all* DI mode
operations early. This explains the bad code for shifts: they are split early,
but their subregs can't be removed because other DI mode operations are split
after reload.
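
For the shift case, a rough sketch in plain C of what a split 64-bit shift
looks like (again not GCC RTL; the function names and the shift count of 40 are
assumptions picked for illustration). After splitting, the input low half is
unused and the result high half is a constant zero, which is the kind of
information that only pays off if the operations around it work on halves too.

```c
#include <stdint.h>

/* Logical right shift by 40 as a single 64-bit operation. */
uint64_t lshr40_di(uint64_t x)
{
    return x >> 40;
}

/* The same shift split into 32-bit halves. */
void lshr40_split(uint32_t x_lo, uint32_t x_hi,
                  uint32_t *r_lo, uint32_t *r_hi)
{
    (void)x_lo;        /* the input low half does not contribute at all */
    *r_lo = x_hi >> 8; /* 40 - 32 = 8: result low half comes from the input high half */
    *r_hi = 0;         /* result high half is a known zero */
}
```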

This means the correct approach is to split all operations before register
allocation. Looking at the phase list, Expand appears best, as there isn't any
constant propagation or dead code elimination done after Split1. Interestingly,
this will be a major simplification, as we can get rid of a huge number of
complex patterns and do exactly the same for ARM, Thumb-1, Thumb-2, VFP and NEON.
