[Bug target/77308] surprisingly large stack usage for sha512 on arm

wdijkstr at arm dot com Tue, 25 Oct 2016 11:42:07 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308


--- Comment #14 from Wilco <wdijkstr at arm dot com> ---
(In reply to Bernd Edlinger from comment #13)
> I am still trying to understand why thumb1 seems to outperform thumb2.
> 
> Obviously thumb1 does not have the shiftdi3 pattern,
> but even if I remove these from thumb2, the result is still
> on par with thumb1.  Apparently other patterns still produce di
> values that are not enabled with thumb1, they are 
> xordi3 and anddi3, these are often used.  Then there is
> adddi3 that is enabled in thumb1 and thumb2, I also disabled
> this one, and now the sha512 gets down to an incredible 1152-byte
> frame (-Os -march=armv7 -mthumb -mfloat-abi=soft):
> 
> I know this is a hack, but 1K stack is what we should expect...
> 
> --- arm.md      2016-10-25 19:54:16.425736721 +0200
> +++ arm.md.orig 2016-10-17 19:46:59.000000000 +0200
> @@ -448,7 +448,7 @@
>           (plus:DI (match_operand:DI 1 "s_register_operand" "")
>                    (match_operand:DI 2 "arm_adddi_operand"  "")))
>      (clobber (reg:CC CC_REGNUM))])]
> -  "TARGET_EITHER && !TARGET_THUMB2"
> +  "TARGET_EITHER"

So you're actually turning these instructions off for Thumb-2? What does it
do instead then? Does the number of instructions go down?

I noticed that with or without -mfpu=neon, the code generated with -marm is
significantly smaller than with -mthumb. Most of the extra instructions appear
to be moves, which suggests something is wrong (I would expect Thumb-2 to do
better, as it supports LDRD with larger offsets than ARM).
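
For anyone who wants to compare the two modes on something small, here is a
minimal sketch of the kind of 64-bit code involved. It is not the test case
from this bug, and I have not measured it. The function names are made up for
illustration; the rotation amounts and the round structure are the standard
SHA-512 ones from FIPS 180-4. On a 32-bit ARM target every rotate, XOR, AND
and add on uint64_t below is a DImode operation, which is where the shift,
xordi3, anddi3 and adddi3 patterns discussed above come into play.

```c
#include <stdint.h>

/* Rotate a 64-bit value right by n bits (valid for 0 < n < 64). */
static inline uint64_t rotr64(uint64_t x, unsigned n)
{
    return (x >> n) | (x << (64 - n));
}

/* SHA-512 "big sigma" functions (FIPS 180-4): DImode shifts and XORs. */
static inline uint64_t big_sigma0(uint64_t x)
{
    return rotr64(x, 28) ^ rotr64(x, 34) ^ rotr64(x, 39);
}

static inline uint64_t big_sigma1(uint64_t x)
{
    return rotr64(x, 14) ^ rotr64(x, 18) ^ rotr64(x, 41);
}

/* Ch and Maj: DImode ANDs and XORs. */
static inline uint64_t ch(uint64_t x, uint64_t y, uint64_t z)
{
    return (x & y) ^ (~x & z);
}

static inline uint64_t maj(uint64_t x, uint64_t y, uint64_t z)
{
    return (x & y) ^ (x & z) ^ (y & z);
}

/*
 * One SHA-512 round on the working variables a..h held in s[0]..s[7].
 * k is the round constant and w the message schedule word; the caller
 * supplies both.  The name sha512_like_round is hypothetical.
 */
void sha512_like_round(uint64_t s[8], uint64_t k, uint64_t w)
{
    uint64_t t1 = s[7] + big_sigma1(s[4]) + ch(s[4], s[5], s[6]) + k + w;
    uint64_t t2 = big_sigma0(s[0]) + maj(s[0], s[1], s[2]);

    s[7] = s[6];
    s[6] = s[5];
    s[5] = s[4];
    s[4] = s[3] + t1;   /* DImode add */
    s[3] = s[2];
    s[2] = s[1];
    s[1] = s[0];
    s[0] = t1 + t2;     /* DImode add */
}
```

Compiling this with the options quoted above plus -fstack-usage, once with
-marm and once with -mthumb, should give per-function frame sizes and
assembly to compare. A single round is probably too small to show the
register pressure seen in the full sha512 function, so unrolling several
rounds may be needed.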
