https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #14 from Wilco <wdijkstr at arm dot com> ---
(In reply to Bernd Edlinger from comment #13)
> I am still trying to understand why thumb1 seems to outperform thumb2.
>
> Obviously thumb1 does not have the shiftdi3 pattern,
> but even if I remove these from thumb2, the result is still
> not par with thumb2. Apparently other patterns still produce di
> values that are not enabled with thumb1, they are
> xordi3 and anddi3, these are often used. Then there is
> adddi3 that is enabled in thumb1 and thumb2, I also disabled
> this one, and now the sha512 gets down to incredible 1152
> bytes frame (-Os -march=armv7 -mthumb -mfloat-abi=soft):
>
> I know this is a hack, but 1K stack is what we should expect...
>
> --- arm.md 2016-10-25 19:54:16.425736721 +0200
> +++ arm.md.orig 2016-10-17 19:46:59.000000000 +0200
> @@ -448,7 +448,7 @@
>      (plus:DI (match_operand:DI 1 "s_register_operand" "")
>               (match_operand:DI 2 "arm_adddi_operand" "")))
>      (clobber (reg:CC CC_REGNUM))])]
> -  "TARGET_EITHER && !TARGET_THUMB2"
> +  "TARGET_EITHER"

So you're actually turning these instructions off for Thumb-2? What does it do
instead then? Does the number of instructions go down?

I noticed that with or without -mfpu=neon, using -marm is significantly smaller
than -mthumb. Most of the extra instructions appear to be moves, which means
something is wrong (I would expect Thumb-2 to do better as it supports LDRD
with larger offsets than ARM).
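
For reference, here is a minimal sketch of the kind of 64-bit code under discussion. It is not the test case attached to this bug; the names (rotr64, big_sigma0, big_sigma1, ch, maj, sha512_round_sketch) are made up for illustration. It only shows why a SHA-512 round leans so heavily on DImode shifts, XORs, ANDs and adds, which is presumably where the adddi3, anddi3, xordi3 and shift patterns come into play on a 32-bit target.

```c
#include <stdint.h>

/* Rotate a 64-bit value right by n bits; callers must keep 0 < n < 64. */
static inline uint64_t rotr64(uint64_t x, unsigned n)
{
    return (x >> n) | (x << (64 - n));
}

/* SHA-512 "big sigma" functions: three 64-bit rotates XORed together. */
static inline uint64_t big_sigma0(uint64_t x)
{
    return rotr64(x, 28) ^ rotr64(x, 34) ^ rotr64(x, 39);
}

static inline uint64_t big_sigma1(uint64_t x)
{
    return rotr64(x, 14) ^ rotr64(x, 18) ^ rotr64(x, 41);
}

/* Choose: for each bit, take y where x is 1 and z where x is 0. */
static inline uint64_t ch(uint64_t x, uint64_t y, uint64_t z)
{
    return (x & y) ^ (~x & z);
}

/* Majority: each result bit is the majority vote of the three inputs. */
static inline uint64_t maj(uint64_t x, uint64_t y, uint64_t z)
{
    return (x & y) ^ (x & z) ^ (y & z);
}

/*
 * One SHA-512 round on the working state s[0..7] (a..h), with round
 * constant k and message word w.  Hypothetical helper for illustration:
 * every operation here is a 64-bit add, shift, AND or XOR.
 */
void sha512_round_sketch(uint64_t s[8], uint64_t k, uint64_t w)
{
    uint64_t t1 = s[7] + big_sigma1(s[4]) + ch(s[4], s[5], s[6]) + k + w;
    uint64_t t2 = big_sigma0(s[0]) + maj(s[0], s[1], s[2]);

    s[7] = s[6];
    s[6] = s[5];
    s[5] = s[4];
    s[4] = s[3] + t1;
    s[3] = s[2];
    s[2] = s[1];
    s[1] = s[0];
    s[0] = t1 + t2;
}
```

Something along these lines, unrolled over many rounds, could be built once with -marm and once with -mthumb and the two compared with -fstack-usage; whether it reproduces the frame sizes reported here is an assumption I have not checked.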