On Mon, Feb 15, 2016 at 11:32 AM, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote: > > On 04/02/16 08:58, Ramana Radhakrishnan wrote: >> >> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wil...@linaro.org> wrote: >>> >>> This is my suggested fix for PR 65932, which is a linux kernel >>> miscompile with gcc-5.1. >>> >>> The problem here is caused by a chain of events. The first is that >>> the relatively new eipa_sra pass creates fake parameters that behave >>> slightly differently than normal parameters. The second is that the >>> optimizer creates phi nodes that copy local variables to fake >>> parameters and/or vice versa. The third is that the ouf-of-ssa pass >>> assumes that it can emit simple move instructions for these phi nodes. >>> And the fourth is that the ARM port has a PROMOTE_MODE macro that >>> forces QImode and HImode to unsigned, but a >>> TARGET_PROMOTE_FUNCTION_MODE hook that does not. So signed char and >>> short parameters have different in register representations than local >>> variables, and require a conversion when copying between them, a >>> conversion that the out-of-ssa pass can't easily emit. >>> >>> Ultimately, I think this is a problem in the arm backend. It should >>> not have a PROMOTE_MODE macro that is changing the sign of char and >>> short local variables. I also think that we should merge the >>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to >>> prevent this from happening again. >>> >>> I see four general problems with the current ARM PROMOTE_MODE definition. >>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb >>> instruction was added. It is a lose for armv6 and later. >>> 2) Unsigned short was only faster for targets that don't support >>> unaligned accesses. Support for these targets was removed a while >>> ago, and this PROMODE_MODE hunk should have been removed at the same >>> time. It was accidentally left behind. >>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was >>> converted to a function, the PROMOTE_MODE code was copied without the >>> UNSIGNEDP changes. Thus it is only an accident that >>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree. Changing >>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE >>> changes to resolve the difference are safe. >>> 4) There is a general principle that you should only change signedness >>> in PROMOTE_MODE if the hardware forces it, as otherwise this results >>> in extra conversion instructions that make code slower. The mips64 >>> hardware for instance requires that 32-bit values be sign-extended >>> regardless of type, and instructions may trap if this is not true. >>> However, it has a set of 32-bit instructions that operate on these >>> values, and hence no conversions are required. There is no similar >>> case on ARM. Thus the conversions are unnecessary and unwise. This >>> can be seen in the testcases where gcc emits both a zero-extend and a >>> sign-extend inside a loop, as the sign-extend is required for a >>> compare, and the zero-extend is required by PROMOTE_MODE. >> >> Given Kyrill's testing with the patch and the reasonably detailed >> check of the effects of code generation changes - The arm.h hunk is ok >> - I do think we should make this explicit in the documentation that >> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and >> better still maybe put in a checking assert for the same in the >> mid-end but that could be the subject of a follow-up patch. >> >> Ok to apply just the arm.h hunk as I think Kyrill has taken care of >> the testsuite fallout separately. > > Hi all, > > I'd like to backport the arm.h from this ( r233130) to the GCC 5 > branch. As the CSE patch from my series had some fallout on x86_64 > due to a deficiency in the AVX patterns that is too invasive to fix > at this stage (and presumably backport), I'd like to just backport > this arm.h fix and adjust the tests to XFAIL the fallout that comes > with not applying the CSE patch. The attached patch does that.
I would hope that someone looks to fix the AVX port for GCC 7 - as the CSE patch is something that is useful on ARM for performance in real terms and could have benefits elsewhere. OK to apply if no regressions - thanks for pushing this through. Thanks, Ramana > > The code quality fallout on code outside the testsuite is not > that gread. The SPEC benchmarks are not affected by not applying > the CSE change, and only a single sequence in a popular embedded benchmark > shows some degradation for -mtune=cortex-a9 in the same way as the > wmul-1.c and wmul-2.c tests. > > I think that's a fair tradeoff for fixing the wrong code bug on that branch. > > Ok to backport r233130 and the attached testsuite patch to the GCC 5 branch? > > Thanks, > Kyrill > > 2016-02-15 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > > PR target/65932 > * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options. > xfail the scan-assembler test. > * gcc.target/arm/wmul-2.c: Likewise. > * gcc.target/arm/wmul-3.c: Simplify test to generate a single smulbb. > > > > >> >> regards >> Ramana >> >> >> >> >>> My change was tested with an arm bootstrap, make check, and SPEC >>> CPU2000 run. The original poster verified that this gives a linux >>> kernel that boots correctly. >>> >>> The PRMOTE_MODE change causes 3 testsuite testcases to fail. These >>> are tests to verify that smulbb and/or smlabb are generated. >>> Eliminating the unnecessary sign conversions causes us to get better >>> code that doesn't include the smulbb and smlabb instructions. I had >>> to modify the testcases to get them to emit the desired instructions. >>> With the testcase changes there are no additional testsuite failures, >>> though I'm concerned that these testcases with the changes may be >>> fragile, and future changes may break them again. >> >> >> >>> If there are ARM parts where smulbb/smlabb are faster than mul/mla, >>> then maybe we should try to add new patterns to get the instructions >>> emitted again for the unmodified testcases. >>> >>> Jim > >