On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wil...@linaro.org> wrote:
> This is my suggested fix for PR 65932, which is a Linux kernel
> miscompile with gcc-5.1.
>
> The problem here is caused by a chain of events. The first is that
> the relatively new eipa_sra pass creates fake parameters that behave
> slightly differently than normal parameters. The second is that the
> optimizer creates phi nodes that copy local variables to fake
> parameters and/or vice versa. The third is that the out-of-ssa pass
> assumes that it can emit simple move instructions for these phi nodes.
> And the fourth is that the ARM port has a PROMOTE_MODE macro that
> forces QImode and HImode to unsigned, but a
> TARGET_PROMOTE_FUNCTION_MODE hook that does not. So signed char and
> short parameters have different in-register representations than local
> variables, and require a conversion when copying between them, a
> conversion that the out-of-ssa pass can't easily emit.
>
> Ultimately, I think this is a problem in the arm backend. It should
> not have a PROMOTE_MODE macro that changes the sign of char and
> short local variables. I also think that we should merge the
> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
> prevent this from happening again.
>
> I see four general problems with the current ARM PROMOTE_MODE definition.
> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>    instruction was added. It is a loss for armv6 and later.
> 2) Unsigned short was only faster for targets that don't support
>    unaligned accesses. Support for these targets was removed a while
>    ago, and this PROMOTE_MODE hunk should have been removed at the same
>    time. It was accidentally left behind.
> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook; when it was
>    converted to a function, the PROMOTE_MODE code was copied without the
>    UNSIGNEDP changes. Thus it is only an accident that
>    TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree. Changing
>    TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>    changes to resolve the difference are safe.
> 4) There is a general principle that you should only change signedness
>    in PROMOTE_MODE if the hardware forces it, as otherwise this results
>    in extra conversion instructions that make code slower. The mips64
>    hardware, for instance, requires that 32-bit values be sign-extended
>    regardless of type, and instructions may trap if this is not true.
>    However, it has a set of 32-bit instructions that operate on these
>    values, and hence no conversions are required. There is no similar
>    case on ARM. Thus the conversions are unnecessary and unwise. This
>    can be seen in the testcases where gcc emits both a zero-extend and a
>    sign-extend inside a loop, as the sign-extend is required for a
>    compare, and the zero-extend is required by PROMOTE_MODE.
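The in-register mismatch described in this chain of events can be modelled in plain C. This is a minimal sketch, assuming a 32-bit register, and the helper names (promote_param_schar, promote_local_schar, reg_is_negative, sign_extend_byte, check_model) are hypothetical, not GCC internals. A signed char parameter is kept sign-extended, as the ARM TARGET_PROMOTE_FUNCTION_MODE hook leaves it, while a local is zero-extended, as the old PROMOTE_MODE forced it. A plain move between the two, which is what the out-of-ssa pass assumes it can emit for the phi, changes the result of a signed compare; an explicit sign extension (sxtb on armv6 and later) does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a 32-bit register holding a promoted signed char. */

/* Parameter: promoted to SImode keeping the C type's signedness
   (sign-extended), as the ARM TARGET_PROMOTE_FUNCTION_MODE hook does. */
uint32_t promote_param_schar(int8_t v)
{
    return (uint32_t)(int32_t)v;          /* -1 -> 0xFFFFFFFF */
}

/* Local variable: promoted to SImode with UNSIGNEDP forced to 1
   (zero-extended), as the pre-patch ARM PROMOTE_MODE did. */
uint32_t promote_local_schar(int8_t v)
{
    return (uint32_t)(uint8_t)v;          /* -1 -> 0x000000FF */
}

/* A signed "< 0" test on the full register; only correct if the
   register already holds a sign-extended value. */
bool reg_is_negative(uint32_t reg)
{
    return (reg & 0x80000000u) != 0;
}

/* The conversion a correct copy would need (what sxtb does):
   sign-extend the low byte to 32 bits. */
uint32_t sign_extend_byte(uint32_t reg)
{
    uint32_t b = reg & 0xFFu;
    return (b ^ 0x80u) - 0x80u;           /* 0xFF -> 0xFFFFFFFF */
}

/* Phi copy "param = local" emitted as a plain move vs. with a conversion. */
void check_model(void)
{
    int8_t x = -1;
    uint32_t local_reg = promote_local_schar(x);

    uint32_t param_by_move = local_reg;                   /* bits copied as-is */
    uint32_t param_by_sxtb = sign_extend_byte(local_reg); /* converted */

    assert(!reg_is_negative(param_by_move));  /* wrong: x < 0 now reads false */
    assert(reg_is_negative(param_by_sxtb));   /* correct */
    assert(param_by_sxtb == promote_param_schar(x));
}
```

In this model the two promotions agree for non-negative values, so only negative signed char (or short) values expose the difference between the two representations.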
Given Kyrill's testing with the patch and the reasonably detailed check
of the effects of the code generation changes, the arm.h hunk is OK.

I do think we should make it explicit in the documentation that
PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree, and better
still, perhaps add a checking assert for this in the mid-end, but that
could be the subject of a follow-up patch.

OK to apply just the arm.h hunk, as I think Kyrill has taken care of the
testsuite fallout separately.

regards
Ramana

>
> My change was tested with an arm bootstrap, make check, and a SPEC
> CPU2000 run. The original poster verified that this gives a Linux
> kernel that boots correctly.
>
> The PROMOTE_MODE change causes three testsuite testcases to fail. These
> tests verify that smulbb and/or smlabb are generated.
> Eliminating the unnecessary sign conversions gives better code that
> doesn't use the smulbb and smlabb instructions. I had to modify the
> testcases to get them to emit the desired instructions again.
> With the testcase changes there are no additional testsuite failures,
> though I'm concerned that the modified testcases may be fragile, and
> future changes may break them again.
>
> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
> then maybe we should try to add new patterns to get the instructions
> emitted again for the unmodified testcases.
>
> Jim
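The checking assert suggested above can be sketched with a toy model. All names here (enum mode, struct promotion, old_arm_promote_mode, new_arm_promote_mode, arm_promote_function_mode_model, promotions_agree, check_promotions) are hypothetical and only approximate GCC's machine modes, the PROMOTE_MODE macro and the TARGET_PROMOTE_FUNCTION_MODE hook. The sketch assumes, as the thread describes, that both promote sub-word integer modes to SImode and that the pre-patch PROMOTE_MODE additionally forced QImode and HImode to unsigned, while the arm.h fix keeps the original signedness.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's integer machine modes. */
enum mode { QI_MODE, HI_MODE, SI_MODE, DI_MODE };

/* Result of a promotion: the register mode and whether it is zero-extended. */
struct promotion {
    enum mode mode;
    bool unsignedp;
};

static bool is_subword(enum mode m)
{
    return m == QI_MODE || m == HI_MODE;
}

/* Model of the pre-patch ARM PROMOTE_MODE: widen to SImode and force
   QImode/HImode to unsigned. */
struct promotion old_arm_promote_mode(enum mode m, bool unsignedp)
{
    if (is_subword(m))
        return (struct promotion){ SI_MODE, true };
    return (struct promotion){ m, unsignedp };
}

/* Model of the patched PROMOTE_MODE: widen, keep signedness. */
struct promotion new_arm_promote_mode(enum mode m, bool unsignedp)
{
    if (is_subword(m))
        return (struct promotion){ SI_MODE, unsignedp };
    return (struct promotion){ m, unsignedp };
}

/* Model of the ARM TARGET_PROMOTE_FUNCTION_MODE hook: widen, keep
   signedness. Changing this one would be an ABI change. */
struct promotion arm_promote_function_mode_model(enum mode m, bool unsignedp)
{
    if (is_subword(m))
        return (struct promotion){ SI_MODE, unsignedp };
    return (struct promotion){ m, unsignedp };
}

typedef struct promotion (*promote_fn)(enum mode, bool);

/* The suggested consistency check: local-variable and parameter
   promotion must produce the same mode and the same extension. */
bool promotions_agree(promote_fn promote_mode, enum mode m, bool unsignedp)
{
    struct promotion local = promote_mode(m, unsignedp);
    struct promotion param = arm_promote_function_mode_model(m, unsignedp);
    return local.mode == param.mode && local.unsignedp == param.unsignedp;
}

/* Exhaustive check over the modelled modes, as a checking assert might do. */
void check_promotions(promote_fn promote_mode)
{
    for (int m = QI_MODE; m <= DI_MODE; m++)
        for (int u = 0; u <= 1; u++)
            assert(promotions_agree(promote_mode, (enum mode)m, u != 0));
}
```

In this model, check_promotions(new_arm_promote_mode) passes, while check_promotions(old_arm_promote_mode) would fail on signed QImode and HImode, which is exactly the disagreement that let the phi copies in PR 65932 go wrong.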