"Richard Earnshaw (lists)" <richard.earns...@arm.com> writes: > On 18/04/2019 03:38, Martin Sebor wrote: >> The fix for pr89797 committed in r270326 was limited to targets >> with NUM_POLY_INT_COEFFS == 1 which I think is all but aarch64. >> The tests for the fix have been failing with an ICE on aarch64 >> because it suffers from more or less the same problem but in >> its own target-specific code. Attached is the patch I posted >> yesterday that fixes the ICE, successfully bootstrapped and >> regtested on x86_64-linux. I also ran the dg.exp=*attr* and >> aarch64.exp tests with an aarch64-linux-elf cross-compiler. >> There are no ICEs but there are tons of errors in the latter >> tests because many (most?) either expect to be able to find >> libc headers or link executables (I have not built libc for >> aarch64). >> >> I'm around tomorrow but then traveling the next two weeks (with >> no connectivity the first week) so I unfortunately won't be able >> to fix whatever this change might break until the week of May 6. >> >> Jeff, if you have an aarch64 tester that could verify this patch >> tomorrow that would help give us some confidence. Otherwise, >> another option to consider for the time being is to xfail >> the tests on aarch64. >> >> Thanks >> Martin >> >> gcc-89797.diff >> >> PR middle-end/89797 - ICE on a vector_size (1LU << 33) int variable >> >> gcc/ChangeLog: >> >> PR middle-end/89797 >> * tree.h (TYPE_VECTOR_SUBPARTS): Correct computation when >> NUM_POLY_INT_COEFFS == 2. Use HOST_WIDE_INT_1U. >> * config/aarch64/aarch64.c (aarch64_simd_vector_alignment): Avoid >> assuming type size fits in SHWI. >> >> Index: gcc/tree.h >> =================================================================== >> --- gcc/tree.h (revision 270418) >> +++ gcc/tree.h (working copy) >> @@ -3735,13 +3735,13 @@ TYPE_VECTOR_SUBPARTS (const_tree node) >> if (NUM_POLY_INT_COEFFS == 2) >> { >> poly_uint64 res = 0; >> - res.coeffs[0] = 1 << (precision & 0xff); >> + res.coeffs[0] = HOST_WIDE_INT_1U << (precision & 0xff); > > I'm not sure I understand this either before or after. (precision & > 0xff) gives a value in the range 0-255, which is way more than can be > accommodated in a left shift, even for a HWI.
For the record: the problem is that we have two coefficients that are logically both in the range 1<<[0,63], needing 6 bits per coefficient and 12 in total. But precision is only 10 bits. :-) Fortunately, coeffs[1] is in practice either 0 or equal to coeffs[0], so we can encode it in a single bit. A 0x100/0xff split seemed easier to optimise than a 0x40/0x3f split. E.g. 0xff can be loaded directly from memory as a byte. There's not going to be much in it either way though. >> if (precision & 0x100) >> - res.coeffs[1] = 1 << (precision & 0xff); >> + res.coeffs[1] = HOST_WIDE_INT_1U << ((precision & 0x100) >> 16); > > And this equally doesn't make sense >>16 will shift the masked value out > of the bottom of the result. Yeah, we should keep the original shift amount and just do s/1/HOST_WIDE_INT_1U/. > Also, if we set res.coeffs[1], don't we want refs.coefs[0] to be set to 0? > > R. > >> return res; >> } >> else >> - return (unsigned HOST_WIDE_INT)1 << precision; >> + return HOST_WIDE_INT_1U << precision; >> } >> >> /* Set the number of elements in VECTOR_TYPE NODE to SUBPARTS, which must >> Index: gcc/config/aarch64/aarch64.c >> =================================================================== >> --- gcc/config/aarch64/aarch64.c (revision 270418) >> +++ gcc/config/aarch64/aarch64.c (working copy) >> @@ -14924,7 +14924,10 @@ aarch64_simd_vector_alignment (const_tree type) >> be set for non-predicate vectors of booleans. Modes are the most >> direct way we have of identifying real SVE predicate types. */ >> return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128; >> - HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type)); >> + tree size = TYPE_SIZE (type); >> + unsigned HOST_WIDE_INT align = 128; >> + if (tree_fits_uhwi_p (size)) >> + align = tree_to_uhwi (TYPE_SIZE (type)); >> return MIN (align, 128); I guess this is personal preference, sorry, but: return wi::umin (wi::to_wide (TYPE_SIZE (type)), 128).to_uhwi (); seems more direct and obvious than: tree size = TYPE_SIZE (type); unsigned HOST_WIDE_INT align = 128; if (tree_fits_uhwi_p (size)) align = tree_to_uhwi (TYPE_SIZE (type)); return MIN (align, 128); OK with those changes, thanks. I've bootrapped & regression-tested it in that form on aarch64-linux-gnu. Richard