On Thu, 18 Apr 2019 at 14:08, Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> "Richard Earnshaw (lists)" <richard.earns...@arm.com> writes:
> > On 18/04/2019 03:38, Martin Sebor wrote:
> >> The fix for pr89797 committed in r270326 was limited to targets
> >> with NUM_POLY_INT_COEFFS == 1, which I think is all but aarch64.
> >> The tests for the fix have been failing with an ICE on aarch64
> >> because it suffers from more or less the same problem, but in
> >> its own target-specific code.  Attached is the patch I posted
> >> yesterday that fixes the ICE, successfully bootstrapped and
> >> regtested on x86_64-linux.  I also ran the dg.exp=*attr* and
> >> aarch64.exp tests with an aarch64-linux-elf cross-compiler.
> >> There are no ICEs, but there are tons of errors in the latter
> >> tests because many (most?) either expect to be able to find
> >> libc headers or link executables (I have not built libc for
> >> aarch64).
> >>
> >> I'm around tomorrow but then traveling the next two weeks (with
> >> no connectivity the first week), so I unfortunately won't be able
> >> to fix whatever this change might break until the week of May 6.
> >>
> >> Jeff, if you have an aarch64 tester that could verify this patch
> >> tomorrow, that would help give us some confidence.  Otherwise,
> >> another option to consider for the time being is to xfail
> >> the tests on aarch64.
> >>
> >> Thanks
> >> Martin
> >>
> >> gcc-89797.diff
> >>
> >> PR middle-end/89797 - ICE on a vector_size (1LU << 33) int variable
> >>
> >> gcc/ChangeLog:
> >>
> >>     PR middle-end/89797
> >>     * tree.h (TYPE_VECTOR_SUBPARTS): Correct computation when
> >>     NUM_POLY_INT_COEFFS == 2.  Use HOST_WIDE_INT_1U.
> >>     * config/aarch64/aarch64.c (aarch64_simd_vector_alignment): Avoid
> >>     assuming type size fits in SHWI.
> >>
> >> Index: gcc/tree.h
> >> ===================================================================
> >> --- gcc/tree.h  (revision 270418)
> >> +++ gcc/tree.h  (working copy)
> >> @@ -3735,13 +3735,13 @@ TYPE_VECTOR_SUBPARTS (const_tree node)
> >>    if (NUM_POLY_INT_COEFFS == 2)
> >>      {
> >>        poly_uint64 res = 0;
> >> -      res.coeffs[0] = 1 << (precision & 0xff);
> >> +      res.coeffs[0] = HOST_WIDE_INT_1U << (precision & 0xff);
> >
> > I'm not sure I understand this, either before or after.  (precision &
> > 0xff) gives a value in the range 0-255, which is way more than can be
> > accommodated in a left shift, even for a HWI.
>
> For the record: the problem is that we have two coefficients that are
> logically both in the range 1<<[0,63], needing 6 bits per coefficient
> and 12 in total.  But precision is only 10 bits. :-)  Fortunately,
> coeffs[1] is in practice either 0 or equal to coeffs[0], so we can encode
> it in a single bit.
>
> A 0x100/0xff split seemed easier to optimise than a 0x40/0x3f split.
> E.g. 0xff can be loaded directly from memory as a byte.  There's not
> going to be much in it either way, though.
>
> >>        if (precision & 0x100)
> >> -        res.coeffs[1] = 1 << (precision & 0xff);
> >> +        res.coeffs[1] = HOST_WIDE_INT_1U << ((precision & 0x100) >> 16);
> >
> > And this equally doesn't make sense: >>16 will shift the masked value out
> > of the bottom of the result.
>
> Yeah, we should keep the original shift amount and just do
> s/1/HOST_WIDE_INT_1U/.
>
> > Also, if we set res.coeffs[1], don't we want res.coeffs[0] to be set to 0?
> >
> > R.
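
[To make the encoding concrete, here is a minimal standalone sketch of
the scheme Richard describes; the struct and function names are
hypothetical and this is not the actual GCC code.  It assumes, as
stated above, that coeffs[0] is a power of two with an exponent in
[0,63] and that coeffs[1] is either 0 or equal to coeffs[0]:

  #include <cassert>
  #include <cstdint>

  /* Stand-in for poly_uint64: the value is coeffs[0] + coeffs[1] * x
     for some runtime quantity x (e.g. the SVE vector-length factor).  */
  struct poly2 { uint64_t coeffs[2]; };

  /* Pack the pair into a 10-bit field: bits 0-7 hold log2 (coeffs[0]),
     and bit 8 records whether coeffs[1] equals coeffs[0] (otherwise
     coeffs[1] is 0).  */
  static unsigned
  encode_subparts (const poly2 &p)
  {
    assert (p.coeffs[1] == 0 || p.coeffs[1] == p.coeffs[0]);
    unsigned shift = 0;
    while ((uint64_t (1) << shift) != p.coeffs[0])
      ++shift;  /* coeffs[0] is a power of 2 no larger than 1 << 63.  */
    return shift | (p.coeffs[1] != 0 ? 0x100 : 0);
  }

  /* Unpack again -- the same logic as the corrected
     TYPE_VECTOR_SUBPARTS, using a 64-bit 1 so that shift amounts of
     32 or more are well defined.  */
  static poly2
  decode_subparts (unsigned precision)
  {
    poly2 res = { { 0, 0 } };
    res.coeffs[0] = uint64_t (1) << (precision & 0xff);
    if (precision & 0x100)
      res.coeffs[1] = res.coeffs[0];
    return res;
  }

  int
  main ()
  {
    poly2 fixed = { { uint64_t (1) << 33, 0 } };  /* vector_size (1LU << 33) */
    poly2 scalable = { { 16, 16 } };              /* an SVE-style 16 + 16x */
    poly2 q1 = decode_subparts (encode_subparts (fixed));
    assert (q1.coeffs[0] == fixed.coeffs[0] && q1.coeffs[1] == fixed.coeffs[1]);
    poly2 q2 = decode_subparts (encode_subparts (scalable));
    assert (q2.coeffs[0] == scalable.coeffs[0] && q2.coeffs[1] == scalable.coeffs[1]);
  }

For reference on the objection above: (precision & 0x100) >> 16 is
always 0, so the patched line would set coeffs[1] to 1 unconditionally;
the agreed fix keeps the original (precision & 0xff) shift amount and
only substitutes HOST_WIDE_INT_1U.]
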
> >
> >>        return res;
> >>      }
> >>    else
> >> -    return (unsigned HOST_WIDE_INT)1 << precision;
> >> +    return HOST_WIDE_INT_1U << precision;
> >>  }
> >>
> >>  /* Set the number of elements in VECTOR_TYPE NODE to SUBPARTS, which must
> >> Index: gcc/config/aarch64/aarch64.c
> >> ===================================================================
> >> --- gcc/config/aarch64/aarch64.c  (revision 270418)
> >> +++ gcc/config/aarch64/aarch64.c  (working copy)
> >> @@ -14924,7 +14924,10 @@ aarch64_simd_vector_alignment (const_tree type)
> >>       be set for non-predicate vectors of booleans.  Modes are the most
> >>       direct way we have of identifying real SVE predicate types.  */
> >>      return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128;
> >> -  HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
> >> +  tree size = TYPE_SIZE (type);
> >> +  unsigned HOST_WIDE_INT align = 128;
> >> +  if (tree_fits_uhwi_p (size))
> >> +    align = tree_to_uhwi (TYPE_SIZE (type));
> >>    return MIN (align, 128);
>
> I guess this is personal preference, sorry, but:
>
>   return wi::umin (wi::to_wide (TYPE_SIZE (type)), 128).to_uhwi ();
>
> seems more direct and obvious than:
>
>   tree size = TYPE_SIZE (type);
>   unsigned HOST_WIDE_INT align = 128;
>   if (tree_fits_uhwi_p (size))
>     align = tree_to_uhwi (TYPE_SIZE (type));
>   return MIN (align, 128);
>
> OK with those changes, thanks.  I've bootstrapped & regression-tested
> it in that form on aarch64-linux-gnu.
>
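
[As a side note, the two variants above really do compute the same
thing.  Here is a minimal standalone sketch of the comparison; it is
not GCC code -- unsigned __int128 (a GCC/Clang extension) stands in
for the arbitrary-precision value behind TYPE_SIZE, and the function
names are hypothetical:

  #include <cassert>

  typedef unsigned __int128 bit_size;  /* stand-in for a wide TYPE_SIZE */

  /* Martin's shape: only narrow to a host integer when the size fits
     (the analogue of tree_fits_uhwi_p / tree_to_uhwi), otherwise leave
     the 128-bit cap in place.  */
  static unsigned long long
  alignment_guarded (bit_size size_in_bits)
  {
    unsigned long long align = 128;
    if (size_in_bits <= ~0ULL)          /* "fits in a UHWI" */
      align = (unsigned long long) size_in_bits;
    return align < 128 ? align : 128;   /* MIN (align, 128) */
  }

  /* Richard's shape: clamp in wide arithmetic first, so the narrowing
     conversion is always safe -- the analogue of
     wi::umin (wi::to_wide (TYPE_SIZE (type)), 128).to_uhwi ().  */
  static unsigned long long
  alignment_clamped (bit_size size_in_bits)
  {
    bit_size clamped = size_in_bits < 128 ? size_in_bits : 128;
    return (unsigned long long) clamped;
  }

  int
  main ()
  {
    bit_size huge = (bit_size) 1 << 70;  /* a size that overflows any HWI */
    assert (alignment_guarded (64) == 64 && alignment_clamped (64) == 64);
    assert (alignment_guarded (huge) == 128 && alignment_clamped (huge) == 128);
  }

The clamped form sidesteps the fits-in-a-HWI question entirely, since
the minimum of the size and 128 always fits in a host integer, which is
presumably why it reads as more direct.]
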
I'm late to the party, and I confirm that Martin's original patch
introduced regressions:

Executed from: gcc.target/aarch64/aarch64.exp
    gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times \\.cfi_def_cfa_register 11 1
    gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times \\.cfi_escape 0xf,0xc,0x8f,0,0x92,0x2e,0,.* 1
    gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, sp 1
    gcc.target/aarch64/stack-check-prologue-16.c scan-assembler-times cmp\\s+x[0-9]+, 61440 1
    gcc.target/aarch64/stack-check-prologue-16.c scan-assembler-times str\\s+xzr, \\[sp, 0\\] 1
    gcc.target/aarch64/stack-check-prologue-16.c scan-assembler-times sub\\s+x[0-9]+, x[0-9]+, 61440 1

and many more in gcc.target/aarch64/sve/aarch64-sve.exp

Christophe

> Richard