[ was: Re: [committed][testsuite] Re-enable pr94600-{1,3}.c tests for arm ]
On 10/1/20 7:38 AM, Hans-Peter Nilsson wrote: > On Wed, 30 Sep 2020, Tom de Vries wrote: > >> [ was: Re: [committed][testsuite] Require non_strict_align in >> pr94600-{1,3}.c ] >> >> On 9/30/20 4:53 AM, Hans-Peter Nilsson wrote: >>> On Thu, 24 Sep 2020, Tom de Vries wrote: >>> >>>> Hi, >>>> >>>> With the nvptx target, we run into: >>>> ... >>>> FAIL: gcc.dg/pr94600-1.c scan-rtl-dump-times final "\\(mem/v" 6 >>>> FAIL: gcc.dg/pr94600-1.c scan-rtl-dump-times final "\\(set \\(mem/v" 6 >>>> FAIL: gcc.dg/pr94600-3.c scan-rtl-dump-times final "\\(mem/v" 1 >>>> FAIL: gcc.dg/pr94600-3.c scan-rtl-dump-times final "\\(set \\(mem/v" 1 >>>> ... >>>> The scans attempt to check for volatile stores, but on nvptx we have memcpy >>>> instead. >>>> >>>> This is due to nvptx being a STRICT_ALIGNMENT target, which has the effect >>>> that the TYPE_MODE for the store target is set to BKLmode in >>>> compute_record_mode. >>>> >>>> Fix the FAILs by requiring effective target non_strict_align. >>> >>> No, that's wrong. There's more than that at play; it worked for >>> the strict-alignment targets where it was tested at the time. >>> >> >> Hi, >> >> thanks for letting me know. >> >>> The test is a valuable canary for this kind of bug. You now >>> disabled it for strict-alignment targets. >>> >>> Please revert and add your target specifier instead, if you >>> don't feel like investigating further. >> >> I've analyzed the compilation on strict-alignment target arm-eabi, and > > An analysis should result in more than that statement. > Well, it refers to the analysis in the commit log of the patch, sorry if that was not obvious. >> broadened the effective target to (non_strict_align || >> pcc_bitfield_type_matters). > > That's *also* not right. I'm guessing your nvptx fails because > it has 64-bit alignment requirement, but no 32-bit writes. > ...um, no that can't be it, nvptx seems to have them. Costs? > Yes, probably your #define MOVE_RATIO(SPEED) 4. > > The writes are to 32-bit aligned addresses which gcc can deduce > (also for strict-alignment targets) because it's a literal, > where it isn't explicitly declared to be attribute-aligned > > You should have noticed the weirness in that you "only" needed > to tweak pr94600-1.c and -3.c, not even pr94600-2.c, which > should be the case if it was just the test-case getting the > predicates wrong. This points at your MOVE_RATIO, together with > middle-end not applying it consistently for -2.c. > > Again, please just skip for nvptx (don't mix-n-match general > predicates) unless you really look into the reason you don't get > 6 single 32-bit-writes only in *some* of the cases. Thanks for the pointer to pr94600-2.c. I've compared the behaviour between pr94600-1.c and pr94600-2.c and figured out why in one case we get the load/store pair, and in the other the memcpy. See rationale in commit below. Committed to trunk. Thanks, - Tom
[testsuite] Enable pr94600-{1,3}.c tests for nvptx When compiling test-case pr94600-1.c for nvptx, this gimple mem move: ... MEM[(volatile struct t0 *)655404B] ={v} a0[0]; ... is expanded into a memcpy, but when compiling pr94600-2.c instead, this similar gimple mem move: ... MEM[(volatile struct t0 *)655404B] ={v} a00; ... is expanded into a 32-bit load/store pair. In both cases, emit_block_move is called. In the latter case, can_move_by_pieces (4 /* byte-size */, 32 /* bit-align */) is called, which returns true (because by_pieces_ninsns returns 1, which is smaller than the MOVE_RATIO of 4). In the former case, can_move_by_pieces (4 /* byte-size */, 8 /* bit-align */) is called, which returns false (because by_pieces_ninsns returns 4, which is not smaller than the MOVE_RATIO of 4). So the difference in code generation is explained by the alignment. The difference in alignment comes from the move sources: a0[0] vs. a00. Both have the same type with 8-bit alignment, but a00 is on stack, which based on the base stack align and stack variable placement happens to result in a 32-bit alignment. Enable test-cases pr94600-{1,3}.c for nvptx by forcing the currently 8-byte aligned variables to have a 32-bit alignment for STRICT_ALIGNMENT targets. Tested on nvptx. gcc/testsuite/ChangeLog: 2020-10-01 Tom de Vries <tdevr...@suse.de> * gcc.dg/pr94600-1.c: Force 32-bit alignment for a0 for !non_strict_align targets. Remove target clauses from scan tests. * gcc.dg/pr94600-3.c: Same. --- gcc/testsuite/gcc.dg/pr94600-1.c | 11 ++++++++--- gcc/testsuite/gcc.dg/pr94600-3.c | 11 ++++++++--- 2 files changed, 16 insertions(+), 6 deletions(-) diff --git a/gcc/testsuite/gcc.dg/pr94600-1.c b/gcc/testsuite/gcc.dg/pr94600-1.c index c9a7bb9007e..149e4f35dbe 100644 --- a/gcc/testsuite/gcc.dg/pr94600-1.c +++ b/gcc/testsuite/gcc.dg/pr94600-1.c @@ -1,6 +1,7 @@ /* { dg-do compile } */ /* { dg-require-effective-target size32plus } */ /* { dg-options "-fdump-rtl-final -O2" } */ +/* { dg-additional-options "-DALIGN_VAR" { target { ! non_strict_align } } } */ /* Assignments to a whole struct of suitable size (32 bytes) must not be picked apart into field accesses. */ @@ -12,7 +13,11 @@ typedef struct { unsigned int f3 : 7; } t0; -static t0 a0[] = { +static t0 a0[] +#ifdef ALIGN_VAR +__attribute__((aligned (4))) +#endif + = { { .f0 = 7, .f1 = 99, .f3 = 1, }, { .f0 = 7, .f1 = 251, .f3 = 1, }, { .f0 = 8, .f1 = 127, .f3 = 5, }, @@ -32,5 +37,5 @@ foo(void) } /* The only volatile accesses should be the obvious writes. */ -/* { dg-final { scan-rtl-dump-times {\(mem/v} 6 "final" { target { non_strict_align || pcc_bitfield_type_matters } } } } */ -/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 6 "final" { target { non_strict_align || pcc_bitfield_type_matters } } } } */ +/* { dg-final { scan-rtl-dump-times {\(mem/v} 6 "final" } } */ +/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 6 "final" } } */ diff --git a/gcc/testsuite/gcc.dg/pr94600-3.c b/gcc/testsuite/gcc.dg/pr94600-3.c index ff42c7db3c6..2fce9f13cfa 100644 --- a/gcc/testsuite/gcc.dg/pr94600-3.c +++ b/gcc/testsuite/gcc.dg/pr94600-3.c @@ -1,6 +1,7 @@ /* { dg-do compile } */ /* { dg-require-effective-target size32plus } */ /* { dg-options "-fdump-rtl-final -O2 -fno-unroll-loops" } */ +/* { dg-additional-options "-DALIGN_VAR" { target { ! non_strict_align } } } */ /* Same-address version of pr94600-1.c. */ @@ -11,7 +12,11 @@ typedef struct { unsigned int f3 : 7; } t0; -static t0 a0[] = { +static t0 a0[] +#ifdef ALIGN_VAR +__attribute__((aligned (4))) +#endif + = { { .f0 = 7, .f1 = 99, .f3 = 1, }, { .f0 = 7, .f1 = 251, .f3 = 1, }, { .f0 = 8, .f1 = 127, .f3 = 5, }, @@ -31,5 +36,5 @@ foo(void) } /* The loop isn't unrolled. */ -/* { dg-final { scan-rtl-dump-times {\(mem/v} 1 "final" { target { non_strict_align || pcc_bitfield_type_matters } } } } */ -/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 1 "final" { target { non_strict_align || pcc_bitfield_type_matters } } } } */ +/* { dg-final { scan-rtl-dump-times {\(mem/v} 1 "final" } } */ +/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 1 "final" } } */