[Bug rtl-optimization/93974] [10 Regression] ICE in decompose_normal_address, at rtlanal.c:6403 on powerpc64le-linux-gnu since r10-6762
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93974 --- Comment #15 from rsandifo at gcc dot gnu.org --- (In reply to Peter Bergner from comment #14) > (In reply to Vladimir Makarov from comment #13) > > Sorry, I have no good knowledge of decompose_address. The original author > > is Richard Sandiford and it is even not in RA sources (it is in rtlanal.c). > > > > I'd recommend to ask Richard may be he has some insights or just modify to > > make it work in this case. I can only say it is a sensitive code which can > > affect a lot of targets if something is not done right. We had such problems > > with this code in the past. > > Ah, ok, CCing Richard. > > Richard, can you comment on the findings in Comment #4? Do we just need to > add support for seeing a (plus (and: ) (const int 8)) type of address or > should we have never created anything like that in the first place or ??? Sorry for the slow reply, been a bit of a hectic week. I think both fixes would be valid. Like you say, the address parsing code isn't yet ready to handle addresses that apply an offset *before* the address "mutations". That's because no target has yet wanted to support such an address. So as things stand, the current address is not valid and shouldn't have been created. In some ways this is similar to creating an invalid highpart subreg for the upper word of a doubleword vector hard register, or a subreg that falls foul of some simplify_subreg_regno rule. The RA has code to avoid doing that, see init_subregs_of_mode and its users. We could do something similar here for REG_EQUIV MEMs. One option would be to key off whether strip_address_mutations is a no-op on the address. Another would be to check whether each required sub-MEM is valid for its mode and offset. The latter would be more elaborate but might produce better code in general, not just for cases like this. Like you say in comment 4, even the zero-offset half isn't actually a valid address for PowerPC, so either of the two options should give better code as well as fixing the bug. On the other hand, the idea was always that address_info and its users could be extended if new targets have new requirements. So if we want to make this operation valid then that would be OK too.
[Bug target/94072] [10 Regression] ICE: SIGSEGV due to infinite recursion in expand_expr/expand_expr_real_1 with -msve-vector-bits=512 since r10-4197
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94072 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org --- Comment #2 from rsandifo at gcc dot gnu.org --- Mine.
[Bug target/94201] aarch64:ICE in tiny code model for ilp32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94201 rsandifo at gcc dot gnu.org changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org Resolution|--- |FIXED Component|c |target Status|NEW |RESOLVED --- Comment #3 from rsandifo at gcc dot gnu.org --- Fixed on master.
[Bug target/94072] [10 Regression] ICE: SIGSEGV due to infinite recursion in expand_expr/expand_expr_real_1 with -msve-vector-bits=512 since r10-4197
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94072 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #4 from rsandifo at gcc dot gnu.org --- Fixed on master.
[Bug tree-optimization/94261] New: [10 Regression] ICE in vect_get_vec_def_for_operand_1 for 3-element condition reduction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94261 Bug ID: 94261 Summary: [10 Regression] ICE in vect_get_vec_def_for_operand_1 for 3-element condition reduction Product: gcc Version: unknown Status: UNCONFIRMED Keywords: ice-on-valid-code Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org CC: rguenth at gcc dot gnu.org Target Milestone: --- Target: aarch64*-*-* The following test ICEs when compiled with -O3 -ffinite-math-only -march=armv8.2-a+sve on aarch64: void f (float *x, float *y, float z) { float res0 = 0, res1 = 1, res2 = 2; for (int i = 0; i < 100; ++i) { res0 = 100 <= y[i * 3] ? res0 : z; res1 = 101 <= y[i * 3 + 1] ? res1 : z; res2 = y[i * 3 + 2] < 102 ? z : res2; } x[0] = res0; x[1] = res1; x[2] = res2; } The ICE is: b.c:2:1: internal compiler error: in vect_get_vec_def_for_operand_1, at tree-vect-stmts.c:1555 2 | f (float *x, float *y, float z) | ^ 0x10a86c3 vect_get_vec_def_for_operand_1(_stmt_vec_info*, vect_def_type) gcc/tree-vect-stmts.c:1555 0x10af8e1 vect_get_vec_def_for_operand(tree_node*, _stmt_vec_info*, tree_node*) gcc/tree-vect-stmts.c:1617 0x10b258f vectorizable_condition gcc/tree-vect-stmts.c:10213 0x10d01ea vect_transform_stmt(_stmt_vec_info*, gimple_stmt_iterator*, _slp_tree*, _slp_instance*) gcc/tree-vect-stmts.c:11012 0x10d4226 vect_transform_loop_stmt gcc/tree-vect-loop.c:8307 0x10ec31a vect_transform_loop(_loop_vec_info*, gimple*) gcc/tree-vect-loop.c:8707 0x110f797 try_vectorize_loop_1 gcc/tree-vectorizer.c:990 0x1110319 vectorize_loops() gcc/tree-vectorizer.c:1126 The problem comes from trying both SVE and Advanced SIMD and eventually picking SVE. The SVE version can't treat the loop as a single 3-element SLP reduction because we don't yet support loading { 100, 101, 102 } repeating for variable-length vectors. We therefore fail the SLP build before doing anything about the awkward comparison arrangement. With this version, each ?: is a separate reduction and each one has its own, correct, STMT_VINFO_REDUC_IDX. But then we try the Advanced SIMD version. This doesn't have a problem with loading the constants, so we get as far as dealing with mismatched comparisons: /* Swap. */ if (*swap == 1) { swap_ssa_operands (stmt, &TREE_OPERAND (cond, 0), &TREE_OPERAND (cond, 1)); TREE_SET_CODE (cond, swap_tree_comparison (code)); } /* Invert. */ else { swap_ssa_operands (stmt, gimple_assign_rhs2_ptr (stmt), gimple_assign_rhs3_ptr (stmt)); if (STMT_VINFO_REDUC_IDX (stmt_info) == 1) STMT_VINFO_REDUC_IDX (stmt_info) = 2; else if (STMT_VINFO_REDUC_IDX (stmt_info) == 2) STMT_VINFO_REDUC_IDX (stmt_info) = 1; bool honor_nans = HONOR_NANS (TREE_OPERAND (cond, 0)); code = invert_tree_comparison (TREE_CODE (cond), honor_nans); gcc_assert (code != ERROR_MARK); TREE_SET_CODE (cond, code); } But these changes to the gimple stmt persist even if the SLP build fails later, or if the SLP build succeeds and we decide not to vectorise that way. That doesn't matter too much for the current loop_vinfo, because the STMT_VINFO_REDUC_IDXs are still consistent with the gimple stmt. But it means that the STMT_VINFO_REDUC_IDXs for the older SVE loop_vinfo are now no longer correct. The ICE triggers when we go back to the SVE loop_vinfo as the cheapest implementation and try to code-generate the reduction. The testcase is reduced from 481.wrf compiled with LTO.
[Bug target/94254] [10 regression] r10-7312 causes compiler hangs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94254 rsandifo at gcc dot gnu.org changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org Ever confirmed|0 |1 Status|UNCONFIRMED |ASSIGNED CC||rsandifo at gcc dot gnu.org Last reconfirmed||2020-03-22 --- Comment #3 from rsandifo at gcc dot gnu.org --- Mine.
[Bug target/94254] [10 regression] r10-7312 causes compiler hangs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94254 --- Comment #4 from rsandifo at gcc dot gnu.org --- The cycling comes from reloading: (insn 7 6 8 2 (set (reg:SD 122 [ a32 ]) (mem/c:SD (reg/f:DI 120) [1 a32+0 S4 A32])) "gcc/testsuite/gcc.target/powerpc/pr39902-2.c":15:13 516 {movsd_hardfloat} (expr_list:REG_DEAD (reg/f:DI 120) (nil))) r122 is assigned an FPR, and the power6 pattern doesn't provide any alternatives that load from memory into FPRs, so we convert this into a secondary memory reload. Doing that for SDmode memory would cycle, but the rs6000 port has: /* Implement TARGET_SECONDARY_RELOAD_NEEDED_MODE. For SDmode values we need to use DDmode, in all other cases we can use the same mode. */ static machine_mode rs6000_secondary_memory_needed_mode (machine_mode mode) { if (lra_in_progress && mode == SDmode) return DDmode; return mode; } which says that the move should happen in DDmode instead. This means that the eventual FPR reload will happen in DDmode rather than SDmode. The problem is that rs6000_can_change_mode_class doesn't allow FPRs to change from SDmode to DDmode: if (from_size < 8 || to_size < 8) return false; So there seems to be a contradiction here: secondary memory reloads for FPRs have to happen in DDmode rather than SDmode, but FPRs aren't allowed to change to DDmode from SDmode. Previously this worked because LRA ignored rs6000_can_change_mode_class and changed the mode of the FPR regardless. I guess that must have been the right thing to do in context, but it would be good to pin down exactly why the SD->DD mode change is OK for rs6000 in the specific context of secondary memory reloads but not otherwise.
[Bug target/94254] [10 regression] r10-7312 causes compiler hangs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94254 --- Comment #5 from rsandifo at gcc dot gnu.org --- (In reply to Zdenek Sojka from comment #1) > I observe the same issue, and it breaks libgcc build for me: What configure arguments do you use?
[Bug target/94254] [10 regression] r10-7312 causes compiler hangs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94254 --- Comment #7 from rsandifo at gcc dot gnu.org --- Created attachment 48080 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48080&action=edit Proof-of-concept hack to back up the point in comment 4 This hack shows what I mean in comment 4. It "fixes" the three testcases but almost certainly isn't correct. The point is that before r10-7312 we did the mode change regardless of what rs6000_can_change_mode_class said. I don't know enough about powerpc to know what cases: if (from_size < 8 || to_size < 8) return false; is handling.
[Bug target/94254] [10 regression] r10-7312 causes compiler hangs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94254 --- Comment #9 from rsandifo at gcc dot gnu.org --- (In reply to Segher Boessenkool from comment #8) > SFmode values are stored as DP IEEE float normally. There may be other > cases as well, but this is the obvious one. OK, thanks. Does the same problem affect decimal floats (i.e. SD an DD)? If not, maybe the answer really is to allow these cases through, but probably without the lra_in_progress check in the hack/patch.
[Bug tree-optimization/94261] [10 Regression] ICE in vect_get_vec_def_for_operand_1 for 3-element condition reduction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94261 --- Comment #5 from rsandifo at gcc dot gnu.org --- (In reply to Richard Biener from comment #3) > (In reply to Richard Biener from comment #2) > > when placing gcc_unreachable () at the swapping place and most testcases > > still pass when removing the IL operand swapping, only vect-cselim-1.c > > runfails (investigating). > > Ah, SLP ultimately fails here so the scalar IL is broken by the operation > code adjustment. In the end we should be able to avoid doing tree code > adjustments as well since we're swapping to make the code the same > as the first stmts code and we only ever look at the first scalar stmt > during SLP code-gen/analysis. Yeah, started wondering about that later too... > So, it appears to work to comment _all_ of the code in > vect_get_and_check_slp_defs (including the reduc-idx adjustment). > > But again, test coverage is probably bad. Would be great if it works. Will give it a spin later on SVE, although I guess in some ways that probably exercises SLP even less (at least in length-agnostic mode).
[Bug tree-optimization/94261] [10 Regression] ICE in vect_get_vec_def_for_operand_1 for 3-element condition reduction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94261 --- Comment #7 from rsandifo at gcc dot gnu.org --- (In reply to Richard Biener from comment #6) > (In reply to rsand...@gcc.gnu.org from comment #5) > > (In reply to Richard Biener from comment #3) > > > (In reply to Richard Biener from comment #2) > > > > when placing gcc_unreachable () at the swapping place and most testcases > > > > still pass when removing the IL operand swapping, only vect-cselim-1.c > > > > runfails (investigating). > > > > > > Ah, SLP ultimately fails here so the scalar IL is broken by the operation > > > code adjustment. In the end we should be able to avoid doing tree code > > > adjustments as well since we're swapping to make the code the same > > > as the first stmts code and we only ever look at the first scalar stmt > > > during SLP code-gen/analysis. > > > > Yeah, started wondering about that later too... > > So there's probably a single path where this is not true which is when we > go through vect_attempt_slp_rearrange_stmts thus when we have a SLPed > set of cond-reductions. I'm trying to come up with a testcase ... > > But I think the cure here should be to punt on vect_slp_rearrange_stmts, > do you agree? You know SLP much better than me :-) But yeah, that sounds good. Would we need to punt even for swap==1, or just swap==2?
[Bug target/94254] [10 regression] r10-7312 causes compiler hangs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94254 rsandifo at gcc dot gnu.org changed: What|Removed |Added Attachment #48080|0 |1 is obsolete|| --- Comment #11 from rsandifo at gcc dot gnu.org --- Created attachment 48088 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48088&action=edit Candidate patch OK, thanks. This version drops the lra_in_progress hack and makes the test symmetric. As mentioned in the commit message, I'd wondered about generalising it further and allowing any 64-bit mode, but that seems unnecessarily dangerous for stage 4. The previous hack passed bootstrap & regtest on gcc112. I'll try with this version too.
[Bug target/94254] [10 regression] r10-7312 causes compiler hangs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94254 --- Comment #13 from rsandifo at gcc dot gnu.org --- (In reply to Segher Boessenkool from comment #12) > That patch looks fine, thank you! > > Is there some way you can test if this works? Ideally in the testsuite > of course, but that can be hard :-/ I've now tried doing a bootstrap & regtest with --with-cpu-power6 as well, comparing without the lra patch with after both patches. (With master the build hangs in the same way as it did for Zdenek.) There didn't seem to be any differences in test results. Hopefully that will have given some execution test coverage when running things like gcc.dg/dfp and c-c++-common/dfp.
[Bug tree-optimization/94398] ICE: in vectorizable_load, at tree-vect-stmts.c:9173
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94398 --- Comment #3 from rsandifo at gcc dot gnu.org --- (In reply to Richard Biener from comment #2) > But the ICE happens because the result from the function at transform time > does not match that at analysis time. > > Richard? Looks like we're trying to compute: alignment_support_scheme = vect_supportable_dr_alignment (first_dr_info, false); gcc_assert (alignment_support_scheme); even for VMAT_GATHER_SCATTER, which always accesses individual elements. Guess we should set alignment_support_scheme to dr_unaligned_supported instead of calling vect_supportable_dr_alignment. The target hook is probably incorrect for SVE + -mstrict-align.
[Bug tree-optimization/94398] ICE: in vectorizable_load, at tree-vect-stmts.c:9173
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94398 rsandifo at gcc dot gnu.org changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2020-03-30 Status|UNCONFIRMED |NEW --- Comment #4 from rsandifo at gcc dot gnu.org --- (In reply to rsand...@gcc.gnu.org from comment #3) > (In reply to Richard Biener from comment #2) > > But the ICE happens because the result from the function at transform time > > does not match that at analysis time. > > > > Richard? > > Looks like we're trying to compute: > > alignment_support_scheme > = vect_supportable_dr_alignment (first_dr_info, false); > gcc_assert (alignment_support_scheme); > > even for VMAT_GATHER_SCATTER, which always accesses individual > elements. Guess we should set alignment_support_scheme to > dr_unaligned_supported instead of calling > vect_supportable_dr_alignment. ...in the gather/scatter case only, of course :-)
[Bug rtl-optimization/92989] [10 Regression] The mips-mti-linux-gnu fails to build after r276327
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92989 --- Comment #10 from rsandifo at gcc dot gnu.org --- (In reply to Jakub Jelinek from comment #9) > Richard, could you please have a look again? OK, I'll try to get to this early next week.
[Bug rtl-optimization/92989] [10 Regression] The mips-mti-linux-gnu fails to build after r276327
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92989 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|REOPENED|ASSIGNED
[Bug rtl-optimization/92989] [10 Regression] The mips-mti-linux-gnu fails to build after r276327
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92989 rsandifo at gcc dot gnu.org changed: What|Removed |Added URL||https://gcc.gnu.org/piperma ||il/gcc-patches/2020-April/5 ||43302.html --- Comment #11 from rsandifo at gcc dot gnu.org --- Patch posted here: https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543302.html
[Bug rtl-optimization/92989] [10 Regression] The mips-mti-linux-gnu fails to build after r276327
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92989 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #13 from rsandifo at gcc dot gnu.org --- I think it really has been fixed this time. I was able to build a mips-mti-linux-gnu toolchain to completion (with libsanitizer disabled). Sorry for the runaround.
[Bug rtl-optimization/94605] New: [8/9/10 Regression] ICE in early-remat.c:process_block with multi-output asms
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94605 Bug ID: 94605 Summary: [8/9/10 Regression] ICE in early-remat.c:process_block with multi-output asms Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org Target Milestone: --- Target: aarch64*-*-* Compiling the following testcase with -O2 -march=armv8.2-a+sve: typedef int v8si __attribute__((vector_size(32))); int g (v8si, v8si); void f (void) { v8si x = {}, y = {}; while (g (x, y)) asm ("" : "+w" (x), "+w" (y)); } triggers: internal compiler error: in process_block, at early-remat.c:2051 This is because process_block doesn't handle insns that set multiple candidate registers.
[Bug rtl-optimization/94605] [8/9/10 Regression] ICE in early-remat.c:process_block with multi-output asms
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94605 rsandifo at gcc dot gnu.org changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org Status|UNCONFIRMED |ASSIGNED Target Milestone|--- |8.4 Ever confirmed|0 |1 Last reconfirmed||2020-04-15 --- Comment #1 from rsandifo at gcc dot gnu.org --- Have a patch.
[Bug target/94606] [10 Regression] ICE creating fixed-length SVE predicate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94606 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org Ever confirmed|0 |1 Last reconfirmed||2020-04-15 --- Comment #1 from rsandifo at gcc dot gnu.org --- Mine.
[Bug target/94606] New: [10 Regression] ICE creating fixed-length SVE predicate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94606 Bug ID: 94606 Summary: [10 Regression] ICE creating fixed-length SVE predicate Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: ice-on-valid-code Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org Target Milestone: --- Target: aarch64*-*-* Compiling the following testcase with -O3 -march=armv8.2-a+sve -msve-vector-bits=256: const short mask[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1 }; int foo (short *restrict x, short *restrict y) { for (int i = 0; i < 16; ++i) if (mask[i]) x[i] += y[i]; } gives: error: unrecognizable insn: 10 | } | ^ (insn 10 9 11 2 (set (reg:VNx16BI 105) (and:VNx16BI (xor:VNx16BI (reg:VNx8BI 103) (reg:VNx16BI 104)) (reg:VNx16BI 104))) "/tmp/bar.c":9:12 -1 (nil)) during RTL pass: vregs because the first operand to the xor has the wrong mode.
[Bug target/94606] [10 Regression] ICE creating fixed-length SVE predicate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94606 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #3 from rsandifo at gcc dot gnu.org --- Fixed on master.
[Bug target/94668] New: [10 Regression] ICE generating float vec_inits since r10-808
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94668 Bug ID: 94668 Summary: [10 Regression] ICE generating float vec_inits since r10-808 Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: ice-on-valid-code Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org Target Milestone: --- Target: aarch64*-*-* Compiling the following with -O -march=armv8.2-a+sve -msve-vector-bits=512: typedef float v16sf __attribute__ ((vector_size(64))); v16sf foo (float a) { return (v16sf) { 0, 0, 0, a, 0, 0, 0, 0, 0, a, 0, 0, 0, 0, 0, 0 }; } gives: foo.c:5:10: internal compiler error: in decompose, at rtl.h:2296 5 | return (v16sf) { 0, 0, 0, a, 0, 0, 0, 0, 0, a, 0, 0, 0, 0, 0, 0 }; | ^ 0xd8db39 wi::int_traits >::decompose(long*, unsigned int, std::pair const&) src/gcc/gcc/rtl.h:2296 0xd8db39 wide_int_ref_storage::wide_int_ref_storage >(std::pair const&, unsigned int) src/gcc/gcc/wide-int.h:1034 0xd8db39 generic_wide_int >::generic_wide_int >(std::pair const&, unsigned int) src/gcc/gcc/wide-int.h:790 0xd8db39 wi::binary_traits, std::pair, wi::int_traits >::precision_type, wi::int_traits >::precision_type>::result_type wi::sub, std::pair >(std::pair const&, std::pair const&) src/gcc/gcc/wide-int.h:2510 0xd8db39 rtx_vector_builder::step(rtx_def*, rtx_def*) const src/gcc/gcc/rtx-vector-builder.h:122 0xd8db39 vector_builder::elt(unsigned int) const src/gcc/gcc/vector-builder.h:254 0x1216012 aarch64_sve_expand_vector_init_handle_trailing_constants src/gcc/gcc/config/aarch64/aarch64.c:18393 0x1218fdc aarch64_sve_expand_vector_init src/gcc/gcc/config/aarch64/aarch64.c:18517 0x1219248 aarch64_sve_expand_vector_init src/gcc/gcc/config/aarch64/aarch64.c:18558 0x121926d aarch64_sve_expand_vector_init src/gcc/gcc/config/aarch64/aarch64.c:18562 0x1219710 aarch64_sve_expand_vector_init(rtx_def*, rtx_def*) src/gcc/gcc/config/aarch64/aarch64.c:18601 0x163d4eb gen_vec_initvnx16qiqi(rtx_def*, rtx_def*) src/gcc/gcc/config/aarch64/aarch64-sve.md:2535 0x9c9cde insn_gen_fn::operator()(rtx_def*, rtx_def*) const src/gcc/gcc/recog.h:317 0x9c9cde store_constructor src/gcc/gcc/expr.c:6965 0x9cc8fc expand_constructor src/gcc/gcc/expr.c:8279 0x9b35b9 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) src/gcc/gcc/expr.c:10388 0x9b5038 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) src/gcc/gcc/expr.c:10058 0x9c404a store_expr(tree_node*, rtx_def*, int, bool, bool) src/gcc/gcc/expr.c:5752 0x9c5d67 expand_assignment(tree_node*, tree_node*, bool) src/gcc/gcc/expr.c:5514 0x852437 expand_gimple_stmt_1 src/gcc/gcc/cfgexpand.c:3749 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions.
[Bug target/94668] [10 Regression] ICE generating float vec_inits since r10-808
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94668 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org Last reconfirmed||2020-04-20 --- Comment #1 from rsandifo at gcc dot gnu.org --- The problem is that the rtx_vector_builder arguments are the wrong way around. Testing a patch.
[Bug target/94668] [10 Regression] ICE generating float vec_inits since r10-808
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94668 rsandifo at gcc dot gnu.org changed: What|Removed |Added Target Milestone|--- |10.0
[Bug target/94668] [10 Regression] ICE generating float vec_inits since r10-808
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94668 rsandifo at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #3 from rsandifo at gcc dot gnu.org --- Fixed on trunk.
[Bug tree-optimization/94683] New: ICE in forwprop when folding to a VEC_PERM_EXPR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94683 Bug ID: 94683 Summary: ICE in forwprop when folding to a VEC_PERM_EXPR Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org Target Milestone: --- Compiling this with -march=armv8.2-a+sve -msve-vector-bits=256 -O3: #include typedef float v8sf __attribute__((vector_size(32))); svfloat32_t f (svbool_t pg, svfloat32_t x, svfloat32_t y) { v8sf a = svadd_x (pg, x, 1); v8sf b = { a[0], a[0], a[2], a[2], a[4], a[4], a[6], a[6] }; return svdiv_x (pg, b, y); } gives: foo.c: In function ‘f’: foo.c:16:1: error: non-trivial conversion in ‘ssa_name’ 16 | } | ^ v8sf __SVFloat32_t b_17 = _16; during GIMPLE pass: forwprop dump file: foo.031t.forwprop1
[Bug tree-optimization/94683] ICE in forwprop when folding to a VEC_PERM_EXPR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94683 rsandifo at gcc dot gnu.org changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2020-04-21 Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org --- Comment #1 from rsandifo at gcc dot gnu.org --- Testing a patch.
[Bug tree-optimization/94683] ICE in forwprop when folding to a VEC_PERM_EXPR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94683 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #3 from rsandifo at gcc dot gnu.org --- Fixed on master.
[Bug tree-optimization/94700] New: ICE in forwprop when folding to a VEC_PERM_EXPR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94700 Bug ID: 94700 Summary: ICE in forwprop when folding to a VEC_PERM_EXPR Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org Target Milestone: --- Compiling this with -march=armv8.2-a+sve -msve-vector-bits=256 -O3: #include typedef float v8sf __attribute__((vector_size(32))); svfloat32_t f (svbool_t pg, svfloat32_t x, svfloat32_t y) { v8sf a = svadd_x (pg, x, 1); v8sf b = { a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7] }; return svdiv_x (pg, b, y); } gives: foo.c: In function ‘svfloat32_t f(svbool_t, svfloat32_t, svfloat32_t)’: foo.c:11:1: error: non-trivial conversion in ‘ssa_name’ 11 | } | ^ v8sf __SVFloat32_t b_18 = _1; during GIMPLE pass: forwprop dump file: foo.031t.forwprop1 This is the maybe_ident equivalent of PR94683. I should have realised at the time that both arms would need fixing.
[Bug tree-optimization/94700] ICE in forwprop when folding to a VEC_PERM_EXPR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94700 rsandifo at gcc dot gnu.org changed: What|Removed |Added Keywords||ice-on-valid-code Ever confirmed|0 |1 Last reconfirmed||2020-04-21 Target||aarch64*-*-* Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org --- Comment #1 from rsandifo at gcc dot gnu.org --- Testing a patch.
[Bug tree-optimization/94700] ICE in forwprop when folding to a VEC_PERM_EXPR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94700 rsandifo at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #3 from rsandifo at gcc dot gnu.org --- Fixed on master.
[Bug target/94678] aarch64: unexpected result with -mgeneral-regs-only and sve
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94678 rsandifo at gcc dot gnu.org changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #3 from rsandifo at gcc dot gnu.org --- Fixed on master. Thanks for the patch.
[Bug c/94712] aarch64:Adjust some testcases for ilp32 option conflict
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94712 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED CC||rsandifo at gcc dot gnu.org --- Comment #2 from rsandifo at gcc dot gnu.org --- Fixed on master. Thanks for the patch.
[Bug tree-optimization/94727] [10 Regression] GCC produces incorrect code with -O3 since r10-5071-g02d895504cc59be0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94727 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org --- Comment #4 from rsandifo at gcc dot gnu.org --- OK, I'll take a look.
[Bug tree-optimization/94727] [10 Regression] GCC produces incorrect code with -O3 since r10-5071-g02d895504cc59be0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94727 --- Comment #5 from rsandifo at gcc dot gnu.org --- Well, this is a bit of mess (surprise). We have a "<" comparison between two booleans that are leaves of the SLP tree, so vectorizable_comparison falls back on: /* Invariant comparison. */ if (!vectype) { vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1), slp_node); if (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype), nunits)) return false; } rhs1 and rhs2 are *unsigned* boolean types, so we get back a vector of unsigned integers. All is well, and "<" works as expected without the need for: /* Boolean values may have another representation in vectors and therefore we prefer bit operations over comparison for them (which also works for scalar masks). We store opcodes to use in bitop1 and bitop2. Statement is vectorized as BITOP2 (rhs1 BITOP1 rhs2) or rhs1 BITOP2 (BITOP1 rhs2) depending on bitop1 and bitop2 arity. */ bool swap_p = false; if (VECTOR_BOOLEAN_TYPE_P (vectype)) { However, we then defer to vect_get_slp_defs to get the actual operands. The expected vector type is not part of this interface. The request goes to vect_get_constant_vectors, which does: if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (op)) && vect_mask_constant_operand_p (stmt_vinfo)) vector_type = truth_type_for (stmt_vectype); else vector_type = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op), op_node); So the function gives back a vector of mask types, which here are vectors of *signed* booleans. This means that "<" gives: true (-1) < false (0) and so the boolean fixup above was needed after all. I'm going to try: diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index 7f3a9fb5fb3..88a1e2c51d2 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -10566,8 +10566,11 @@ vectorizable_comparison (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi, /* Invariant comparison. */ if (!vectype) { - vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1), -slp_node); + if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1))) + vectype = mask_type; + else + vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1), + slp_node); if (!vectype || maybe_ne (TYPE_VECTOR_SUBPARTS (vectype), nunits)) return false; } which does at least fix the testcase.
[Bug tree-optimization/94727] [10 Regression] GCC produces incorrect code with -O3 since r10-5071-g02d895504cc59be0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94727 --- Comment #7 from rsandifo at gcc dot gnu.org --- (In reply to Richard Biener from comment #6) > (In reply to rsand...@gcc.gnu.org from comment #5) > > > > However, we then defer to vect_get_slp_defs to get the actual operands. > > The expected vector type is not part of this interface. > > Ah yeah - sth on my list to fix (not making the type part of that API > but assigning vector types to SLP nodes). I even have partly completed > "hacks" to do that. When we have (and use!) vector types on all SLP > nodes we can also get rid of the mismatch code. Sounds great! The fewer decisions we make on the fly the better... > > I'm going to try: > > > > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c > > index 7f3a9fb5fb3..88a1e2c51d2 100644 > > --- a/gcc/tree-vect-stmts.c > > +++ b/gcc/tree-vect-stmts.c > > @@ -10566,8 +10566,11 @@ vectorizable_comparison (stmt_vec_info stmt_info, > > gimple_stmt_iterator *gsi, > >/* Invariant comparison. */ > >if (!vectype) > > { > > - vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1), > > -slp_node); > > + if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1))) > > + vectype = mask_type; > > + else > > + vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1), > > + slp_node); > >if (!vectype || maybe_ne (TYPE_VECTOR_SUBPARTS (vectype), nunits)) > > return false; > > } > > > > which does at least fix the testcase. > > LGTM. Thanks. aarch64-linux-gnu and x86_64-linux-gnu bootstrapped passed, now trying an SVE test run. Will commit if that passes too.
[Bug tree-optimization/94727] [10 Regression] GCC produces incorrect code with -O3 since r10-5071-g02d895504cc59be0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94727 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #9 from rsandifo at gcc dot gnu.org --- Fixed on master. Thanks for the report.
[Bug tree-optimization/94784] ICE: in simplify_vector_constructor, at tree-ssa-forwprop.c:2482
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94784 rsandifo at gcc dot gnu.org changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #4 from rsandifo at gcc dot gnu.org --- Fixed on trunk.
[Bug target/94852] -ffloat-store on x64 target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94852 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|RESOLVED|NEW CC||rsandifo at gcc dot gnu.org Severity|normal |enhancement Ever confirmed|0 |1 Resolution|DUPLICATE |--- Last reconfirmed||2020-04-30 --- Comment #7 from rsandifo at gcc dot gnu.org --- Reopening. I don't think this is a dup of PR323. The problem is instead that the workaround for PR323 is silently accepted and applied even on targets for which the workaround isn't needed. That seems like a valid point to me, and if we don't have an earlier PR making the same point, I think we should keep this one open. I agree that at the very least the documentation could be improved (but I'm not volunteering :-))
[Bug tree-optimization/94899] Failure to optimize out add before compare with INT_MIN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94899 rsandifo at gcc dot gnu.org changed: What|Removed |Added Keywords||easyhack Status|RESOLVED|NEW Last reconfirmed||2020-05-01 CC||rsandifo at gcc dot gnu.org Resolution|INVALID |--- Ever confirmed|0 |1 --- Comment #4 from rsandifo at gcc dot gnu.org --- Reopening because... (In reply to Andrew Pinski from comment #2) > This is invalid as 0x8000 is unsigned (C90/C++03) or long (C99/C++11) in > type. > Which means then overflow is not undefined but rather wrapping. It's unsigned int for C99/C++11 too (see the different handling of decimal-literals and other integer-literals in [lex.icon.type]). This means that the result of the addition is also unsigned, For the specific value of 0x8000, the transformation is monotonic, so the optimisation is valid and well-defined.
[Bug rtl-optimization/94873] [8/9/10/11 Regression] wrong code with -O -fno-merge-constants -fno-split-wide-types -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94873 rsandifo at gcc dot gnu.org changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org --- Comment #6 from rsandifo at gcc dot gnu.org --- (In reply to Jeffrey A. Law from comment #5) > I can't see how a REG_EQUAL note on an insn with multiple outputs can > possibly work -- we wouldn't know what output the REG_EQUAL note refers to. > And we have to consider an embedded side effect as having an output. > > Or to think of it another way, any embedded side effect can be implemented > with a parallel at which point it's painfully obvious the insn has multiple > outputs and a REG_EQUAL note would be inappropriate. Yeah, I can see that argument, but to play devil's advocate: I think the requirement for having a single REG SET_DEST makes sense because the REG_EQUAL note would be genuinely ambiguous if there were multiple REG SET_DESTs. But in the case of a REG_INC insn with a single REG SET_DEST, there's no ambiguity about which register is meant. I guess there's also the problem that stack pushes don't need a REG_INC note, so anything we do can't just be keyed off REG_INC. The only sure way to check whether a register is set as a side-effect is to look at the complete pattern (like dse.c:check_for_inc_dec). So I think there's the argument that optimisers have to be wary of this in the same way that they need to be wary of folding: (set (reg X) (and (mem (pre_inc (reg Y))) (reg Z))) into (set (reg X) (const_int 0)) when Z can be proven to be zero.
[Bug rtl-optimization/94873] [8/9/10/11 Regression] wrong code with -O -fno-merge-constants -fno-split-wide-types -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94873 --- Comment #8 from rsandifo at gcc dot gnu.org --- (In reply to Segher Boessenkool from comment #7) > REG_EQ* is documented as only being allowed on insns that set only one > register. If you want to change that, you'll have to check *all* code > that consumes this, see if they rely on that fact or not, and if so, > change that. But the point is that the word "set" is ambiguous. Does it mean set by a SET or set by any means? I think it can be read both ways. After all, a CLOBBER is a form of set too, but that's clearly meant to be excluded. Either way will need auditing. If we say that this kind of REG_EQUAL is wrong, we'd in theory need to audit everything that adds REG_EQUAL notes to make sure it has an appropriate check, or doesn't need one. I'm also not sure if we really are concerned about multiple registers in this particular case, or whether it's more the case that we don't want REG_EQUAL notes on insns with side effects.
[Bug middle-end/94941] New: Expansion of some internal fns can drop the lhs on the floor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94941 Bug ID: 94941 Summary: Expansion of some internal fns can drop the lhs on the floor Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: wrong-code Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org Target Milestone: --- internal-fn.c:expand_mask_load_optab_fn uses expand_insn to emit the load instruction, but doesn't then test whether the coerced output operand is the same as the target of the gcall. It might not be, for example, in unoptimised code, where the target of the gcall expands to a MEM rtx and the load insn requires a REG destination. We need the equivalent of: if (!rtx_equal_p (lhs_rtx, ops[0].value)) emit_move_insn (lhs_rtx, ops[0].value); in expand_while_optab_fn. This can be seen for AArch64 with the following test, compiled with -O0 -march=armv8.2-a+sve: -- #include svfloat32_t foo (float *ptr) { svbool_t pg = svptrue_pat_b32 (SV_VL1); svfloat32_t res = svld1 (pg, ptr); return res; } int main (void) { svbool_t pg = svptrue_pat_b32 (SV_VL1); float x[1] = { 1 }; if (svptest_any (pg, svcmpne (pg, foo (x), 1.0))) __builtin_abort (); return 0; } -- We emit: ;; res_5 = .MASK_LOAD (ptr_4(D), 4B, _2); (insn 9 8 10 (set (reg/f:DI 96) (mem/f/c:DI (plus:DI (reg/f:DI 87 virtual-stack-vars) (const_poly_int:DI [-40, -32])) [3 ptr+0 S8 A64])) "/tmp/foo.c":7:21 -1 (nil)) (insn 10 9 0 (set (reg:VNx4SF 97) (unspec:VNx4SF [ (reg:VNx4BI 92 [ _2 ]) (mem:VNx4SF (reg/f:DI 96) [0 MEM [(float *)ptr_4(D)]+0 S[16, 16] A8]) ] UNSPEC_LD1_SVE)) "/tmp/foo.c":7:21 -1 (nil)) but don't store reg 97 to the stack slot for "res". Then the return statement loads from "res": (insn 12 11 0 (set (reg:VNx4SF 93 [ _6 ]) (unspec:VNx4SF [ (subreg:VNx4BI (reg:VNx16BI 98) 0) (mem/c:VNx4SF (plus:DI (reg/f:DI 87 virtual-stack-vars) (const_poly_int:DI [-32, -32])) [2 res+0 S[16, 16] A128]) ] UNSPEC_PRED_X)) "/tmp/foo.c":8:10 -1 (nil)) meaning we return uninitialised stack contents. The same problem affects expand_load_lanes_optab_fn and expand_gather_load_optab_fn. I think this problem has existed since the mask load/store functions were introduced, but it was probably latent until GCC 10 because nothing would use them in unoptimised code.
[Bug middle-end/94941] Expansion of some internal fns can drop the lhs on the floor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94941 rsandifo at gcc dot gnu.org changed: What|Removed |Added Last reconfirmed||2020-05-04 Ever confirmed|0 |1 Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org --- Comment #1 from rsandifo at gcc dot gnu.org --- Testing a patch.
[Bug rtl-optimization/94873] [8/9/10/11 Regression] wrong code with -O -fno-merge-constants -fno-split-wide-types -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94873 --- Comment #11 from rsandifo at gcc dot gnu.org --- (In reply to Segher Boessenkool from comment #9) > "clobber" is a red herring; it is impossible to make a REG_EQ* note for > a clobber, a clobber does not set a new value (that is the whole point > of a clobber). It's not possible to attach a REG_EQ* note to an auto-inc-dec either though (that was my point). So a clobber doesn't seem any less of a red herring than the auto-inc-dec itself. Both set registers in the sense of changing them, and neither can be described by a REG_EQ* note. > I think we could allow auto-modify, sure, just as long as it stays clear > what lhs the REG_EQ* note is talking about; and, as you say, everything > needs to be audited for it :-/ Yeah. But to be clear: I don't think this is more obviously a change from the status quo than going in the other direction would be. The point is that the status quo is ambiguous: the documentation can be read either way, and the implementation isn't consistent (hence the bug). So it's a question of how we resolve the ambiguity. If we want passes to be able to assume without checking that insns with REG_EQ* notes don't also include auto inc-dec, we'll need to audit places that create the notes, or that update insns with existing notes. I think it comes down to what the REG_EQ* notes are supposed to achieve (conceptually, ignoring documentation and the current implementation for now). The "weak" guarantee is that the SET_DEST has the specified value after the instruction. The "strong" guarantee extends the weak guarantee by saying that the SET_SRC of the definition can be replaced by the REG_EQ* note without changing behaviour. Having auto-inc-dec in the SET_SRC of the definition is OK for the weak guarantee but not the strong guarantee. But the same would be true of any SET_SRC with side effects. So to frame the question in a different way: let's assume there's a target-specific intrinsic that has side effects that can be described using unspec_volatile, and that the intrinsic also sets a register. Normally this would be described as: (set (reg X) (unspec_volatile ...)) But if, in a particular context, the target could predict what the value of X was, could it attach a REG_EQ* note to say that? It would then be valid to simplify later uses of X, even though the definition of X can't change. IMO this is easier to answer for REG_EQUIV. That mostly exists to allow the RA to rematerialise a value instead of reloading it. So it's all about replacing the uses of the register rather than about replacing the definition. (I'm not saying that the RA would handle a REG_EQUIV note on the above unspec_volatile correctly -- haven't checked either way -- but in principle it could.) The weak guarantee makes life harder for consumers of the notes that want the strong guarantee, since they then have to check for side effects themselves. The weak guarantee is easy for producers of the notes and for consumers that only care about users of the register. The strong guarantee makes life harder for producers of the notes, or for optimisations that modify insns with existing notes. The strong guarantee is easy for consumers of the notes because it's more conservative. The weak guarantee potentially allows more optimisation than the strong guarantee. I don't think there's much in it. But I guess I personally prefer the weak guarantee for the "more optimisation" reason. There's also a very tenuous analogy with REG_RETURNED, which is explicitly for saying what the return value is, rather than saying how the definition can be rewritten. But whichever we go for, I think it should be a decision about side effects vs. no side effects, with auto inc-dec being just one of several potential side effects. I don't see any reason why the auto-inc-dec case would be different from the unspec_volatile case.
[Bug middle-end/94941] Expansion of some internal fns can drop the lhs on the floor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94941 rsandifo at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #4 from rsandifo at gcc dot gnu.org --- Fixed in master and on the GCC 10 branch.
[Bug rtl-optimization/94873] [8/9/10/11 Regression] wrong code with -O -fno-merge-constants -fno-split-wide-types -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94873 rsandifo at gcc dot gnu.org changed: What|Removed |Added CC||ebotcazou at gcc dot gnu.org --- Comment #12 from rsandifo at gcc dot gnu.org --- Just noticed that Eric wasn't on cc: (although he might have preferred that it stay that way).
[Bug target/92469] ICE: output_operand: invalid use of register 'frame' in 7/8/9/10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92469 --- Comment #10 from rsandifo at gcc dot gnu.org --- "jakub at gcc dot gnu.org" writes: > --- Comment #9 from Jakub Jelinek --- > Seems neither accessible_reg_set nor operand_reg_set can exclude frame, > because > the latter results in general_operand predicate failing for it and the former > results into the latter not being enabled either. > So, if we wanted to reject this right away, we'd have to add another hard reg > set (e.g. containing gcc internal artifical regs that shouldn't appear in asm > register specification), or a target hook that would reject such registers > that > varasm.c would call. Agree a hard reg set would be good. We should be able to fill in the easy cases automatically, e.g. FRAME_POINTER_REGNUM when it's != HARD_FRAME_POINTER_REGNUM. But some cases would still need target help.
[Bug target/94980] [8/9/10/11 Regression] ICE: verify_gimple failed: position plus size exceeds size of referenced object in 'bit_field_ref' with -mavx512vl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94980 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org --- Comment #2 from rsandifo at gcc dot gnu.org --- Mine. Looks like the problem was latent before. The testcase ends up using vector boolean types with 2-bit element types. Before that patch we computed the TYPE_SIZE of the vector type as: TYPE_SIZE (type) = int_const_binop (MULT_EXPR, TYPE_SIZE (innertype), bitsize_int (nunits)); where the TYPE_SIZE of the boolean is 8 bits. So the TYPE_SIZE of the vector was 4*8 == 32 bits, even though the TYPE_SIZE_UNIT was only 1 byte. This meant that the BIT_FIELD_REFs passed the tree-cfg.c checks but led to incorrect rtl: ;; _13 = { -1, -1, -1, -1 }; (insn 18 17 0 (set (reg:QI 92 [ _13 ]) (const_int 15 [0xf])) "/tmp/pr94980.c":6:17 -1 (nil)) ;; _14 = BIT_FIELD_REF <_13, 8, 8>; (insn 19 18 20 (set (mem/c:QI (plus:DI (reg/f:DI 77 virtual-stack-vars) (const_int -4 [0xfffc])) [2 S1 A8]) (reg:QI 92 [ _13 ])) "/tmp/pr94980.c":6:17 -1 (nil)) (insn 20 19 0 (set (reg:QI 93 [ _14 ]) (mem/c:QI (plus:DI (reg/f:DI 77 virtual-stack-vars) (const_int -3 [0xfffd])) [0 S1 A8])) "/tmp/pr94980.c":6:17 -1 (nil)) where the last insn reads back uninitialised memory.
[Bug target/94980] [8/9/10/11 Regression] ICE: verify_gimple failed: position plus size exceeds size of referenced object in 'bit_field_ref' with -mavx512vl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94980 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #6 from rsandifo at gcc dot gnu.org --- Fixed on master. As mentioned on-list, GCC 7 generated wrong code instead of ICEing, and there's a danger GCC 8+ could regress to that in other cases if we backported the patches. So instead it seems more appropriate to leave the branches alone for now and perhaps revisit if a real-world test case comes along. We might by that time have found more places that need similar fixes.
[Bug target/95105] New: Bogus reference-compatibility error for arm_sve_vector_bits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95105 Bug ID: 95105 Summary: Bogus reference-compatibility error for arm_sve_vector_bits Product: gcc Version: 10.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org Target Milestone: --- Target: aarch64*-*-* Compiling this testcase with -march=armv8.2-a+sve -msve-vector-bits=512: -- typedef __SVFloat32_t foo; typedef foo bar __attribute__((arm_sve_vector_bits(512))); template struct s { T x; }; extern s a; bar &b = a.x; -- gives the bogus error: error: cannot bind non-const lvalue reference of type ‘bar&’ {aka ‘__SVFloat32_t&’} to an rvalue of type ‘bar’ {aka ‘__SVFloat32_t’} 5 | bar &b = a.x; | ~~^ The testcase works if the attribute is applied directly to __SVFloat32_t. The bug is in the target-specific handling of the attribute.
[Bug target/95105] Bogus reference-compatibility error for arm_sve_vector_bits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95105 rsandifo at gcc dot gnu.org changed: What|Removed |Added Ever confirmed|0 |1 Target Milestone|--- |10.2 Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org Last reconfirmed||2020-05-13
[Bug middle-end/95114] New: [9/10/11 Regression] ICE in obj_type_ref_class for structural-equality types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95114 Bug ID: 95114 Summary: [9/10/11 Regression] ICE in obj_type_ref_class for structural-equality types Product: gcc Version: 9.0 Status: UNCONFIRMED Keywords: ice-on-valid-code Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org CC: hubicka at gcc dot gnu.org Target Milestone: --- Target: aarch64*-*-* Since r9-5035-g4611c03d2b0edefc8d8e17872ef143428f56380b, the following testcase has ICEd for aarch64, compiled at -O0: --- template struct foo { virtual void f() = 0; }; extern foo<__Int8x8_t> &x; void f() { x.f(); } --- The exact ICE has changed over time, but the current form is: --- during GIMPLE pass: *rebuild_cgraph_edges foo.c: In function ‘void f()’: foo.c:3:19: internal compiler error: Segmentation fault 3 | void f() { x.f(); } | ^ 0x16cfec1 crash_signal src/gcc/gcc/toplev.c:328 0x129a4ca obj_type_ref_class(tree_node const*) src/gcc/gcc/ipa-devirt.c:1915 0x1afe3d7 virtual_method_call_p(tree_node const*) src/gcc/gcc/tree.c:12867 0xf1102b cgraph_node::create_indirect_edge(gcall*, int, profile_count, bool) src/gcc/gcc/cgraph.c:990 0xf2113d cgraph_edge::rebuild_edges() src/gcc/gcc/cgraphbuild.c:421 0xf214aa execute src/gcc/gcc/cgraphbuild.c:490 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. --- This is because for structural-equality types we call get_odr_type: --- if (!in_lto_p && !TYPE_STRUCTURAL_EQUALITY_P (ret)) ret = TYPE_CANONICAL (ret); else ret = get_odr_type (ret)->type; --- and the hash table is empty at this point. (We did previously have a hash entry for the type, but that was freed during pass_ipa_free_lang_data.) The test does pass at -O2, but fails with -fdump-tree-all-details -O2: --- foo.c: In function ‘void f()’: foo.c:3:19: internal compiler error: Segmentation fault 3 | void f() { x.f(); } | ^ 0x16cfec1 crash_signal src/gcc/gcc/toplev.c:328 0x12a20f8 hash_table::find_slot_with_hash(tree_node* const&, unsigned int, insert_option) src/gcc/gcc/hash-table.h:967 0x129a622 get_odr_type(tree_node*, bool) src/gcc/gcc/ipa-devirt.c:1939 0x129a4c9 obj_type_ref_class(tree_node const*) src/gcc/gcc/ipa-devirt.c:1915 0x1afe3d7 virtual_method_call_p(tree_node const*) src/gcc/gcc/tree.c:12867 0x180f086 dump_generic_node(pretty_printer*, tree_node*, int, dump_flag, bool) src/gcc/gcc/tree-pretty-print.c:3125 ... --- I guess the fact that this routine is called by the dump routines (which shouldn't affect codegen) suggests that we should cope with null returns from get_odr_type, rather than passing "true" as the insert parameter. Is that right?
[Bug middle-end/95114] [9/10/11 Regression] ICE in obj_type_ref_class for structural-equality types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95114 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Target Milestone|--- |9.4 Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org Last reconfirmed||2020-05-13 Ever confirmed|0 |1 --- Comment #1 from rsandifo at gcc dot gnu.org --- Testing a patch that tests for null returns.
[Bug target/92658] x86 lacks vector extend / truncate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92658 --- Comment #12 from rsandifo at gcc dot gnu.org --- (In reply to rguent...@suse.de from comment #11) > On Thu, 14 May 2020, ubizjak at gmail dot com wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92658 > > > > --- Comment #10 from Uroš Bizjak --- > > The patch is ready to be pushed, it is waiting for a decision what to do > > with > > failed cases. > > > > Richi, should this patch move forward (eventually XFAILing failed cases), > > or do > > you plan to look at the fails from the generic vectorizer POV? > > I think we should go forward with the patch, either XFAILing the testcases > or splitting out the testcase (and backend patterns that do not get > exercised due to the issue). > > I've already said in comment#8 that the issue here is optabs working > with modes and not vector types, so it's a bit hard to use the > same mechanism to deal with the currently failing cases. One > possible route would be to add V4QImode similar to how we now > do V2SFmode with SSE but of course where do we stop ... > > As said we can try to tackle this incrementally. Maybe Richard also > has input here? Nothing to add really, but: yeah, the idea was that the target would provide smaller vector modes if it can efficiently load, store and (at least) add them. I think it would be good to do that for aarch64 Advanced SIMD at some point. There are three points behind using vector modes for this: 1. It gives the target the flexibility to decide how best to implement the operation (All at one end, and if so, which end? Or spread out evenly across the vector?) On aarch64, the choice would differ between SVE and Advanced SIMD. 2. It means that __attribute__((vector_size)) vectors benefit. 3. The smaller modes can be used in conjunction with autovectorize_vector_modes to give the loop vectoriser a choice between operating on packed or unpacked vectors. This is mostly useful for VECT_COMPARE_COSTS but can help with low-trip-count loops even without. So yeah, defining V4QI sounds like the way to go if the architecture can process it efficiently. I get the "where do we stop" concern, but .md mode iterators should help.
[Bug target/94991] aarch64: ICE: Segmentation fault with option -mgeneral-regs-only
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94991 rsandifo at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED CC||rsandifo at gcc dot gnu.org Status|UNCONFIRMED |RESOLVED --- Comment #3 from rsandifo at gcc dot gnu.org --- Fixed on trunk.
[Bug target/95125] Unoptimal code for vectorized conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125 --- Comment #8 from rsandifo at gcc dot gnu.org --- (In reply to Richard Biener from comment #7) > (In reply to Uroš Bizjak from comment #6) > > (In reply to Hongtao.liu from comment #5) > > > (In reply to Uroš Bizjak from comment #3) > > > > It turns out that a bunch of patterns have to be renamed (and testcases > > > > added). > > > > > > > > Easyhack, waiting for someone to show some love to conversion patterns > > > > in > > > > sse.md. > > > > > > expander for floatv4siv4df2, fix_truncv4dfv4si2 already exists. > > > > > > if change **float_double fix_double** to > > > --- > > > void > > > float_double (void) > > > { > > > d[0] = i[0]; > > > d[1] = i[1]; > > > d[2] = i[2]; > > > d[3] = i[3]; > > > } > > > > Hm, the above is vectorized, but the equivalent: > > > > void > > float_double (void) > > { > > for (int n = 0; n < 4; n++) > > d[n] = i[n]; > > } > > > > is not? > > Yes, we're committing to a too high VF here, likely because we pick the > "wrong" vector mode too early. We could eventually fix this up in > the early vectype analysis. It might be worth investigating VECT_COMPARE_COSTS, which weighs the cost of different VFs against each other and is how SVE copes with this. I guess the danger is that it might interfere with -mprefer-* options (although the first VF listed by autovectorize_vector_modes wins in a tie).
[Bug target/95341] New: Poor vector_size decomposition when SVE is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95341 Bug ID: 95341 Summary: Poor vector_size decomposition when SVE is enabled Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org Target Milestone: --- Target: aarch64*-*-* Compiling the following with -O2 -march=armv8-a: typedef unsigned int uint_vec __attribute__((vector_size(32))); uint_vec f1(uint_vec x, uint_vec y) { return x + y; } generates nice Advanced SIMD code: ld1 {v4.16b - v5.16b}, [x0] ld1 {v2.16b - v3.16b}, [x1] add v0.4s, v4.4s, v2.4s add v1.4s, v5.4s, v3.4s st1 {v0.16b - v1.16b}, [x8] ret But compiling with -march=armv8.2-a+sve generates extremely bad scalar code, so bad that I'll spare people's eyes by not quoting it here. I haven't yet analysed this or checked whether it's a regression.
[Bug target/95341] Poor vector_size decomposition when SVE is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95341 --- Comment #3 from rsandifo at gcc dot gnu.org --- (In reply to ktkachov from comment #2) > Interestingly, -msve-vector-bits gives good codegen for 128, 256, 512 but > bad for 1024, 2048 and VLA code That's because addition is the one operation that we currently support on unpacked vectors. For subtraction it's a different story (just 128-bit and 256-bit is good, as expected). But the surprising thing for me was that we didn't fall back to Advanced SIMD in VLA mode...
[Bug target/95361] New: Segfault when generating an epilogue for a partly-shrinked-wrapped SVE frame
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95361 Bug ID: 95361 Summary: Segfault when generating an epilogue for a partly-shrinked-wrapped SVE frame Product: gcc Version: 11.0 Status: UNCONFIRMED Keywords: ice-on-valid-code Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org Target Milestone: --- Target: aarch64*-*-* Compiling the following with -O2 -march=armv8.2+sve causes the compiler to segfault during aarch64_expand_epilogue: __SVInt8_t f (__SVInt8_t x, int y) { if (y == 1) asm volatile ("" ::: "z8"); if (y == 2) asm volatile ("" ::: "z9"); return x; } The problem is that we individually shrink-wrap the saves and restores of z8 and z9, but need to keep the stack allocation common to both arms. Before emitting the deallocation instruction, we try to add a REG_CFA_DEF_CFA note to the final restore, which doesn't exist: if (callee_adjust != 0 || maybe_gt (initial_adjust, 65536)) { /* Emit delayed restores and set the CFA to be SP + initial_adjust. */ insn = get_last_insn (); rtx new_cfa = plus_constant (Pmode, stack_pointer_rtx, initial_adjust); REG_NOTES (insn) = alloc_reg_note (REG_CFA_DEF_CFA, new_cfa, cfi_ops); RTX_FRAME_RELATED_P (insn) = 1; cfi_ops = NULL; } This in practice only happens for SVE because: (a) We don't try to shrink-wrap wb_candidate* registers even when we've decided to treat them as normal saves and restores. I have a fix for that. (b) Even with (a) fixed, we're (almost?) guaranteed to emit a stack tie for frames that are 64k or larger, so we end up hanging the REG_CFA_DEF_CFA note on that instead. I haven't yet checked how far back this goes.
[Bug target/95361] Segfault when generating an epilogue for a partly-shrinked-wrapped SVE frame
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95361 rsandifo at gcc dot gnu.org changed: What|Removed |Added Last reconfirmed||2020-05-27 Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org --- Comment #1 from rsandifo at gcc dot gnu.org --- Mine.
[Bug target/95459] aarch64: ICE in in aarch64_short_vector_p, at config/aarch64/aarch64.c:16803
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95459 rsandifo at gcc dot gnu.org changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org --- Comment #1 from rsandifo at gcc dot gnu.org --- (In reply to Fei Yang from comment #0) > - gcc_assert (aarch64_sve_mode_p (mode)); > + gcc_assert (TARGET_SVE ? aarch64_sve_mode_p (mode) : true); Probably clearer as: !TARGET_SVE || aarch64_sve_mode_p (mode) Might be worth adding a comment like: /* Leave later code to report an error if SVE is disabled. */
[Bug target/95459] aarch64: ICE in aarch64_short_vector_p, at config/aarch64/aarch64.c:16803
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95459 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #3 from rsandifo at gcc dot gnu.org --- Fixed.
[Bug target/95523] aarch64:ICE in register_tuple_type,at config/aarch64/aarch64-sve-builtins.cc:3434
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95523 rsandifo at gcc dot gnu.org changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org Status|UNCONFIRMED |WAITING Last reconfirmed||2020-06-04 Ever confirmed|0 |1 --- Comment #1 from rsandifo at gcc dot gnu.org --- The reason for the assert is that the alignment is part of the ABI of the types and is relied on when using LDR and STR for some moves. Even though -fpack-struct=N changes the ABI in general, I don't think it should change it in this particular case. I have to wonder why GCC even has -fpack-struct= though. Do you have a specific need for it, or was this caught by option coverage testing? If there is no specific need, I'd be tempted to make -fpack-struct unsupported for AArch64. I think there are so many other things that could go wrong if it is used.
[Bug target/95523] aarch64:ICE in register_tuple_type,at config/aarch64/aarch64-sve-builtins.cc:3434
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95523 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|WAITING |NEW --- Comment #3 from rsandifo at gcc dot gnu.org --- (In reply to z.zhanghaij...@huawei.com from comment #2) > (In reply to rsand...@gcc.gnu.org from comment #1) > > The reason for the assert is that the alignment is part of the > > ABI of the types and is relied on when using LDR and STR for > > some moves. Even though -fpack-struct=N changes the ABI in general, > > I don't think it should change it in this particular case. > > > > I have to wonder why GCC even has -fpack-struct= though. Do you have > > a specific need for it, or was this caught by option coverage testing? > > > > If there is no specific need, I'd be tempted to make -fpack-struct > > unsupported for AArch64. I think there are so many other things > > that could go wrong if it is used. > > We use the option in some projects. If this option is not supported, the > impact is relatively large. Ah, OK. If it's an issue that affects real use-cases then let's make it work. > I think it can make the error reporting more > friendly, so that this option can still be used normally without sve. > > like: > diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc > b/gcc/config/aarch64/aarch64-sve-builtins.cc > index bdb04e8170d..e93e766aba6 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins.cc > +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc > @@ -3504,6 +3504,12 @@ handle_arm_sve_h () >return; > } > > + if (maximum_field_alignment) > +{ > + error ("SVE is incompatible with the use of %qs or %qs", > "-fpack-struct", "#pragma pack"); > + return; > +} > + >sve_switcher sve; > >/* Define the vector and tuple types. */ > > Any suggestions? Could you try setting DECL_USER_ALIGN on the FIELD_DECL? that should (hopefully) force the field to keep its natural alignment.
[Bug middle-end/95528] [10/11 Regression] internal compiler error: in emit_move_insn, at expr.c:3814
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95528 --- Comment #6 from rsandifo at gcc dot gnu.org --- (In reply to Jakub Jelinek from comment #4) > I'd say the vectorizer/simplify_vector_constructor just shouldn't attempt to > use these (e.g. vec_pack*, vec_unpack* optabs) for !VEC_MODE_P unless it is > VECTOR_BOOLEAN_TYPE_P type. > For i386 it would be the right thing as the patterns really assume that it > is vector booleans and have their properties. > Though, aarch64 seems to have vec_pack_trunc_di and vec_pack_trunc_df > expanders, it is unclear to me what they are for and if they are really used. > Other targets seem to only define these for vector modes. Yeah, I agree those look odd. The covering note for the patch that added them was: https://gcc.gnu.org/pipermail/gcc-patches/2013-April/361636.html which talks about fixing gcc.dg/vect failures. But as James says, only the 128-bit patterns should be needed for that. Maybe the the 64-bit patterns were just added for completeness. Perhaps one justification for _di is that there is no V1DI mode. Instead a vector of 1 DImode would itself have mode DImode. So in principle, vec_pack_trunc_di is probably the right name for a (V1)DI->V2SI truncate. The same doesn't apply to _df since we don't use scalar float modes for V1 vectors. And (unlike at the time of the patch) we now have V1DF. So I agree that the _df one looks dead. That said, for AArch64 we'd now try to mix 128-bit and 64-bit vectors instead of vectorising with 2 64-bit vectors. So the _di pattern probably isn't useful in practice either. In summary: from an AArch64 perspective, it's probably fine to check !VECTOR_MODE_P || VECTOR_BOOLEAN_TYPE_P. But given the V1 thing, maybe it would be better to add || m == GET_MODE_INNER (m) as well (unless that defeats the fix).
[Bug middle-end/95528] [10/11 Regression] internal compiler error: in emit_move_insn, at expr.c:3814
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95528 --- Comment #7 from rsandifo at gcc dot gnu.org --- (In reply to rsand...@gcc.gnu.org from comment #6) > In summary: from an AArch64 perspective, it's probably fine to > check !VECTOR_MODE_P || VECTOR_BOOLEAN_TYPE_P. But given the V1 > thing, maybe it would be better to add || m == GET_MODE_INNER (m) > as well (unless that defeats the fix). Er, I mean: m == TYPE_MODE (TREE_TYPE (vectype)) or whatever. m == GET_MODE_INNER (m) is of course always true for scalars :-)
[Bug target/95254] aarch64: gcc generate inefficient code with fixed sve vector length
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95254 rsandifo at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED CC||rsandifo at gcc dot gnu.org Status|UNCONFIRMED |RESOLVED --- Comment #3 from rsandifo at gcc dot gnu.org --- Fixed.
[Bug middle-end/95528] [10/11 Regression] internal compiler error: in emit_move_insn, at expr.c:3814
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95528 --- Comment #9 from rsandifo at gcc dot gnu.org --- (In reply to Jakub Jelinek from comment #8) > Created attachment 48683 [details] > gcc11-pr95528.patch > > Untested fix. The VECTOR_TYPE_P condition should be redundant. Looks good to me otherwise FWIW.
[Bug target/95523] aarch64:ICE in register_tuple_type,at config/aarch64/aarch64-sve-builtins.cc:3434
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95523 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #6 from rsandifo at gcc dot gnu.org --- Fixed.
[Bug tree-optimization/95570] vect: ICE: Segmentation fault in vect_loop_versioning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95570 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED CC||rsandifo at gcc dot gnu.org --- Comment #3 from rsandifo at gcc dot gnu.org --- Fixed.
[Bug tree-optimization/95694] [9/10/11 Regression] ICE in trunc_int_for_mode, at explow.c:59 since r9-7156-g33579b59aaf02eb7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95694 rsandifo at gcc dot gnu.org changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org Ever confirmed|0 |1 Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2020-06-16 --- Comment #2 from rsandifo at gcc dot gnu.org --- OK. It looks like the mode is readily available in each call to REDUCE_BIT_FIELD, so I think we should just pass that down. It would also avoid: if (target && GET_MODE (target) != GET_MODE (exp)) target = 0; creating unnecessary temporaries for constant integers.
[Bug tree-optimization/95199] Remove extra variable created for memory reference in loop vectorization.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95199 rsandifo at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED CC||rsandifo at gcc dot gnu.org --- Comment #11 from rsandifo at gcc dot gnu.org --- Fixed.
[Bug c++/95726] ICE with aarch64 __Float32x4_t as template argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95726 --- Comment #2 from rsandifo at gcc dot gnu.org --- (In reply to Jason Merrill from comment #1) > Does the aarch64 port expect __Float32x4_t type to be considered equivalent > to the GNU vector type or not? If so, why use build_distinct_type_copy over > build_variant_type_copy? If not, they might want to set TYPE_INDIVISIBLE_P > so that structural_comptypes treats them as different. They mangle differently, and e.g.: void f(float32x4_t); void f(V); aren't ODR equivalent. But a lot of code relies on the GNU vector extensions being available for float32x4_t as well as V, and on V and float32x4_t being mutually and implicitly interconvertible.
[Bug c++/95726] ICE with aarch64 __Float32x4_t as template argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95726 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org
[Bug c++/95726] ICE with aarch64 __Float32x4_t as template argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95726 --- Comment #4 from rsandifo at gcc dot gnu.org --- (In reply to Jakub Jelinek from comment #3) > But if they mangle differently, then structural_comptypes shouldn't treat > them as same types. That certainly avoids the ICE, and makes GCC's behaviour consistent with Clang for things like: typedef float vecf __attribute__((vector_size(16))); vecf x; float32x4_t &y = x; Previously we accepted this, with the struct_comptypes change we reject it (like Clang does). But that might break existing code, so I'm not sure it would be backportable. I guess the question then is: what does TYPE_STRUCTURAL_EQUALITY_P mean for VECTOR_TYPEs in the context of structural_comptypes? And (if this is a different question) what case is that function's VECTOR_TYPE handling for? I.e. when do we want to return true for a pair of VECTOR_TYPEs whose TYPE_MAIN_VARIANTs are different?
[Bug c++/95726] ICE with aarch64 __Float32x4_t as template argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95726 --- Comment #8 from rsandifo at gcc dot gnu.org --- Thanks for the pointers. Putting the mangled name in a target-specific attribute (like we do for SVE) seems to fix it. It actually also keeps the testcase in comment 4 “working”, which is unexpected (to me) and seems a little dubious. I guess it's good news for backporting though.
[Bug bootstrap/95805] [11 regression] gcc/recog.h:301:30: error: too many arguments to function call, expected 1, have 2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95805 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #2 from rsandifo at gcc dot gnu.org --- Fixed by g:20a2e6afa82c165bcd1a8749478e9402a25088a4.
[Bug target/95674] Unnecessary move when doing division-by-multiplication
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95674 rsandifo at gcc dot gnu.org changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org --- Comment #4 from rsandifo at gcc dot gnu.org --- (In reply to Vladimir Makarov from comment #3) > I looked at this problem. > > All assignments are done in IRA (LRA does not change them). We can not make > a better assignment because scratches do not permit to store any preferences > from instruction constraints (pseudo-registers permits to do this). So > storing and getting this info for scratches is the first step to solving the > problem. > > LRA changes scratches to pseudo-registers to generate the correct code > satisfying the insn constraints and turn them back to scratches when the > corresponding pseudo-registers do not get hard registers. Moving change of > scratches to pseudo-regs from LRA to IRA could help but it is a big work. Yeah, sounds useful! Could it be done as a prepass before IRA proper? (Maybe not -- haven't really thought about it much.) > Another solution is to not use scratches in machine-descriptions and use > pseudo-registers instead. Scratches are bad and should be avoided as much > as possible. I expressed this several times. Besides it is not possible to > keep hard register preferences for them, they also can not be taken into > account in conflict graph and this results in overoptimistic assignments > which LRA has to correct. I can see that scratches make life harder for the register allocators, at least until the above change is made. But they're very useful for pre-RA passes. They make it possible for any pass to use recog to match a candidate instruction and have recog automatically add the required scratches, without the passes having to cope with new registers being created “spontaneously”. (Combine is of course the main example, but others like fwprop benefit too.) Although it would be possible to use pseudos instead, I'm not sure the end result would be better than what we have now: - Each pass would need to cope with recog creating new registers - We'd need some way of deferring the creation of pseudo registers for successful but speculative recogs that might later be rejected for cost reasons, or at least have some kind of recycling scheme. - Even then, we'll have more pseudos before RA, and so most of the DF datastructures will be larger and perhaps sparser.
[Bug target/95958] New: [meta-bug] Inefficient arm_neon.h code for AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958 Bug ID: 95958 Summary: [meta-bug] Inefficient arm_neon.h code for AArch64 Product: gcc Version: 11.0 Status: UNCONFIRMED Keywords: meta-bug Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org Target Milestone: --- Target: aarch64*-*-* This meta-bug is to collect AArch64 performance improvements that require or have a strong relationship with arm_neon.h intrinsics. This PR overlaps to some extent with PR47562, but that PR is more general in two respects: it covers general arm_neon.h improvements, and it covers both AArch32 and AArch64.
[Bug target/95958] [meta-bug] Inefficient arm_neon.h code for AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958 rsandifo at gcc dot gnu.org changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2020-06-29 Status|UNCONFIRMED |NEW
[Bug target/95958] [meta-bug] Inefficient arm_neon.h code for AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958 rsandifo at gcc dot gnu.org changed: What|Removed |Added Keywords||missed-optimization Depends on||66675, 80283, 88212, 89057, ||89606, 89967, 91753, 94442, ||95265, 91598, 82074 --- Comment #1 from rsandifo at gcc dot gnu.org --- At the time of writing, PR80283 doesn't have an AArch64 testcase, only an AArch32 one. However, the underlying issue applies across targets. The content probably overlaps a lot with PR91598, which is AArch64-specific. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66675 [Bug 66675] Could improve vector lane folding style operations. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283 [Bug 80283] [8/9/10/11 Regression] bad SIMD register allocation https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82074 [Bug 82074] [aarch64] vmlsq_f32 compiled into 2 instructions https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88212 [Bug 88212] IRA Register Coalescing not working for the testcase https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89057 [Bug 89057] [8/9/10/11 Regression] AArch64 ld3 st4 less optimized https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89606 [Bug 89606] Extra mov after structure load instructions on aarch64 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89967 [Bug 89967] Inefficient code generation for vld2q_lane_u8 under aarch64 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598 [Bug 91598] [8/9 regression] 60% speed drop on neon intrinsic loop https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91753 [Bug 91753] Bad register allocation of multi-register types https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94442 [Bug 94442] [10/11 regression] Redundant loads/stores emitted at -O3 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95265 [Bug 95265] aarch64: suboptimal code generation for common neon intrinsic sequence involving shrn and mull
[Bug target/95962] New: Inefficient code for simple arm_neon.h iota operation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95962 Bug ID: 95962 Summary: Inefficient code for simple arm_neon.h iota operation Product: gcc Version: 11.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org Blocks: 95958 Target Milestone: --- Target: aarch64*-*-* For: #include int32x4_t foo (void) { int32_t array[] = { 0, 1, 2, 3 }; return vld1q_s32 (array); } we produce: foo: .LFB4217: .cfi_startproc sub sp, sp, #16 .cfi_def_cfa_offset 16 mov x0, 2 mov x1, 4294967296 movkx0, 0x3, lsl 32 stp x1, x0, [sp] ldr q0, [sp] add sp, sp, 16 .cfi_def_cfa_offset 0 ret In contrast, clang produces essentially perfect code: adrpx8, .LCPI0_0 ldr q0, [x8, :lo12:.LCPI0_0] ret I think the problem is a combination of two things: - __builtin_aarch64_ld1v4si & co. are treated as general functions rather than pure functions, so in principle it could write to the given address. This stops us promoting the array to a constant. - The loads could be reduced to native gimple-level operations, at least on little-endian targets. IMO this a bug rather than an enhancement. Intrinsics only exist to optimise code, and what GCC is doing falls short of what users should reasonably expect. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958 [Bug 95958] [meta-bug] Inefficient arm_neon.h code for AArch64
[Bug target/95964] New: AArch64 arm_neon.h arithmetic functions lack appropriate attributes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95964 Bug ID: 95964 Summary: AArch64 arm_neon.h arithmetic functions lack appropriate attributes Product: gcc Version: 11.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org Blocks: 95958 Target Milestone: --- Target: aarch64*-*-* For: --- #include #include std::vector a, b, c; void foo (size_t n) { for (size_t i = 0; i < n; ++i) a[i] = vfmaq_f32(a[i], b[i], c[i]); } --- we generate code that loads the start of a, b and c in every iteration of the loop: --- .cfi_startproc cbz x0, .L4 adrpx3, .LANCHOR0 add x3, x3, :lo12:.LANCHOR0 mov x2, 0 .p2align 3,,7 .L6: ldr x4, [x3] lsl x1, x2, 4 ldr x6, [x3, 24] add x2, x2, 1 ldr x5, [x3, 48] ldr q0, [x4, x1] ldr q2, [x6, x1] ldr q1, [x5, x1] fmlav0.4s, v2.4s, v1.4s str q0, [x4, x1] cmp x0, x2 bne .L6 .L4: ret .cfi_endproc --- The problem is that __builtin_aarch64_fmav4sf and similar operations are treated as general functions that can read memory, write memory, and call other functions. If the intrinsic is replaced by arithmetic then the start addresses are hoisted, as expected. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958 [Bug 95958] [meta-bug] Inefficient arm_neon.h code for AArch64
[Bug target/95964] AArch64 arm_neon.h arithmetic functions lack appropriate attributes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95964 --- Comment #2 from rsandifo at gcc dot gnu.org --- (In reply to Richard Biener from comment #1) > You could use fnspec attributes to improve things but of course open-coding > those as GIMPLE is preferable (last resort is to "fold" the calls to > GIMPLE sequences as powerpc does for select builtins). Yeah, open-coding is another option, if the operation and command-line options are right. Part of the problem though is that the intrinsics are supposed to behave like the associated instructions, including in terms of honouring the rounding mode, etc. They're also not supposed to inherit C's idea of what's undefined behaviour for the closest “equivalent” operators or libm functions. So yeah, I think adding attributes is the way to go in general.
[Bug target/95967] New: Poor aarch64 vector constructor code when using arm_neon.h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95967 Bug ID: 95967 Summary: Poor aarch64 vector constructor code when using arm_neon.h Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org Depends on: 95962 Blocks: 95958 Target Milestone: --- Target: aarch64*-*-* We generate poor code for the attached functions: f1: moviv4.4s, 0 ins v4.s[0], v0.s[0] ins v4.s[1], v1.s[0] ins v4.s[2], v2.s[0] mov v0.16b, v4.16b ins v0.s[3], v3.s[0] ret f2: dup v0.4s, v0.s[0] ins v0.s[1], v1.s[0] ins v0.s[2], v2.s[0] ins v0.s[3], v3.s[0] ret f3: sub sp, sp, #16 stp s0, s1, [sp] stp s2, s3, [sp, 8] ldr q0, [sp] add sp, sp, 16 ret g1: moviv0.4s, 0 ld1 {v0.s}[0], [x0] ld1 {v0.s}[1], [x1] ld1 {v0.s}[2], [x2] ld1 {v0.s}[3], [x3] ret g2: ld1r{v0.4s}, [x0] ld1 {v0.s}[1], [x1] ld1 {v0.s}[2], [x2] ld1 {v0.s}[3], [x3] ret g3: sub sp, sp, #16 ldr s0, [x3] ldr s3, [x0] ldr s2, [x1] ldr s1, [x2] stp s3, s2, [sp] stp s1, s0, [sp, 8] ldr q0, [sp] add sp, sp, 16 ret All three f functions should generate: mov v0.s[1], v1.s[0] mov v0.s[2], v2.s[0] mov v0.s[3], v3.s[0] ret and all three g functions should generate: ldr s0, [x0] ld1 { v0.s }[1], [x1] ld1 { v0.s }[2], [x2] ld1 { v0.s }[3], [x3] ret which is what current Clang does. Getting the right code for f3 and g3 depends on the fix for PR95962. There's a reasonable chance that PR95962 will be enough on its own to fix f3 and g3, but I included them just in case it isn't. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958 [Bug 95958] [meta-bug] Inefficient arm_neon.h code for AArch64 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95962 [Bug 95962] Inefficient code for simple arm_neon.h iota operation
[Bug target/95967] Poor aarch64 vector constructor code when using arm_neon.h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95967 --- Comment #1 from rsandifo at gcc dot gnu.org --- Created attachment 48802 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48802&action=edit 6 constructor functions This time with attachment -- not sure what happened last time.
[Bug target/95969] New: Use of __builtin_aarch64_im_lane_boundsi in AArch64 arm_neon.h interferes with gimple optimisation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95969 Bug ID: 95969 Summary: Use of __builtin_aarch64_im_lane_boundsi in AArch64 arm_neon.h interferes with gimple optimisation Product: gcc Version: 11.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org Blocks: 95958 Target Milestone: --- Target: aarch64*-*-* For: #include void f (float32x4_t **ptr) { float32x4_t res = vsetq_lane_f32 (0.0f, **ptr, 0); **ptr = res; } the final gimple .optimized dump looks like: float32x4_t __vec; float32x4_t * _1; __Float32x4_t _2; float32x4_t * _3; [local count: 1073741824]: _1 = *ptr_5(D); _2 = *_1; __builtin_aarch64_im_lane_boundsi (16, 4, 0); __vec_8 = BIT_INSERT_EXPR <_2, 0.0, 0>; _3 = *ptr_5(D); *_3 = __vec_8; return; where we still have two loads from *ptr. This is because __builtin_aarch64_im_lane_boundsi has no attributes: it's modelled a general function that could do pretty much anything. Although we fix this testcase in the RTL optimisers, it's easy for the issue to cause unoptimised code in larger, more realistic testcases. The problem is similar to PR95964. The difficulty is that here we have to model the function as having some kind of side-effect, otherwise it will simply get optimised away. Ideally we'd fix this by implementing the intrinsics directly in the compiler and doing the checks in the frontend via TARGET_CHECK_BUILTIN_CALL. That's obviously a big change though. Until then, we should optimise away calls whose arguments are already correct so that they don't clog up the IL. If not returning a value makes it harder to fold the call for some reason, perhaps an alternative would be to pass a vector value through a dummy const function, e.g.: _4 = __builtin_aarch64_... (_2, 16, 4, 0); __vec_8 = BIT_INSERT_EXPR <_4, 0.0, 0>; That might not be necessary though -- haven't checked. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958 [Bug 95958] [meta-bug] Inefficient arm_neon.h code for AArch64
[Bug target/95974] New: AArch64 arm_neon.h stores interfere with gimple optimisations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95974 Bug ID: 95974 Summary: AArch64 arm_neon.h stores interfere with gimple optimisations Product: gcc Version: 11.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org Blocks: 95958 Target Milestone: --- Target: aarch64*-*-* For: --- #include #include std::vector a; void f (size_t n, float32x4_t v) { for (size_t i = 0; i < n; i += 4) vst1q_f32 (&a[i], v); } --- we generate code that loads the start address of "a" in every iteration of the loop: --- cbz x0, .L4 adrpx4, .LANCHOR0 add x4, x4, :lo12:.LANCHOR0 mov x1, 0 .p2align 3,,7 .L6: ldr x3, [x4] lsl x2, x1, 2 add x1, x1, 4 str q0, [x3, x2] cmp x0, x1 bhi .L6 .L4: ret --- This is really the store equivalent of PR95962. The problem is that __builtin_aarch64_st1v4sf is modelled as a general function that could read and write from arbitrary memory. As with PR95962, one option would be to lower to gimple accesses where possible, at least for little-endian targets. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958 [Bug 95958] [meta-bug] Inefficient arm_neon.h code for AArch64
[Bug c++/92789] Non-obvious ?: behaviour with structurally equivalent types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92789 --- Comment #5 from rsandifo at gcc dot gnu.org --- Keeping open since the problem also affects arm*-*-*. I don't think we should change GCC 10 and earlier though.
[Bug c++/95726] ICE with aarch64 __Float32x4_t as template argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95726 rsandifo at gcc dot gnu.org changed: What|Removed |Added Target|aarch64-linux |aarch64*-*-* arm*-*-* --- Comment #10 from rsandifo at gcc dot gnu.org --- Fixed on trunk for aarch64*-*-* only. Still needs fixing for arm*-*-* on trunk, and for both targets on branches.
[Bug tree-optimization/95961] ICE: in exact_div, at poly-int.h:2182
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95961 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #2 from rsandifo at gcc dot gnu.org --- Fixed.
[Bug tree-optimization/96075] [8/9/10/11 Regression] bogus alignment for negative step grouped access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96075 --- Comment #4 from rsandifo at gcc dot gnu.org --- (In reply to Richard Biener from comment #3) > So we end up calling get_negative_load_store_type for this group which seems > to only handle contiguous accesses but this one is single element > interleaving > aka contiguous with gap. > > vect_supportable_dr_alignment returns dr_aligned which is seemingly OK for > > #(Data Ref: > # bb: 3 > # stmt: _4 = y[_2]; > # ref: y[_2]; > # base_object: y; > # Access function 0: {1022, +, -2}_1 Are you sure we support this? I think… > /* If this is a backward running DR then first access in the larger > vectype actually is N-1 elements before the address in the DR. > Adjust misalign accordingly. */ > if (tree_int_cst_sgn (step) < 0) > { > tree offset = ssize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1); > /* DR_STEP(dr) is the same as -TYPE_SIZE of the scalar type, > otherwise we wouldn't be here. */ …really was the assumption for negative steps at one time, and I'm not sure off-hand when/if that changed. (Of course, it might be that one of my patches changed it.)
[Bug tree-optimization/96091] ICE during dom: tree check: expected integer_cst, have poly_int_cst in to_wide, at tree.h:5911
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96091 --- Comment #3 from rsandifo at gcc dot gnu.org --- (In reply to Richard Biener from comment #2) > Why should we not have a VECTOR_CST of POLY_INT_CST elements? If > POLY_INT_CST > is not "constant" then it shouldn't be tcc_constant? Looks like > > tree > vector_cst_elt (const_tree t, unsigned int i) > { > ... > /* Otherwise work out the value from the last two encoded elements. */ > return wide_int_to_tree (TREE_TYPE (TREE_TYPE (t)), >vector_cst_int_elt (t, i)); > } > > should be using poly-ints and not wide-ints. Richard? Yeah, looks like it. I think I was worried about cases in which we could end up with poly_int*poly_int, from a poly_int-long vector containing a poly_int “stepped” vector constant. But that's not a problem here, since the index is always a plain integer.
[Bug target/95105] Bogus reference-compatibility error for arm_sve_vector_bits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95105 --- Comment #3 from rsandifo at gcc dot gnu.org --- Fixed.
[Bug target/95105] Bogus reference-compatibility error for arm_sve_vector_bits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95105 rsandifo at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #4 from rsandifo at gcc dot gnu.org --- …and closed.