subreg vs vec_select
Hi! I have a vector pseudo containing a single 128-bit value (V1TFmode) and I need to access its last 64 bits (DFmode). Which of the two options is better? (subreg:DF (reg:V1TF) 8) or (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel [(const_int 1)])) If I use the first one, I run into a problem with set_noop_p (): it thinks that (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8)) is a no-op, because it doesn't check the mode after stripping the subreg: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616 However this is not correct, because SET_DEST is the second register in a register pair, and SET_SRC is half of a vector register that overlaps the first register in the corresponding pair. So it looks as if mode needs to be considered there. This helps: --- a/gcc/rtlanal.c +++ b/gcc/rtlanal.c @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set) return 0; src = SUBREG_REG (src); dst = SUBREG_REG (dst); + if (GET_MODE (src) != GET_MODE (dst)) + return 0; } but I'm not sure whether I'm not missing something about subreg semantics in the first place. Best regards, Ilya
Re: subreg vs vec_select
On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote: > Hi Ilya, > > On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich via Gcc > wrote: > > I have a vector pseudo containing a single 128-bit value (V1TFmode) > > and > > I need to access its last 64 bits (DFmode). Which of the two > > options > > is better? > > > > (subreg:DF (reg:V1TF) 8) > > > > or > > > > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel [(const_int > > 1)])) > > > > If I use the first one, I run into a problem with set_noop_p (): it > > thinks that > > > > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8)) > > > > is a no-op, because it doesn't check the mode after stripping the > > subreg: > > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616 > > > > However this is not correct, because SET_DEST is the second > > register in > > a register pair, and SET_SRC is half of a vector register that > > overlaps > > the first register in the corresponding pair. So it looks as if > > mode > > needs to be considered there. > > Yes. > > > This helps: > > > > --- a/gcc/rtlanal.c > > +++ b/gcc/rtlanal.c > > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set) > > return 0; > >src = SUBREG_REG (src); > >dst = SUBREG_REG (dst); > > + if (GET_MODE (src) != GET_MODE (dst)) > > + return 0; > > } > > > > but I'm not sure whether I'm not missing something about subreg > > semantics in the first place. > > You probably should just see if both modes are the same number of > hard > registers? HARD_REGNO_NREGS. I've refined my patch as follows: --- a/gcc/rtlanal.c +++ b/gcc/rtlanal.c @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set) return 0; src = SUBREG_REG (src); dst = SUBREG_REG (dst); + if (REG_P (src) && HARD_REGISTER_P (src) && REG_P (dst) + && HARD_REGISTER_P (dst) + && hard_regno_nregs (REGNO (src), GET_MODE (src)) +!= hard_regno_nregs (REGNO (dst), GET_MODE (dst))) + return 0; } This also helps, and is less restrictive than my first variant. Two questions, just for my understanding: 1) This mode confusion problem must never happen to pseudos, because, unlike hard registers, pseudos must be always referred to in their natural mode. Is this correct? 2) Can there be a hypothetical machine, where modes XF and YF refer to 64-bit and 128-bit register pairs respectively? This would cause the mode confusion problem again. Is there anything in RTL semantics that would prohibit existence of such modes? Best regards, Ilya
Re: subreg vs vec_select
On Fri, 2020-09-11 at 10:46 +0100, Richard Sandiford wrote: > Ilya Leoshkevich via Gcc writes: > > On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote: > > > Hi Ilya, > > > > > > On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich via > > > Gcc > > > wrote: > > > > I have a vector pseudo containing a single 128-bit value > > > > (V1TFmode) > > > > and > > > > I need to access its last 64 bits (DFmode). Which of the two > > > > options > > > > is better? > > > > > > > > (subreg:DF (reg:V1TF) 8) > > > > > > > > or > > > > > > > > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel [(const_int > > > > 1)])) > > > > > > > > If I use the first one, I run into a problem with set_noop_p > > > > (): it > > > > thinks that > > > > > > > > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8)) > > > > > > > > is a no-op, because it doesn't check the mode after stripping > > > > the > > > > subreg: > > > > > > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616 > > > > > > > > However this is not correct, because SET_DEST is the second > > > > register in > > > > a register pair, and SET_SRC is half of a vector register that > > > > overlaps > > > > the first register in the corresponding pair. So it looks as if > > > > mode > > > > needs to be considered there. > > > > > > Yes. > > > > > > > This helps: > > > > > > > > --- a/gcc/rtlanal.c > > > > +++ b/gcc/rtlanal.c > > > > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set) > > > > return 0; > > > >src = SUBREG_REG (src); > > > >dst = SUBREG_REG (dst); > > > > + if (GET_MODE (src) != GET_MODE (dst)) > > > > + return 0; > > > > } > > > > > > > > but I'm not sure whether I'm not missing something about subreg > > > > semantics in the first place. > > > > > > You probably should just see if both modes are the same number of > > > hard > > > registers? HARD_REGNO_NREGS. > > > > I've refined my patch as follows: > > > > --- a/gcc/rtlanal.c > > +++ b/gcc/rtlanal.c > > @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set) > > return 0; > >src = SUBREG_REG (src); > >dst = SUBREG_REG (dst); > > + if (REG_P (src) && HARD_REGISTER_P (src) && REG_P (dst) > > + && HARD_REGISTER_P (dst) > > + && hard_regno_nregs (REGNO (src), GET_MODE (src)) > > +!= hard_regno_nregs (REGNO (dst), GET_MODE (dst))) > > + return 0; > > } > > I think checking the mode would be safer. Having the same number > of registers doesn't mean that the bits are distributed across the > registers in the same way. Yeah, that's what I was trying to express with this hypothetical machine example. On the other hand, checking mode is too pessimistic. E.g. if we talk about s390 GPRs, then considering (set (subreg:SI (reg:DI %r0) 4) (subreg:SI (reg:DI %r0) 4)) a no-op is fine from my perspective. So having a more restrictive check might be desirable. Is there a way to ask the backend how the subreg bits are distributed? > Out of interest, why can't the subregs in the example above get > folded down to hard registers? I think this is because the offsets are not 0. I could imagine folding (subreg:DF (reg:TF %f0) 8) to (reg:DF %f2) - but there must be a backend hook for this. Does anything like this exist? Also, can (subreg:DF (reg:V1TF %f0) 8) be folded at all? This is simply the second doubleword of 128-bit %v0 vector register.
Re: subreg vs vec_select
On Fri, 2020-09-11 at 12:17 +0200, Ilya Leoshkevich wrote: > On Fri, 2020-09-11 at 10:46 +0100, Richard Sandiford wrote: > > Ilya Leoshkevich via Gcc writes: > > > On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote: > > > > Hi Ilya, > > > > > > > > On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich via > > > > Gcc > > > > wrote: > > > > > I have a vector pseudo containing a single 128-bit value > > > > > (V1TFmode) > > > > > and > > > > > I need to access its last 64 bits (DFmode). Which of the two > > > > > options > > > > > is better? > > > > > > > > > > (subreg:DF (reg:V1TF) 8) > > > > > > > > > > or > > > > > > > > > > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel > > > > > [(const_int > > > > > 1)])) > > > > > > > > > > If I use the first one, I run into a problem with set_noop_p > > > > > (): it > > > > > thinks that > > > > > > > > > > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8)) > > > > > > > > > > is a no-op, because it doesn't check the mode after stripping > > > > > the > > > > > subreg: > > > > > > > > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616 > > > > > > > > > > However this is not correct, because SET_DEST is the second > > > > > register in > > > > > a register pair, and SET_SRC is half of a vector register > > > > > that > > > > > overlaps > > > > > the first register in the corresponding pair. So it looks as > > > > > if > > > > > mode > > > > > needs to be considered there. > > > > > > > > Yes. > > > > > > > > > This helps: > > > > > > > > > > --- a/gcc/rtlanal.c > > > > > +++ b/gcc/rtlanal.c > > > > > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set) > > > > > return 0; > > > > >src = SUBREG_REG (src); > > > > >dst = SUBREG_REG (dst); > > > > > + if (GET_MODE (src) != GET_MODE (dst)) > > > > > + return 0; > > > > > } > > > > > > > > > > but I'm not sure whether I'm not missing something about > > > > > subreg > > > > > semantics in the first place. > > > > > > > > You probably should just see if both modes are the same number > > > > of > > > > hard > > > > registers? HARD_REGNO_NREGS. > > > > > > I've refined my patch as follows: > > > > > > --- a/gcc/rtlanal.c > > > +++ b/gcc/rtlanal.c > > > @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set) > > > return 0; > > >src = SUBREG_REG (src); > > >dst = SUBREG_REG (dst); > > > + if (REG_P (src) && HARD_REGISTER_P (src) && REG_P (dst) > > > + && HARD_REGISTER_P (dst) > > > + && hard_regno_nregs (REGNO (src), GET_MODE (src)) > > > +!= hard_regno_nregs (REGNO (dst), GET_MODE > > > (dst))) > > > + return 0; > > > } > > > > I think checking the mode would be safer. Having the same number > > of registers doesn't mean that the bits are distributed across the > > registers in the same way. > > Yeah, that's what I was trying to express with this hypothetical > machine example. On the other hand, checking mode is too > pessimistic. > E.g. if we talk about s390 GPRs, then considering > > (set (subreg:SI (reg:DI %r0) 4) (subreg:SI (reg:DI %r0) 4)) Sorry, bad example: here the hard register modes actually match. But it's probably possible to come up with something similar, where the hard reg is accessed with different modes, but when we add subregs on top, then we end up selecting the same bits. > a no-op is fine from my perspective. So having a more restrictive > check might be desirable. Is there a way to ask the backend how the > subreg bits are distributed? > > > Out of interest, why can't the subregs in the example above get > > folded down to hard registers? > > I think this is because the offsets are not 0. I could imagine > folding > (subreg:DF (reg:TF %f0) 8) to (reg:DF %f2) - but there must be a > backend hook for this. Does anything like this exist? Also, can > (subreg:DF (reg:V1TF %f0) 8) be folded at all? This is simply > the second doubleword of 128-bit %v0 vector register.
Re: subreg vs vec_select
On Fri, 2020-09-11 at 12:14 +0100, Richard Sandiford wrote: > Ilya Leoshkevich writes: > > On Fri, 2020-09-11 at 12:17 +0200, Ilya Leoshkevich wrote: > > > On Fri, 2020-09-11 at 10:46 +0100, Richard Sandiford wrote: > > > > Ilya Leoshkevich via Gcc writes: > > > > > On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote: > > > > > > Hi Ilya, > > > > > > > > > > > > On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich > > > > > > via > > > > > > Gcc > > > > > > wrote: > > > > > > > I have a vector pseudo containing a single 128-bit value > > > > > > > (V1TFmode) > > > > > > > and > > > > > > > I need to access its last 64 bits (DFmode). Which of the > > > > > > > two > > > > > > > options > > > > > > > is better? > > > > > > > > > > > > > > (subreg:DF (reg:V1TF) 8) > > > > > > > > > > > > > > or > > > > > > > > > > > > > > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel > > > > > > > [(const_int > > > > > > > 1)])) > > > > > > > > > > > > > > If I use the first one, I run into a problem with > > > > > > > set_noop_p > > > > > > > (): it > > > > > > > thinks that > > > > > > > > > > > > > > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) > > > > > > > 8)) > > > > > > > > > > > > > > is a no-op, because it doesn't check the mode after > > > > > > > stripping > > > > > > > the > > > > > > > subreg: > > > > > > > > > > > > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616 > > > > > > > > > > > > > > However this is not correct, because SET_DEST is the > > > > > > > second > > > > > > > register in > > > > > > > a register pair, and SET_SRC is half of a vector register > > > > > > > that > > > > > > > overlaps > > > > > > > the first register in the corresponding pair. So it looks > > > > > > > as > > > > > > > if > > > > > > > mode > > > > > > > needs to be considered there. > > > > > > > > > > > > Yes. > > > > > > > > > > > > > This helps: > > > > > > > > > > > > > > --- a/gcc/rtlanal.c > > > > > > > +++ b/gcc/rtlanal.c > > > > > > > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set) > > > > > > > return 0; > > > > > > >src = SUBREG_REG (src); > > > > > > >dst = SUBREG_REG (dst); > > > > > > > + if (GET_MODE (src) != GET_MODE (dst)) > > > > > > > + return 0; > > > > > > > } > > > > > > > > > > > > > > but I'm not sure whether I'm not missing something about > > > > > > > subreg > > > > > > > semantics in the first place. > > > > > > > > > > > > You probably should just see if both modes are the same > > > > > > number > > > > > > of > > > > > > hard > > > > > > registers? HARD_REGNO_NREGS. > > > > > > > > > > I've refined my patch as follows: > > > > > > > > > > --- a/gcc/rtlanal.c > > > > > +++ b/gcc/rtlanal.c > > > > > @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set) > > > > > return 0; > > > > >src = SUBREG_REG (src); > > > > >dst = SUBREG_REG (dst); > > > > > + if (REG_P (src) && HARD_REGISTER_P (src) && REG_P > > > > > (dst) > > > > > + && HARD_REGISTER_P (dst) > > > > > + && hard_regno_nregs (REGNO (src), GET_MODE (src)) > > > > > +!= hard_regno_nregs (REGNO (dst), GET_MODE > > > >
New pseudos in splitters
Hi, "Defining How to Split Instructions" in gccint states the following: The preparation-statements are similar to those statements that are specified for define_expand ... Unlike those in define_expand, however, these statements must not generate any new pseudo-registers. I see that there is code that does this anyway, e.g. "Split calculate_pic_address into pic_load_addr_* and a move." in arm.md. Is this restriction still valid today? Is there a reason we can't introduce new pseudos in a splitter before LRA? Best regards, Ilya
Incremental updating of SSA_NAMEs that are passed to b_c_p
Hi, I'd like to revive the old discussion regarding the interaction of jump threading and b_c_p causing the latter to incorrectly return 1 in certain cases: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549288.html The conclusion was that this happening during threading is just a symptom of a deeper problem: SSA_NAMEs that are passed to b_c_p should not be registered for incremental updating. I performed a little experiment and added an assertion to create_new_def_for: --- a/gcc/tree-into-ssa.c +++ b/gcc/tree-into-ssa.c @@ -2996,6 +3014,8 @@ create_new_def_for (tree old_name, gimple *stmt, def_operand_p def) { tree new_name; + gcc_checking_assert (!used_by_bcp_p (old_name)); + timevar_push (TV_TREE_SSA_INCREMENTAL); if (!update_ssa_initialized_fn) This has of course fired when performing basic block duplication during threading, which can be fixed by avoiding duplication of basic blocks wi th b_c_p calls: --- a/gcc/tree-cfg.c +++ b/gcc/tree-cfg.c @@ -6224,7 +6224,8 @@ gimple_can_duplicate_bb_p (const_basic_block bb) || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT) || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY) || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_BFLY) - || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX))) + || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX) + || gimple_call_builtin_p (g, BUILT_IN_CONSTANT_P))) return false; } The second occurrence is a bit more interesting: gimple * vrp_insert::build_assert_expr_for (tree cond, tree v) { ... a = build2 (ASSERT_EXPR, TREE_TYPE (v), v, cond); assertion = gimple_build_assign (NULL_TREE, a); ... tree new_def = create_new_def_for (v, assertion, NULL); The fix is also simple though: --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -3101,6 +3101,9 @@ vrp_insert::process_assert_insertions_for (tree name, assert_locus *loc) if (loc->expr == loc->val) return false; + if (used_by_bcp_p (name)) +return false; + cond = build2 (loc->comp_code, boolean_type_node, loc->expr, loc- >val); assert_stmt = build_assert_expr_for (cond, name); if (loc->e) My original testcase did not trigger anything else. I'm planning to check how this affects the testsuite, but before going further I would like to ask: is this the right direction now? To me it looks a little-bit more heavy-handed than the original approach, but maybe it's worth it. Best regards, Ilya
Re: Incremental updating of SSA_NAMEs that are passed to b_c_p
On Wed, 2020-10-28 at 12:18 +0100, Richard Biener wrote: > On Tue, Oct 27, 2020 at 7:36 PM Ilya Leoshkevich via Gcc > wrote: > > Hi, > > > > I'd like to revive the old discussion regarding the interaction of > > jump threading and b_c_p causing the latter to incorrectly return 1 > > in > > certain cases: > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html > > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549288.html > > > > The conclusion was that this happening during threading is just a > > symptom of a deeper problem: SSA_NAMEs that are passed to b_c_p > > should > > not be registered for incremental updating. > > > > I performed a little experiment and added an assertion to > > create_new_def_for: > > > > --- a/gcc/tree-into-ssa.c > > +++ b/gcc/tree-into-ssa.c > > @@ -2996,6 +3014,8 @@ create_new_def_for (tree old_name, gimple > > *stmt, > > def_operand_p def) > > { > >tree new_name; > > > > + gcc_checking_assert (!used_by_bcp_p (old_name)); > > + > >timevar_push (TV_TREE_SSA_INCREMENTAL); > > > >if (!update_ssa_initialized_fn) > > > > This has of course fired when performing basic block duplication > > during > > threading, which can be fixed by avoiding duplication of basic > > blocks > > wi > > th b_c_p calls: > > > > --- a/gcc/tree-cfg.c > > +++ b/gcc/tree-cfg.c > > @@ -6224,7 +6224,8 @@ gimple_can_duplicate_bb_p (const_basic_block > > bb) > > || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT) > > || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY) > > || gimple_call_internal_p (g, > > IFN_GOMP_SIMT_XCHG_BFLY) > > - || gimple_call_internal_p (g, > > IFN_GOMP_SIMT_XCHG_IDX))) > > + || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX) > > + || gimple_call_builtin_p (g, BUILT_IN_CONSTANT_P))) > > return false; > > } > > > > The second occurrence is a bit more interesting: > > > > gimple * > > vrp_insert::build_assert_expr_for (tree cond, tree v) > > { > > ... > > a = build2 (ASSERT_EXPR, TREE_TYPE (v), v, cond); > > assertion = gimple_build_assign (NULL_TREE, a); > > ... > > tree new_def = create_new_def_for (v, assertion, NULL); > > > > The fix is also simple though: > > > > --- a/gcc/tree-vrp.c > > +++ b/gcc/tree-vrp.c > > @@ -3101,6 +3101,9 @@ vrp_insert::process_assert_insertions_for > > (tree > > name, assert_locus *loc) > >if (loc->expr == loc->val) > > return false; > > > > + if (used_by_bcp_p (name)) > > +return false; > > + > >cond = build2 (loc->comp_code, boolean_type_node, loc->expr, > > loc- > > > val); > >assert_stmt = build_assert_expr_for (cond, name); > >if (loc->e) > > > > My original testcase did not trigger anything else. I'm planning > > to > > check how this affects the testsuite, but before going further I > > would > > like to ask: is this the right direction now? To me it looks a > > little-bit more heavy-handed than the original approach, but maybe > > it's > > worth it. > > Disabling threading looks reasonable but I'd rather not disallow BB > duplication > or renaming. For VRP I guess we want to instead change > register_edge_assert_for* to not register assertions for the result > of > __builtin_constant_p rather than just not allowing VRP to process it > (there are other consumers still). If I understood Jeff correctly, we should disable incremental updates for absolutely all b_c_p arguments, so affecting as many consumers as possible must actually be a good thing when this approach is considered? That said, regtest has revealed one more place where this is happening: rewrite_into_loop_closed_ssa_1 -> ... -> add_exit_phi -> create_new_def_for. The reduced code is: int a; void b (void) { while (a) __builtin_constant_p (a) ?: 0; if (a == 8) while (1) ; } I guess we are not allowed to cancel this transformation, becase we really need LCSSA for later passes? This now contradicts the "prohibit all updates" idea.. If you think that disabling threading is reasonable, could you please have another look at the original patches? v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html v2: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549211.html Can anything like this go in after all? Best regards, Ilya
Re: Incremental updating of SSA_NAMEs that are passed to b_c_p
On Fri, 2020-10-30 at 09:22 +0100, Richard Biener wrote: > On Thu, Oct 29, 2020 at 6:20 PM Ilya Leoshkevich > wrote: > > On Wed, 2020-10-28 at 12:18 +0100, Richard Biener wrote: > > > On Tue, Oct 27, 2020 at 7:36 PM Ilya Leoshkevich via Gcc > > > wrote: > > > > Hi, > > > > > > > > I'd like to revive the old discussion regarding the interaction > > > > of > > > > jump threading and b_c_p causing the latter to incorrectly > > > > return 1 > > > > in > > > > certain cases: > > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549288.html > > > > > > > > The conclusion was that this happening during threading is just > > > > a > > > > symptom of a deeper problem: SSA_NAMEs that are passed to b_c_p > > > > should > > > > not be registered for incremental updating. > > > > > > > > I performed a little experiment and added an assertion to > > > > create_new_def_for: > > > > > > > > --- a/gcc/tree-into-ssa.c > > > > +++ b/gcc/tree-into-ssa.c > > > > @@ -2996,6 +3014,8 @@ create_new_def_for (tree old_name, gimple > > > > *stmt, > > > > def_operand_p def) > > > > { > > > >tree new_name; > > > > > > > > + gcc_checking_assert (!used_by_bcp_p (old_name)); > > > > + > > > >timevar_push (TV_TREE_SSA_INCREMENTAL); > > > > > > > >if (!update_ssa_initialized_fn) > > > > > > > > This has of course fired when performing basic block > > > > duplication > > > > during > > > > threading, which can be fixed by avoiding duplication of basic > > > > blocks > > > > wi > > > > th b_c_p calls: > > > > > > > > --- a/gcc/tree-cfg.c > > > > +++ b/gcc/tree-cfg.c > > > > @@ -6224,7 +6224,8 @@ gimple_can_duplicate_bb_p > > > > (const_basic_block > > > > bb) > > > > || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT) > > > > || gimple_call_internal_p (g, > > > > IFN_GOMP_SIMT_VOTE_ANY) > > > > || gimple_call_internal_p (g, > > > > IFN_GOMP_SIMT_XCHG_BFLY) > > > > - || gimple_call_internal_p (g, > > > > IFN_GOMP_SIMT_XCHG_IDX))) > > > > + || gimple_call_internal_p (g, > > > > IFN_GOMP_SIMT_XCHG_IDX) > > > > + || gimple_call_builtin_p (g, > > > > BUILT_IN_CONSTANT_P))) > > > > return false; > > > > } > > > > > > > > The second occurrence is a bit more interesting: > > > > > > > > gimple * > > > > vrp_insert::build_assert_expr_for (tree cond, tree v) > > > > { > > > > ... > > > > a = build2 (ASSERT_EXPR, TREE_TYPE (v), v, cond); > > > > assertion = gimple_build_assign (NULL_TREE, a); > > > > ... > > > > tree new_def = create_new_def_for (v, assertion, NULL); > > > > > > > > The fix is also simple though: > > > > > > > > --- a/gcc/tree-vrp.c > > > > +++ b/gcc/tree-vrp.c > > > > @@ -3101,6 +3101,9 @@ vrp_insert::process_assert_insertions_for > > > > (tree > > > > name, assert_locus *loc) > > > >if (loc->expr == loc->val) > > > > return false; > > > > > > > > + if (used_by_bcp_p (name)) > > > > +return false; > > > > + > > > >cond = build2 (loc->comp_code, boolean_type_node, loc->expr, > > > > loc- > > > > > val); > > > >assert_stmt = build_assert_expr_for (cond, name); > > > >if (loc->e) > > > > > > > > My original testcase did not trigger anything else. I'm > > > > planning > > > > to > > > > check how this affects the testsuite, but before going further > > > > I > > > > would > > > > like to ask: is this the right direction now? To me it looks a > > > > little-bit more heavy-handed than the original approach, but > > > > maybe > > > > it's > > > > worth it. > > > > > > Disabling threading looks reasonable but
Reassociation and trapping operations
Hi! I have a C floating point comparison (a <= b && a >= b), which test_for_singularity turns into (a <= b && a == b) and vectorizer turns into ((a <= b) & (a == b)). So far so good. eliminate_redundant_comparison, however, turns it into just (a == b). I don't think this is correct, because (a <= b) traps and (a == b) doesn't. The following helps: --- a/gcc/tree-ssa-reassoc.c +++ b/gcc/tree-ssa-reassoc.c @@ -2144,7 +2144,7 @@ eliminate_redundant_comparison (enum tree_code opcode, if (TREE_CODE (curr->op) != SSA_NAME) return false; def1 = SSA_NAME_DEF_STMT (curr->op); - if (!is_gimple_assign (def1)) + if (!is_gimple_assign (def1) || gimple_could_trap_p (def1)) return false; lcode = gimple_assign_rhs_code (def1); if (TREE_CODE_CLASS (lcode) != tcc_comparison) @@ -2160,7 +2160,7 @@ eliminate_redundant_comparison (enum tree_code opcode, if (TREE_CODE (oe->op) != SSA_NAME) continue; def2 = SSA_NAME_DEF_STMT (oe->op); - if (!is_gimple_assign (def2)) + if (!is_gimple_assign (def2) || gimple_could_trap_p (def2)) continue; rcode = gimple_assign_rhs_code (def2); if (TREE_CODE_CLASS (rcode) != tcc_comparison) However, I couldn't help noticing that other parts of reassociation pass use stmt_could_throw_p, and trapping is mentioned only in one place. Is this intentional? Shouldn't both throwing and trapping block reassociation? Best regards, Ilya
Re: Reassociation and trapping operations
On Wed, 2020-11-25 at 10:53 +0100, Richard Biener wrote: > On Wed, Nov 25, 2020 at 8:15 AM Marc Glisse > wrote: > > On Wed, 25 Nov 2020, Ilya Leoshkevich via Gcc wrote: > > > > > I have a C floating point comparison (a <= b && a >= b), which > > > test_for_singularity turns into (a <= b && a == b) and vectorizer > > > turns > > > into ((a <= b) & (a == b)). So far so good. > > > > > > eliminate_redundant_comparison, however, turns it into just (a == > > > b). > > > I don't think this is correct, because (a <= b) traps and (a == > > > b) > > > doesn't. > > > > Hello, > > > > let me just mention the old > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53805 > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53806 > > > > There has been some debate about the exact meaning of -ftrapping- > > math, but > > don't let that stop you. > > My interpretation has been that GCC considers traps not observable > unless you compile with -fnon-call-exceptions which means that GCC > happily elides them. That's usually in-line of user expectations > with > respect to optimization - they do not expect us to do less > optimization > just for the sake if there's a trap. Of course we do have to be > careful > to not introduce traps where there were none. > > In particular for say > > a <= b; > foo (); > > you cannot rely on foo () never being called when a <= b traps > because > its effect on control flow is not modeled in the IL (we also happily > DCE any such possibly trapping operation - the traps are not > considered > unmodeled side-effects). > Even with -fnon-call-exceptions when the possible exception is not > caught within > the function there are probably similar issues with respect to code > motion. > > Richard. > > > -- > > Marc Glisse Thanks for the explanation, that's good to know. I'll need to rather ad just my test expectations then. Best regards, Ilya