subreg vs vec_select

2020-09-09 Thread Ilya Leoshkevich via Gcc
Hi!

I have a vector pseudo containing a single 128-bit value (V1TFmode) and
I need to access its last 64 bits (DFmode). Which of the two options
is better?

(subreg:DF (reg:V1TF) 8)

or

(vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel [(const_int 1)]))

If I use the first one, I run into a problem with set_noop_p (): it
thinks that

(set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8))

is a no-op, because it doesn't check the mode after stripping the
subreg:

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616

However this is not correct, because SET_DEST is the second register in
a register pair, and SET_SRC is half of a vector register that overlaps
the first register in the corresponding pair. So it looks as if mode
needs to be considered there.

This helps:

--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set)
return 0;
   src = SUBREG_REG (src);
   dst = SUBREG_REG (dst);
+  if (GET_MODE (src) != GET_MODE (dst))
+   return 0;
 }

but I'm not sure whether I'm not missing something about subreg
semantics in the first place.

Best regards,
Ilya



Re: subreg vs vec_select

2020-09-10 Thread Ilya Leoshkevich via Gcc
On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote:
> Hi Ilya,
> 
> On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich via Gcc
> wrote:
> > I have a vector pseudo containing a single 128-bit value (V1TFmode)
> > and
> > I need to access its last 64 bits (DFmode). Which of the two
> > options
> > is better?
> > 
> > (subreg:DF (reg:V1TF) 8)
> > 
> > or
> > 
> > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel [(const_int
> > 1)]))
> > 
> > If I use the first one, I run into a problem with set_noop_p (): it
> > thinks that
> > 
> > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8))
> > 
> > is a no-op, because it doesn't check the mode after stripping the
> > subreg:
> > 
> > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616
> > 
> > However this is not correct, because SET_DEST is the second
> > register in
> > a register pair, and SET_SRC is half of a vector register that
> > overlaps
> > the first register in the corresponding pair. So it looks as if
> > mode
> > needs to be considered there.
> 
> Yes.
> 
> > This helps:
> > 
> > --- a/gcc/rtlanal.c
> > +++ b/gcc/rtlanal.c
> > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set)
> > return 0;
> >src = SUBREG_REG (src);
> >dst = SUBREG_REG (dst);
> > +  if (GET_MODE (src) != GET_MODE (dst))
> > +   return 0;
> >  }
> > 
> > but I'm not sure whether I'm not missing something about subreg
> > semantics in the first place.
> 
> You probably should just see if both modes are the same number of
> hard
> registers?  HARD_REGNO_NREGS.

I've refined my patch as follows:

--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set)
return 0;
   src = SUBREG_REG (src);
   dst = SUBREG_REG (dst);
+  if (REG_P (src) && HARD_REGISTER_P (src) && REG_P (dst)
+ && HARD_REGISTER_P (dst)
+ && hard_regno_nregs (REGNO (src), GET_MODE (src))
+!= hard_regno_nregs (REGNO (dst), GET_MODE (dst)))
+   return 0;
 }
 
This also helps, and is less restrictive than my first variant.
Two questions, just for my understanding:

1) This mode confusion problem must never happen to pseudos, because,
   unlike hard registers, pseudos must be always referred to in their
   natural mode.  Is this correct?

2) Can there be a hypothetical machine, where modes XF and YF refer to
   64-bit and 128-bit register pairs respectively?  This would cause
   the mode confusion problem again.  Is there anything in RTL
   semantics that would prohibit existence of such modes?

Best regards,
Ilya



Re: subreg vs vec_select

2020-09-11 Thread Ilya Leoshkevich via Gcc
On Fri, 2020-09-11 at 10:46 +0100, Richard Sandiford wrote:
> Ilya Leoshkevich via Gcc  writes:
> > On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote:
> > > Hi Ilya,
> > > 
> > > On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich via
> > > Gcc
> > > wrote:
> > > > I have a vector pseudo containing a single 128-bit value
> > > > (V1TFmode)
> > > > and
> > > > I need to access its last 64 bits (DFmode). Which of the two
> > > > options
> > > > is better?
> > > > 
> > > > (subreg:DF (reg:V1TF) 8)
> > > > 
> > > > or
> > > > 
> > > > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel [(const_int
> > > > 1)]))
> > > > 
> > > > If I use the first one, I run into a problem with set_noop_p
> > > > (): it
> > > > thinks that
> > > > 
> > > > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8))
> > > > 
> > > > is a no-op, because it doesn't check the mode after stripping
> > > > the
> > > > subreg:
> > > > 
> > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616
> > > > 
> > > > However this is not correct, because SET_DEST is the second
> > > > register in
> > > > a register pair, and SET_SRC is half of a vector register that
> > > > overlaps
> > > > the first register in the corresponding pair. So it looks as if
> > > > mode
> > > > needs to be considered there.
> > > 
> > > Yes.
> > > 
> > > > This helps:
> > > > 
> > > > --- a/gcc/rtlanal.c
> > > > +++ b/gcc/rtlanal.c
> > > > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set)
> > > > return 0;
> > > >src = SUBREG_REG (src);
> > > >dst = SUBREG_REG (dst);
> > > > +  if (GET_MODE (src) != GET_MODE (dst))
> > > > +   return 0;
> > > >  }
> > > > 
> > > > but I'm not sure whether I'm not missing something about subreg
> > > > semantics in the first place.
> > > 
> > > You probably should just see if both modes are the same number of
> > > hard
> > > registers?  HARD_REGNO_NREGS.
> > 
> > I've refined my patch as follows:
> > 
> > --- a/gcc/rtlanal.c
> > +++ b/gcc/rtlanal.c
> > @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set)
> > return 0;
> >src = SUBREG_REG (src);
> >dst = SUBREG_REG (dst);
> > +  if (REG_P (src) && HARD_REGISTER_P (src) && REG_P (dst)
> > + && HARD_REGISTER_P (dst)
> > + && hard_regno_nregs (REGNO (src), GET_MODE (src))
> > +!= hard_regno_nregs (REGNO (dst), GET_MODE (dst)))
> > +   return 0;
> >  }
> 
> I think checking the mode would be safer.  Having the same number
> of registers doesn't mean that the bits are distributed across the
> registers in the same way.

Yeah, that's what I was trying to express with this hypothetical
machine example.  On the other hand, checking mode is too pessimistic.
E.g. if we talk about s390 GPRs, then considering

(set (subreg:SI (reg:DI %r0) 4) (subreg:SI (reg:DI %r0) 4))

a no-op is fine from my perspective.  So having a more restrictive
check might be desirable.  Is there a way to ask the backend how the
subreg bits are distributed?

> Out of interest, why can't the subregs in the example above get
> folded down to hard registers?

I think this is because the offsets are not 0.  I could imagine folding
(subreg:DF (reg:TF %f0) 8) to (reg:DF %f2) - but there must be a
backend hook for this.  Does anything like this exist?  Also, can
(subreg:DF (reg:V1TF %f0) 8) be folded at all? This is simply
the second doubleword of 128-bit %v0 vector register.



Re: subreg vs vec_select

2020-09-11 Thread Ilya Leoshkevich via Gcc
On Fri, 2020-09-11 at 12:17 +0200, Ilya Leoshkevich wrote:
> On Fri, 2020-09-11 at 10:46 +0100, Richard Sandiford wrote:
> > Ilya Leoshkevich via Gcc  writes:
> > > On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote:
> > > > Hi Ilya,
> > > > 
> > > > On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich via
> > > > Gcc
> > > > wrote:
> > > > > I have a vector pseudo containing a single 128-bit value
> > > > > (V1TFmode)
> > > > > and
> > > > > I need to access its last 64 bits (DFmode). Which of the two
> > > > > options
> > > > > is better?
> > > > > 
> > > > > (subreg:DF (reg:V1TF) 8)
> > > > > 
> > > > > or
> > > > > 
> > > > > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel
> > > > > [(const_int
> > > > > 1)]))
> > > > > 
> > > > > If I use the first one, I run into a problem with set_noop_p
> > > > > (): it
> > > > > thinks that
> > > > > 
> > > > > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8))
> > > > > 
> > > > > is a no-op, because it doesn't check the mode after stripping
> > > > > the
> > > > > subreg:
> > > > > 
> > > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616
> > > > > 
> > > > > However this is not correct, because SET_DEST is the second
> > > > > register in
> > > > > a register pair, and SET_SRC is half of a vector register
> > > > > that
> > > > > overlaps
> > > > > the first register in the corresponding pair. So it looks as
> > > > > if
> > > > > mode
> > > > > needs to be considered there.
> > > > 
> > > > Yes.
> > > > 
> > > > > This helps:
> > > > > 
> > > > > --- a/gcc/rtlanal.c
> > > > > +++ b/gcc/rtlanal.c
> > > > > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set)
> > > > > return 0;
> > > > >src = SUBREG_REG (src);
> > > > >dst = SUBREG_REG (dst);
> > > > > +  if (GET_MODE (src) != GET_MODE (dst))
> > > > > +   return 0;
> > > > >  }
> > > > > 
> > > > > but I'm not sure whether I'm not missing something about
> > > > > subreg
> > > > > semantics in the first place.
> > > > 
> > > > You probably should just see if both modes are the same number
> > > > of
> > > > hard
> > > > registers?  HARD_REGNO_NREGS.
> > > 
> > > I've refined my patch as follows:
> > > 
> > > --- a/gcc/rtlanal.c
> > > +++ b/gcc/rtlanal.c
> > > @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set)
> > > return 0;
> > >src = SUBREG_REG (src);
> > >dst = SUBREG_REG (dst);
> > > +  if (REG_P (src) && HARD_REGISTER_P (src) && REG_P (dst)
> > > + && HARD_REGISTER_P (dst)
> > > + && hard_regno_nregs (REGNO (src), GET_MODE (src))
> > > +!= hard_regno_nregs (REGNO (dst), GET_MODE
> > > (dst)))
> > > +   return 0;
> > >  }
> > 
> > I think checking the mode would be safer.  Having the same number
> > of registers doesn't mean that the bits are distributed across the
> > registers in the same way.
> 
> Yeah, that's what I was trying to express with this hypothetical
> machine example.  On the other hand, checking mode is too
> pessimistic.
> E.g. if we talk about s390 GPRs, then considering
> 
> (set (subreg:SI (reg:DI %r0) 4) (subreg:SI (reg:DI %r0) 4))

Sorry, bad example: here the hard register modes actually match.
But it's probably possible to come up with something similar, where
the hard reg is accessed with different modes, but when we add subregs
on top, then we end up selecting the same bits.

> a no-op is fine from my perspective.  So having a more restrictive
> check might be desirable.  Is there a way to ask the backend how the
> subreg bits are distributed?
> 
> > Out of interest, why can't the subregs in the example above get
> > folded down to hard registers?
> 
> I think this is because the offsets are not 0.  I could imagine
> folding
> (subreg:DF (reg:TF %f0) 8) to (reg:DF %f2) - but there must be a
> backend hook for this.  Does anything like this exist?  Also, can
> (subreg:DF (reg:V1TF %f0) 8) be folded at all? This is simply
> the second doubleword of 128-bit %v0 vector register.



Re: subreg vs vec_select

2020-09-11 Thread Ilya Leoshkevich via Gcc
On Fri, 2020-09-11 at 12:14 +0100, Richard Sandiford wrote:
> Ilya Leoshkevich  writes:
> > On Fri, 2020-09-11 at 12:17 +0200, Ilya Leoshkevich wrote:
> > > On Fri, 2020-09-11 at 10:46 +0100, Richard Sandiford wrote:
> > > > Ilya Leoshkevich via Gcc  writes:
> > > > > On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote:
> > > > > > Hi Ilya,
> > > > > > 
> > > > > > On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich
> > > > > > via
> > > > > > Gcc
> > > > > > wrote:
> > > > > > > I have a vector pseudo containing a single 128-bit value
> > > > > > > (V1TFmode)
> > > > > > > and
> > > > > > > I need to access its last 64 bits (DFmode). Which of the
> > > > > > > two
> > > > > > > options
> > > > > > > is better?
> > > > > > > 
> > > > > > > (subreg:DF (reg:V1TF) 8)
> > > > > > > 
> > > > > > > or
> > > > > > > 
> > > > > > > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel
> > > > > > > [(const_int
> > > > > > > 1)]))
> > > > > > > 
> > > > > > > If I use the first one, I run into a problem with
> > > > > > > set_noop_p
> > > > > > > (): it
> > > > > > > thinks that
> > > > > > > 
> > > > > > > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0)
> > > > > > > 8))
> > > > > > > 
> > > > > > > is a no-op, because it doesn't check the mode after
> > > > > > > stripping
> > > > > > > the
> > > > > > > subreg:
> > > > > > > 
> > > > > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616
> > > > > > > 
> > > > > > > However this is not correct, because SET_DEST is the
> > > > > > > second
> > > > > > > register in
> > > > > > > a register pair, and SET_SRC is half of a vector register
> > > > > > > that
> > > > > > > overlaps
> > > > > > > the first register in the corresponding pair. So it looks
> > > > > > > as
> > > > > > > if
> > > > > > > mode
> > > > > > > needs to be considered there.
> > > > > > 
> > > > > > Yes.
> > > > > > 
> > > > > > > This helps:
> > > > > > > 
> > > > > > > --- a/gcc/rtlanal.c
> > > > > > > +++ b/gcc/rtlanal.c
> > > > > > > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set)
> > > > > > > return 0;
> > > > > > >src = SUBREG_REG (src);
> > > > > > >dst = SUBREG_REG (dst);
> > > > > > > +  if (GET_MODE (src) != GET_MODE (dst))
> > > > > > > +   return 0;
> > > > > > >  }
> > > > > > > 
> > > > > > > but I'm not sure whether I'm not missing something about
> > > > > > > subreg
> > > > > > > semantics in the first place.
> > > > > > 
> > > > > > You probably should just see if both modes are the same
> > > > > > number
> > > > > > of
> > > > > > hard
> > > > > > registers?  HARD_REGNO_NREGS.
> > > > > 
> > > > > I've refined my patch as follows:
> > > > > 
> > > > > --- a/gcc/rtlanal.c
> > > > > +++ b/gcc/rtlanal.c
> > > > > @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set)
> > > > > return 0;
> > > > >src = SUBREG_REG (src);
> > > > >dst = SUBREG_REG (dst);
> > > > > +  if (REG_P (src) && HARD_REGISTER_P (src) && REG_P
> > > > > (dst)
> > > > > + && HARD_REGISTER_P (dst)
> > > > > + && hard_regno_nregs (REGNO (src), GET_MODE (src))
> > > > > +!= hard_regno_nregs (REGNO (dst), GET_MODE
> > > >

New pseudos in splitters

2020-09-23 Thread Ilya Leoshkevich via Gcc
Hi,

"Defining How to Split Instructions" in gccint states the following:

The preparation-statements are similar to those statements that are
specified for define_expand ... Unlike those in define_expand, however,
these statements must not generate any new pseudo-registers.

I see that there is code that does this anyway, e.g. "Split
calculate_pic_address into pic_load_addr_* and a move." in arm.md.

Is this restriction still valid today?  Is there a reason we can't
introduce new pseudos in a splitter before LRA?

Best regards,
Ilya



Incremental updating of SSA_NAMEs that are passed to b_c_p

2020-10-27 Thread Ilya Leoshkevich via Gcc
Hi,

I'd like to revive the old discussion regarding the interaction of
jump threading and b_c_p causing the latter to incorrectly return 1 in
certain cases:

https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549288.html

The conclusion was that this happening during threading is just a
symptom of a deeper problem: SSA_NAMEs that are passed to b_c_p should
not be registered for incremental updating.

I performed a little experiment and added an assertion to
create_new_def_for:

--- a/gcc/tree-into-ssa.c
+++ b/gcc/tree-into-ssa.c
@@ -2996,6 +3014,8 @@ create_new_def_for (tree old_name, gimple *stmt,
def_operand_p def)
 {
   tree new_name;
 
+  gcc_checking_assert (!used_by_bcp_p (old_name));
+
   timevar_push (TV_TREE_SSA_INCREMENTAL);
 
   if (!update_ssa_initialized_fn)

This has of course fired when performing basic block duplication during
threading, which can be fixed by avoiding duplication of basic blocks
wi
th b_c_p calls:

--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -6224,7 +6224,8 @@ gimple_can_duplicate_bb_p (const_basic_block bb)
  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
  || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
  || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_BFLY)
- || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)))
+ || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)
+ || gimple_call_builtin_p (g, BUILT_IN_CONSTANT_P)))
return false;
 }

The second occurrence is a bit more interesting:

gimple *
vrp_insert::build_assert_expr_for (tree cond, tree v)
{
  ...
  a = build2 (ASSERT_EXPR, TREE_TYPE (v), v, cond);
  assertion = gimple_build_assign (NULL_TREE, a);
  ...
  tree new_def = create_new_def_for (v, assertion, NULL);

The fix is also simple though:

--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -3101,6 +3101,9 @@ vrp_insert::process_assert_insertions_for (tree
name, assert_locus *loc)
   if (loc->expr == loc->val)
 return false;
 
+  if (used_by_bcp_p (name))
+return false;
+
   cond = build2 (loc->comp_code, boolean_type_node, loc->expr, loc-
>val);
   assert_stmt = build_assert_expr_for (cond, name);
   if (loc->e)

My original testcase did not trigger anything else.  I'm planning to
check how this affects the testsuite, but before going further I would
like to ask: is this the right direction now?  To me it looks a
little-bit more heavy-handed than the original approach, but maybe it's
worth it.

Best regards,
Ilya



Re: Incremental updating of SSA_NAMEs that are passed to b_c_p

2020-10-29 Thread Ilya Leoshkevich via Gcc
On Wed, 2020-10-28 at 12:18 +0100, Richard Biener wrote:
> On Tue, Oct 27, 2020 at 7:36 PM Ilya Leoshkevich via Gcc
>  wrote:
> > Hi,
> > 
> > I'd like to revive the old discussion regarding the interaction of
> > jump threading and b_c_p causing the latter to incorrectly return 1
> > in
> > certain cases:
> > 
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549288.html
> > 
> > The conclusion was that this happening during threading is just a
> > symptom of a deeper problem: SSA_NAMEs that are passed to b_c_p
> > should
> > not be registered for incremental updating.
> > 
> > I performed a little experiment and added an assertion to
> > create_new_def_for:
> > 
> > --- a/gcc/tree-into-ssa.c
> > +++ b/gcc/tree-into-ssa.c
> > @@ -2996,6 +3014,8 @@ create_new_def_for (tree old_name, gimple
> > *stmt,
> > def_operand_p def)
> >  {
> >tree new_name;
> > 
> > +  gcc_checking_assert (!used_by_bcp_p (old_name));
> > +
> >timevar_push (TV_TREE_SSA_INCREMENTAL);
> > 
> >if (!update_ssa_initialized_fn)
> > 
> > This has of course fired when performing basic block duplication
> > during
> > threading, which can be fixed by avoiding duplication of basic
> > blocks
> > wi
> > th b_c_p calls:
> > 
> > --- a/gcc/tree-cfg.c
> > +++ b/gcc/tree-cfg.c
> > @@ -6224,7 +6224,8 @@ gimple_can_duplicate_bb_p (const_basic_block
> > bb)
> >   || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
> >   || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
> >   || gimple_call_internal_p (g,
> > IFN_GOMP_SIMT_XCHG_BFLY)
> > - || gimple_call_internal_p (g,
> > IFN_GOMP_SIMT_XCHG_IDX)))
> > + || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)
> > + || gimple_call_builtin_p (g, BUILT_IN_CONSTANT_P)))
> > return false;
> >  }
> > 
> > The second occurrence is a bit more interesting:
> > 
> > gimple *
> > vrp_insert::build_assert_expr_for (tree cond, tree v)
> > {
> >   ...
> >   a = build2 (ASSERT_EXPR, TREE_TYPE (v), v, cond);
> >   assertion = gimple_build_assign (NULL_TREE, a);
> >   ...
> >   tree new_def = create_new_def_for (v, assertion, NULL);
> > 
> > The fix is also simple though:
> > 
> > --- a/gcc/tree-vrp.c
> > +++ b/gcc/tree-vrp.c
> > @@ -3101,6 +3101,9 @@ vrp_insert::process_assert_insertions_for
> > (tree
> > name, assert_locus *loc)
> >if (loc->expr == loc->val)
> >  return false;
> > 
> > +  if (used_by_bcp_p (name))
> > +return false;
> > +
> >cond = build2 (loc->comp_code, boolean_type_node, loc->expr,
> > loc-
> > > val);
> >assert_stmt = build_assert_expr_for (cond, name);
> >if (loc->e)
> > 
> > My original testcase did not trigger anything else.  I'm planning
> > to
> > check how this affects the testsuite, but before going further I
> > would
> > like to ask: is this the right direction now?  To me it looks a
> > little-bit more heavy-handed than the original approach, but maybe
> > it's
> > worth it.
> 
> Disabling threading looks reasonable but I'd rather not disallow BB
> duplication
> or renaming.  For VRP I guess we want to instead change
> register_edge_assert_for* to not register assertions for the result
> of
> __builtin_constant_p rather than just not allowing VRP to process it
> (there are other consumers still).

If I understood Jeff correctly, we should disable incremental updates
for absolutely all b_c_p arguments, so affecting as many consumers as
possible must actually be a good thing when this approach is
considered?

That said, regtest has revealed one more place where this is happening:
rewrite_into_loop_closed_ssa_1 -> ... -> add_exit_phi ->
create_new_def_for.  The reduced code is:

int a;

void
b (void)
{
  while (a)
__builtin_constant_p (a) ?: 0;
  if (a == 8)
while (1)
  ;
}

I guess we are not allowed to cancel this transformation, becase we
really need LCSSA for later passes?  This now contradicts the "prohibit
all updates" idea..



If you think that disabling threading is reasonable, could you please
have another look at the original patches?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
v2: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549211.html

Can anything like this go in after all?

Best regards,
Ilya



Re: Incremental updating of SSA_NAMEs that are passed to b_c_p

2020-11-03 Thread Ilya Leoshkevich via Gcc
On Fri, 2020-10-30 at 09:22 +0100, Richard Biener wrote:
> On Thu, Oct 29, 2020 at 6:20 PM Ilya Leoshkevich 
> wrote:
> > On Wed, 2020-10-28 at 12:18 +0100, Richard Biener wrote:
> > > On Tue, Oct 27, 2020 at 7:36 PM Ilya Leoshkevich via Gcc
> > >  wrote:
> > > > Hi,
> > > > 
> > > > I'd like to revive the old discussion regarding the interaction
> > > > of
> > > > jump threading and b_c_p causing the latter to incorrectly
> > > > return 1
> > > > in
> > > > certain cases:
> > > > 
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549288.html
> > > > 
> > > > The conclusion was that this happening during threading is just
> > > > a
> > > > symptom of a deeper problem: SSA_NAMEs that are passed to b_c_p
> > > > should
> > > > not be registered for incremental updating.
> > > > 
> > > > I performed a little experiment and added an assertion to
> > > > create_new_def_for:
> > > > 
> > > > --- a/gcc/tree-into-ssa.c
> > > > +++ b/gcc/tree-into-ssa.c
> > > > @@ -2996,6 +3014,8 @@ create_new_def_for (tree old_name, gimple
> > > > *stmt,
> > > > def_operand_p def)
> > > >  {
> > > >tree new_name;
> > > > 
> > > > +  gcc_checking_assert (!used_by_bcp_p (old_name));
> > > > +
> > > >timevar_push (TV_TREE_SSA_INCREMENTAL);
> > > > 
> > > >if (!update_ssa_initialized_fn)
> > > > 
> > > > This has of course fired when performing basic block
> > > > duplication
> > > > during
> > > > threading, which can be fixed by avoiding duplication of basic
> > > > blocks
> > > > wi
> > > > th b_c_p calls:
> > > > 
> > > > --- a/gcc/tree-cfg.c
> > > > +++ b/gcc/tree-cfg.c
> > > > @@ -6224,7 +6224,8 @@ gimple_can_duplicate_bb_p
> > > > (const_basic_block
> > > > bb)
> > > >   || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
> > > >   || gimple_call_internal_p (g,
> > > > IFN_GOMP_SIMT_VOTE_ANY)
> > > >   || gimple_call_internal_p (g,
> > > > IFN_GOMP_SIMT_XCHG_BFLY)
> > > > - || gimple_call_internal_p (g,
> > > > IFN_GOMP_SIMT_XCHG_IDX)))
> > > > + || gimple_call_internal_p (g,
> > > > IFN_GOMP_SIMT_XCHG_IDX)
> > > > + || gimple_call_builtin_p (g,
> > > > BUILT_IN_CONSTANT_P)))
> > > > return false;
> > > >  }
> > > > 
> > > > The second occurrence is a bit more interesting:
> > > > 
> > > > gimple *
> > > > vrp_insert::build_assert_expr_for (tree cond, tree v)
> > > > {
> > > >   ...
> > > >   a = build2 (ASSERT_EXPR, TREE_TYPE (v), v, cond);
> > > >   assertion = gimple_build_assign (NULL_TREE, a);
> > > >   ...
> > > >   tree new_def = create_new_def_for (v, assertion, NULL);
> > > > 
> > > > The fix is also simple though:
> > > > 
> > > > --- a/gcc/tree-vrp.c
> > > > +++ b/gcc/tree-vrp.c
> > > > @@ -3101,6 +3101,9 @@ vrp_insert::process_assert_insertions_for
> > > > (tree
> > > > name, assert_locus *loc)
> > > >if (loc->expr == loc->val)
> > > >  return false;
> > > > 
> > > > +  if (used_by_bcp_p (name))
> > > > +return false;
> > > > +
> > > >cond = build2 (loc->comp_code, boolean_type_node, loc->expr,
> > > > loc-
> > > > > val);
> > > >assert_stmt = build_assert_expr_for (cond, name);
> > > >if (loc->e)
> > > > 
> > > > My original testcase did not trigger anything else.  I'm
> > > > planning
> > > > to
> > > > check how this affects the testsuite, but before going further
> > > > I
> > > > would
> > > > like to ask: is this the right direction now?  To me it looks a
> > > > little-bit more heavy-handed than the original approach, but
> > > > maybe
> > > > it's
> > > > worth it.
> > > 
> > > Disabling threading looks reasonable but

Reassociation and trapping operations

2020-11-24 Thread Ilya Leoshkevich via Gcc
Hi!

I have a C floating point comparison (a <= b && a >= b), which
test_for_singularity turns into (a <= b && a == b) and vectorizer turns
into ((a <= b) & (a == b)).  So far so good.

eliminate_redundant_comparison, however, turns it into just (a == b).
I don't think this is correct, because (a <= b) traps and (a == b)
doesn't.  The following helps:

--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -2144,7 +2144,7 @@ eliminate_redundant_comparison (enum tree_code
opcode,
   if (TREE_CODE (curr->op) != SSA_NAME)
 return false;
   def1 = SSA_NAME_DEF_STMT (curr->op);
-  if (!is_gimple_assign (def1))
+  if (!is_gimple_assign (def1) || gimple_could_trap_p (def1))
 return false;
   lcode = gimple_assign_rhs_code (def1);
   if (TREE_CODE_CLASS (lcode) != tcc_comparison)
@@ -2160,7 +2160,7 @@ eliminate_redundant_comparison (enum tree_code
opcode,
   if (TREE_CODE (oe->op) != SSA_NAME)
continue;
   def2 = SSA_NAME_DEF_STMT (oe->op);
-  if (!is_gimple_assign (def2))
+  if (!is_gimple_assign (def2) || gimple_could_trap_p (def2))
continue;
   rcode = gimple_assign_rhs_code (def2);
   if (TREE_CODE_CLASS (rcode) != tcc_comparison)

However, I couldn't help noticing that other parts of reassociation
pass use stmt_could_throw_p, and trapping is mentioned only in one
place.  Is this intentional?  Shouldn't both throwing and trapping
block reassociation?

Best regards,
Ilya



Re: Reassociation and trapping operations

2020-11-25 Thread Ilya Leoshkevich via Gcc
On Wed, 2020-11-25 at 10:53 +0100, Richard Biener wrote:
> On Wed, Nov 25, 2020 at 8:15 AM Marc Glisse 
> wrote:
> > On Wed, 25 Nov 2020, Ilya Leoshkevich via Gcc wrote:
> > 
> > > I have a C floating point comparison (a <= b && a >= b), which
> > > test_for_singularity turns into (a <= b && a == b) and vectorizer
> > > turns
> > > into ((a <= b) & (a == b)).  So far so good.
> > > 
> > > eliminate_redundant_comparison, however, turns it into just (a ==
> > > b).
> > > I don't think this is correct, because (a <= b) traps and (a ==
> > > b)
> > > doesn't.
> > 
> > Hello,
> > 
> > let me just mention the old
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53805
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53806
> > 
> > There has been some debate about the exact meaning of -ftrapping-
> > math, but
> > don't let that stop you.
> 
> My interpretation has been that GCC considers traps not observable
> unless you compile with -fnon-call-exceptions which means that GCC
> happily elides them.  That's usually in-line of user expectations
> with
> respect to optimization - they do not expect us to do less
> optimization
> just for the sake if there's a trap.  Of course we do have to be
> careful
> to not introduce traps where there were none.
> 
> In particular for say
> 
>  a <= b;
>  foo ();
> 
> you cannot rely on foo () never being called when a <= b traps
> because
> its effect on control flow is not modeled in the IL (we also happily
> DCE any such possibly trapping operation - the traps are not
> considered
> unmodeled side-effects).
> Even with -fnon-call-exceptions when the possible exception is not
> caught within
> the function there are probably similar issues with respect to code
> motion.
> 
> Richard.
> 
> > --
> > Marc Glisse

Thanks for the explanation, that's good to know.  I'll need to rather
ad
just my test expectations then.

Best regards,
Ilya