On Thu, Oct 20, 2022 at 9:22 PM Thomas Schwinge <tho...@codesourcery.com> wrote:
>
> Hi!
>
> On 2022-10-20T14:23:33+0200, Aldy Hernandez <al...@redhat.com> wrote:
> >> I understand 'r & 3' to be logically equivalent to '(r & 2) && (r & 1)',
> >> right?
> >
> > For r == 2, r & 3 == 2, whereas (r & 2) && (r & 1) == 0, so no?
>
> Thanks, and now please let me crawl back under my stone, embarassing...
> That'd rather be '(r & 2) || (r & 1)'.

No worries.  If there was a tally of how many times a GCC hacker has
to crawl under a stone, I'd have the record ;-).

>
> Well, with that now clarified, how about the again updated
> "Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195]" attached?

I see 7 different tests in this patch.  Did the 6 that pass, fail
before my patch for PR107195 and are now working?   Cause unless
that's the case, they shouldn't be in a test named pr107195-3.c, but
somewhere else.

I see there's one XFAILed test in your patch, and this certainly
doesn't look like something that has anything to do with the patch I
submitted.  Perhaps you could open a PR with an enhancement request
for this one?

That being said...

/* { dg-additional-options -O1 } */
extern int
__attribute__((const))
foo4b (int);

int f4b (unsigned int r)
{
  if (foo4b (r))
    r *= 8U;

  if ((r / 2U) & 2U)
    r += foo4b (r);

  return r;
}
/* { dg-final { scan-tree-dump-times {gimple_call <foo4b,} 1 dom3 {
xfail *-*-* } } } */

At -O2, this is something PRE is doing,  so GCC already handles this.
However, you are suggesting this isn't handled at -O1 and should be??
None of the VRPs run at -O1 so ranger-vrp won't even get a chance.
However, DOM runs at -O1 and it uses ranger to do simple copy
propagation and some jump threading...so technically we could do
something...

DOM should be able to thread from the r *= 8U to the return because
the nonzero mask (known zeros) after the multiplication is 0xfffffff8,
which it could use to solve the second conditional as false.  This
would leave us with:

if (foo4b (r))
  {
    r *= 8U;
   return r;
  }
else
  {
     if ((r / 2U) & 2U)
       r += foo4b (r);
  }

...which exposes the fact that the second call to foo4b() has the same
"r" as the first one, so it could be folded.  I don't know whose job
it is to notice that two const calls have the same arguments, but ISTM
that if we thread the above correctly, someone should be able to clean
this up.  No clue whether this happens at -O1.

However... we're not threading this.  It looks like we're not keeping
track of nonzero bits (known zeros) through the division.  The
multiplication gives us 0xfffffff8 and we should be able to divide
that by 2 and get 0x7ffffffc which solves the second conditional to 0.

So...maybe DOM+ranger could set things up for another pass to clean this up?

Either way, you could open an enhancement request, if anything to keep
the nonzero mask up to date through the division.

Aldy

Reply via email to