https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114087

Oliver Kozul <oliver.ko...@rt-rk.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |oliver.ko...@rt-rk.com

--- Comment #3 from Oliver Kozul <oliver.ko...@rt-rk.com> ---
(In reply to Jeffrey A. Law from comment #2)
> True that there's a tradeoff here.  But in the spot where we're most likely
> to be able to do something with these cases the loop optimizer has already
> been run and any profitable LICM has already been realized.
> 
> Unfortunately I don't see good ways to handle #2 and #3.  The basic problem
> is combine is limited in how many insns it processes at a time and it's not
> enough to expose both constants in the equality comparison step.
> 
> Oliver has a change for the first case though.  I'm still evaluating it.

My patch for the first example has been submitted and passed all tests in the
Patchwork CI, so I am hopeful that it is a valid improvement.

I have managed to catch the pattern mentioned by the author by searching for a
PLUS expression instead of an EQ expression, since I have found that all the
constants are exposed there. I use this pattern when handling all three
examples. This works because an equality comparison is canonicalized by
subtracting the right-hand side from the left-hand side and comparing the
result with zero.

In the meantime I have also sent a patch for example two; it still fails in
specific cases of torture testing. The patch can be seen at the following link:
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671015.html

The problematic edge cases for my patch are, as far as I can see, bit-fields:
when matching my proposed pattern, I do not have information about which bits
the fields occupy, and therefore I do not shift the modified values back to
their original positions.

An example where errors occur can be seen below:
```c
extern void abort() __attribute__ ((noreturn));

struct s
{
  unsigned long long f1 : 40;
#if(__SIZEOF_INT__ >= 4)
  unsigned int f2 : 24;
#else
  unsigned long int f2 : 24;
#endif
} sv;

int main()
{
  int f2;
  sv.f2 = (1 << 24) - 1;                /* set the 24-bit field to all ones */
  __asm__ volatile ("" : : : "memory");
  ++sv.f2;                              /* must wrap the field to 0 */
  f2 = sv.f2;
  if (f2 != 0)
    abort();
  return 0;
}
```

My pattern is recognized when incrementing sv.f2 and generates the following
RISC-V assembly:
```
...
        lui     a4,%hi(sv)
        ld      a5,%lo(sv)(a4)
        li      a3,-1
        slli    a2,a3,40
        or      a5,a5,a2
        sd      a5,%lo(sv)(a4)
        ld      a5,%lo(sv)(a4)
        srli    a3,a3,24
        srli    a2,a5,40
        andi    a2,a2,-1
        addi    a2,a2,1
        and     a5,a5,a3
        or      a5,a2,a5
        sd      a5,%lo(sv)(a4)
        beq     a2,zero,.L2
...
```

Compared to the original assembly:
```
...
        lui     a3,%hi(sv)
        ld      a5,%lo(sv)(a3)
        li      a2,-1
        slli    a4,a2,40
        or      a5,a5,a4
        sd      a5,%lo(sv)(a3)
        ld      a4,%lo(sv)(a3)
        srli    a2,a2,24
        srli    a5,a4,40
        addiw   a5,a5,1
        slli    a5,a5,40
        and     a4,a4,a2
        or      a4,a5,a4
        sd      a4,%lo(sv)(a3)
        beq     a5,zero,.L2
...
```

As we can see, my code lacks the slli instruction that would shift the
incremented sv.f2 back up by 40 bits into its position in the word, and
therefore it stores faulty results.
