[Bug target/108803] [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og

jakub at gcc dot gnu.org via Gcc-bugs Thu, 16 Feb 2023 08:34:40 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108803


Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I'd say this is a bug in expand_doubleword_shift_condmove or so.
aarch64 when !TARGET_SIMD is a !SHIFT_COUNT_TRUNCATED target and shift_mask
is 0.
And HAVE_conditional_move is non-zero.
Now, if the shift count is constant at expansion time, we just select one of
expand_superword_shift or expand_subword_shift depending on the exact value,
and in both cases the shift count ought to be in the [0, BITS_PER_WORD - 1]
range.
Similarly, if !HAVE_conditional_move and the shift count is non-constant, we do
the same except that we select one at runtime, so again at runtime the chosen
shift count should be in [0, BITS_PER_WORD - 1].
But for expand_doubleword_shift_condmove with shift_mask 0, op1 is in [0, 2 *
BITS_PER_WORD - 1] and we pass op1 and superword_op1, where the latter is op1 -
BITS_PER_WORD.
So, in expand_doubleword_shift_condmove subword_op1 is in the [0, 2 *
BITS_PER_WORD - 1] range and superword_op1 is in the [-BITS_PER_WORD,
BITS_PER_WORD - 1] range.
And the routine just emits both expand_superword_shift and expand_subword_shift
and selects one of them using a conditional move.  But that means one of the two
shifts is necessarily done with an out-of-range count: either subword_op1 is in
[BITS_PER_WORD, 2 * BITS_PER_WORD - 1], i.e. too large, or superword_op1 is in
the [-BITS_PER_WORD, -1] range, i.e. negative.
Don't we need to mask those counts in that case (both)?
Now, in the testcase __builtin_add_overflow_p is actually evaluated to constant
0 only during the expansion (or later?).  In this particular case I wonder why
we haven't optimized it earlier, because for any unsigned addends the addition
is in [0, 2 * UINT_MAX - 2] range and so fits well into signed __int128 range;
something to be looked at for GCC 14.
But I fear it is exactly this constant, discovered only during RTL, that we
later on propagate into the op1 - BITS_PER_WORD and thus do one of the shifts
with count -64.
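
To illustrate the question about masking both counts, here is a minimal C
sketch, not the actual optabs code.  It models only a left shift of a
doubleword built from two 64-bit words, computes both candidate results the way
a conditional-move expansion would, and masks both counts first.  The names
dword, superword_count and shl_both_masked are hypothetical and exist only for
this example; BITS_PER_WORD is assumed to be 64.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_WORD 64

/* A doubleword modelled as two 64-bit words (illustrative type only). */
typedef struct { uint64_t lo, hi; } dword;

/* Superword count as described above: op1 - BITS_PER_WORD.
   For op1 in [0, 2 * BITS_PER_WORD - 1] this lies in [-64, 63]. */
static int superword_count(unsigned op1)
{
    return (int)op1 - BITS_PER_WORD;
}

/* Left shift that computes BOTH candidates and then selects one,
   mimicking the conditional-move strategy.  Both counts are masked to
   [0, BITS_PER_WORD - 1], so neither word shift is out of range. */
static dword shl_both_masked(dword x, unsigned op1)
{
    assert(op1 < 2u * BITS_PER_WORD);

    unsigned sub_n = op1 & (BITS_PER_WORD - 1);
    unsigned super_n = (unsigned)superword_count(op1) & (BITS_PER_WORD - 1);

    /* Subword candidate: only meaningful when op1 < BITS_PER_WORD.
       The carry is split into two shifts so that sub_n == 0 never
       produces a shift by a full word. */
    dword sub;
    sub.lo = x.lo << sub_n;
    sub.hi = (x.hi << sub_n) | ((x.lo >> 1) >> (BITS_PER_WORD - 1 - sub_n));

    /* Superword candidate: only meaningful when op1 >= BITS_PER_WORD. */
    dword super;
    super.lo = 0;
    super.hi = x.lo << super_n;

    /* Stand-in for the conditional move. */
    return op1 >= (unsigned)BITS_PER_WORD ? super : sub;
}
```

With op1 == 0 the unmasked superword count is -64, the same count mentioned
above; after masking it becomes 0, so the discarded candidate is computed with
an in-range count and the selected subword result is unaffected.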

CCing Richard as the author of that code from 2004.

[Bug target/108803] [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og
