> On Apr 14, 2022, at 1:53 AM, Richard Biener <richard.guent...@gmail.com> 
> wrote:
> 
> On Wed, Apr 13, 2022 at 5:22 PM Qing Zhao <qing.z...@oracle.com> wrote:
>> 
>> Hi, Richard,
>> 
>> Thanks a lot for taking a look at this issue (and sorry that I haven’t fixed
>> this one yet; I was distracted by other tasks and then simply forgot about it….)
>> 
>>> On Apr 13, 2022, at 3:41 AM, Richard Biener <richard.guent...@gmail.com> 
>>> wrote:
>>> 
>>> On Tue, Feb 15, 2022 at 5:31 PM Qing Zhao via Gcc-patches
>>> <gcc-patches@gcc.gnu.org> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Feb 15, 2022, at 3:58 AM, Jakub Jelinek <ja...@redhat.com> wrote:
>>>>> 
>>>>> Hi!
>>>>> 
>>>>> For IBM double double I've added in PR95450 and PR99648 verification that,
>>>>> when we interpret target bytes as a REAL_CST or CONST_DOUBLE constant at
>>>>> the tree/GIMPLE or RTL level, we try to encode it back to target bytes and
>>>>> verify that the result is the same.
>>>>> This is because our real.c support isn't able to represent all valid
>>>>> values of IBM double double, which has variable precision.
>>>>> In PR104522, it has been noted that we have a similar problem with the
>>>>> Intel/Motorola extended XFmode formats: our internal representation isn't
>>>>> able to record pseudo denormals, pseudo infinities, pseudo NaNs and
>>>>> unnormal values.
>>>>> So, the following patch is an attempt to extend that verification to all
>>>>> floats.
>>>>> Unfortunately, it wasn't that straightforward, because the
>>>>> __builtin_clear_padding code, precisely for the XFmode long doubles, needs
>>>>> to discover which bits are padding and does so by interpreting memory that
>>>>> is all 1s.  That is actually a valid supported value, a qNaN with negative
>>>>> sign and all mantissa bits set, but the verification also includes the
>>>>> padding bits (exactly what __builtin_clear_padding wants to figure out),
>>>>> so the comparison check fails and we ICE.
>>>>> The patch fixes that case by moving that verification from
>>>>> native_interpret_real to its caller, so that clear_padding_type can
>>>>> call native_interpret_real and avoid that extra check.
>>>>> 
>>>>> With this, the only thing that regresses in the testsuite is
>>>>> +FAIL: gcc.target/i386/auto-init-4.c scan-assembler-times long\\t-16843010 5
>>>>> because it decides to use a pattern that has non-zero bits in the padding
>>>>> bits of the long double, so the simplify-rtx.cc change prevents folding
>>>>> a SUBREG into a constant.  We emit (the testcase is -O0 but we emit worse
>>>>> code at all opt levels) something like:
>>>>>      movabsq $-72340172838076674, %rax
>>>>>      movabsq $-72340172838076674, %rdx
>>>>>      movq    %rax, -48(%rbp)
>>>>>      movq    %rdx, -40(%rbp)
>>>>>      fldt    -48(%rbp)
>>>>>      fstpt   -32(%rbp)
>>>>> instead of
>>>>>      fldt    .LC2(%rip)
>>>>>      fstpt   -32(%rbp)
>>>>> ...
>>>>> .LC2:
>>>>>      .long   -16843010
>>>>>      .long   -16843010
>>>>>      .long   65278
>>>>>      .long   0
>>>>> Note, neither of those sequences actually stores the padding bits; fstpt
>>>>> simply doesn't touch them.
>>>>> For vars with clear_padding_real_needs_padding_p types that are allocated
>>>>> to memory at expansion time, I'd say it would be much better to do the
>>>>> stores using integral modes rather than XFmode, so do that:
>>>>>      movabsq $-72340172838076674, %rax
>>>>>     movq    %rax, -32(%rbp)
>>>>>     movq    %rax, -24(%rbp)
>>>>> directly.  That is the only way to ensure the padding bits are initialized
>>>>> (or expand __builtin_clear_padding, but then you initialize the value bits
>>>>> and the padding bits separately).
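>>>>> 
>>>>> For illustration, that latter alternative at the source level would be
>>>>> something like (just an example, not part of this patch):
>>>>> 
>>>>>      long double x = 0.0L;          /* initializes the value bits */
>>>>>      __builtin_clear_padding (&x);  /* zeroes the XFmode padding bits */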
>>>>> 
>>>>> Bootstrapped/regtested on x86_64-linux and i686-linux, though as mentioned
>>>>> above, the gcc.target/i386/auto-init-4.c case is unresolved.
>>>> 
>>>> Thanks, I will try to fix this test case in a later patch.
>>> 
>>> I've looked at this FAIL now and really wonder whether "pattern init" as
>>> implemented makes any sense for non-integral types.
>>> We end up initializing a register (SSA name) with
>>> 
>>> VIEW_CONVERT_EXPR<long double>(0xfefefefefefefefefefefefefefefefe)
>>> 
>>> as we go building a TImode constant (we verified we have a TImode SET!)
>>> but then
>>> 
>>>         /* Pun the LHS to make sure its type has constant size
>>>            unless it is an SSA name where that's already known.  */
>>>         if (TREE_CODE (lhs) != SSA_NAME)
>>>           lhs = build1 (VIEW_CONVERT_EXPR, itype, lhs);
>>>         else
>>>           init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init);
>>> ...
>>>     expand_assignment (lhs, init, false);
>>> 
>>> and generally registers do not have any padding.  This weird expansion
>>> then causes us to spill the TImode constant and reload the XFmode value,
>>> which is definitely not optimal here.
>>> 
>>> One approach to avoid the worse code generation would be to use mode
>>> specific patterns for registers (like using a NaN or a target specific
>>> value that can be loaded cheaply),
>> 
>> You mean using “mode specific patterns” ONLY for registers?
>> Can we use “mode specific patterns” consistently for both registers and
>> memory?
> 
> The difference is that registers don't have padding while memory possibly
> does; also, for aggregates, using different patterns is too expensive IMHO.
> For registers the extra complication with generic patterns is that efficient
> initialization (without going through memory) should be a priority IMHO.
> 
> And for stack memory I still think that initializing the full allocated
> frame (including padding between variables!) is the best approach.
> 
>> LLVM uses different patterns for different types (integral, float, pointer,
>> etc…) in order to catch type-specific bugs more easily.
>> 
>> Early versions of this patch tried to do something similar to LLVM, but we
>> later gave that up and used the same pattern for all types.
>> 
>> If we now want to use “mode specific patterns” for registers, I am wondering
>> whether it would be better to just use “mode specific patterns” consistently
>> for both registers and memory?
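>> 
>> For example, if I understand the register-side suggestion correctly, a
>> mode-specific pattern for a floating-point SSA name might be built roughly
>> like the sketch below (just a sketch; the quiet-NaN choice is only my
>> assumption, not something taken from the patch):
>> 
>>   /* Hypothetical sketch: for a floating-point SSA name, use a quiet NaN
>>      of the LHS mode as the pattern instead of punning the generic byte
>>      pattern through a VIEW_CONVERT_EXPR.  */
>>   tree type = TREE_TYPE (lhs);
>>   if (TREE_CODE (lhs) == SSA_NAME && SCALAR_FLOAT_TYPE_P (type))
>>     {
>>       REAL_VALUE_TYPE rnan;
>>       real_nan (&rnan, "", 1, TYPE_MODE (type));  /* quiet NaN */
>>       init = build_real (type, rnan);
>>     }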
>> 
>>> another would be to simply fall back to zero
>>> initialization
>>> when we fail to constant fold the initializer like with
>>> 
>>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>>> index 8b1733e20c4..a4b995f71e4 100644
>>> --- a/gcc/internal-fn.cc
>>> +++ b/gcc/internal-fn.cc
>>> @@ -3125,7 +3125,11 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>>>         if (TREE_CODE (lhs) != SSA_NAME)
>>>           lhs = build1 (VIEW_CONVERT_EXPR, itype, lhs);
>>>         else
>>> -           init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init);
>>> +           {
>>> +             init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init);
>>> +             if (!CONSTANT_CLASS_P (init))
>>> +               init = build_zero_cst (TREE_TYPE (lhs));
>>> +           }
>>>       }
>>>      else
>>>       /* Use zero-init also for variable-length sizes.  */
>>> 
>>> Note that this still does not address the issue noted by Jakub that we do
>>> not initialize padding this way, but of course that's because we expand a
>>> special assignment from .DEFERRED_INIT and are not initializing
>>> the storage allocated for the variable on the stack (at -O0) by RTL
>>> expansion.  Ideally .DEFERRED_INIT "expansion" for stack variables
>>> would take place there, simply filling the allocated frame with the
>>> pattern or zero.  That would probably mean that RTL expansion
>>> should scoop up .DEFERRED_INITs for variables it decides not to
>>> expand to pseudos and not leave that to stmt expansion.
>> 
>> Need to study this a little bit more...


So, is what you mean above a complete rewrite of the current
“expand_DEFERRED_INIT”?

Instead of simply using “expand_builtin_memset” for variables that are not in
registers and “expand_assignment” for variables that are in registers, RTL
expansion should directly expand this call at a lower level:

i.e.,

  tree lhs = gimple_call_lhs (stmt);
  …

  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);

  if (MEM_P (target))   /* the variable is allocated on the stack */
    {
      /* Fill the allocated frame with the pattern or zero.  How to do it?? */
    }
  else                  /* the variable is in a pseudo register */
    {
      rtx init_rtx = …;
      emit_move_insn (target, init_rtx);
    }

  
Is the above a correct understanding? 
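
If so, would the MEM_P arm be something roughly like the sketch below?  This
is only a guess on my side: it assumes the zero-vs-pattern choice (init_type)
is already available at this point, that the size is a compile-time constant,
a 64-bit target, and it ignores tail bytes and alignment:

  HOST_WIDE_INT size = int_size_in_bytes (TREE_TYPE (lhs));
  if (init_type == AUTO_INIT_ZERO)
    /* Zero the whole slot, padding included, with one block operation.  */
    clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL);
  else
    /* Pattern init: store the 0xfe... byte pattern one DImode word at a
       time so that the padding bits are written as well.  */
    for (HOST_WIDE_INT off = 0; off + 8 <= size; off += 8)
      emit_move_insn (adjust_address (target, DImode, off),
                      gen_int_mode (HOST_WIDE_INT_C (-72340172838076674),
                                    DImode));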

Thanks.

Qing

>> 
>>> 
>>> I'm going to simply XFAIL this testcase for GCC 12 and opened
>>> PR105259 so we can revisit the above for GCC 13.
>> 
>> I can further study this bug and try to come up with a good solution in
>> GCC 13.
>> 
>> Thank you!
>> 
>> Qing
>>> 
>>> Richard.
>>> 
>>>> Qing
>>>>> 
>>>> 
>> 
