> On Apr 20, 2022, at 5:38 AM, Richard Biener <richard.guent...@gmail.com> > wrote: > > On Tue, Apr 19, 2022 at 11:36 PM Qing Zhao <qing.z...@oracle.com> wrote: >> >> >> >>> On Apr 14, 2022, at 1:53 AM, Richard Biener <richard.guent...@gmail.com> >>> wrote: >>> >>> On Wed, Apr 13, 2022 at 5:22 PM Qing Zhao <qing.z...@oracle.com> wrote: >>>> >>>> Hi, Richard, >>>> >>>> Thanks a lot for taking a look at this issue (and Sorry that I haven’t >>>> fixed this one yet, I was distracted by other tasks then just forgot this >>>> one….) >>>> >>>>> On Apr 13, 2022, at 3:41 AM, Richard Biener <richard.guent...@gmail.com> >>>>> wrote: >>>>> >>>>> On Tue, Feb 15, 2022 at 5:31 PM Qing Zhao via Gcc-patches >>>>> <gcc-patches@gcc.gnu.org> wrote: >>>>>> >>>>>> >>>>>> >>>>>>> On Feb 15, 2022, at 3:58 AM, Jakub Jelinek <ja...@redhat.com> wrote: >>>>>>> >>>>>>> Hi! >>>>>>> >>>>>>> For IBM double double I've added in PR95450 and PR99648 verification >>>>>>> that >>>>>>> when we at the tree/GIMPLE or RTL level interpret target bytes as a >>>>>>> REAL_CST >>>>>>> or CONST_DOUBLE constant, we try to encode it back to target bytes and >>>>>>> verify it is the same. >>>>>>> This is because our real.c support isn't able to represent all valid >>>>>>> values >>>>>>> of IBM double double which has variable precision. >>>>>>> In PR104522, it has been noted that we have similar problem with the >>>>>>> Intel/Motorola extended XFmode formats, our internal representation >>>>>>> isn't >>>>>>> able to record pseudo denormals, pseudo infinities, pseudo NaNs and >>>>>>> unnormal >>>>>>> values. >>>>>>> So, the following patch is an attempt to extend that verification to all >>>>>>> floats. >>>>>>> Unfortunately, it wasn't that straightforward, because the >>>>>>> __builtin_clear_padding code exactly for the XFmode long doubles needs >>>>>>> to >>>>>>> discover what bits are padding and does that by interpreting memory of >>>>>>> all 1s. That is actually a valid supported value, a qNaN with negative >>>>>>> sign with all mantissa bits set, but the verification includes also the >>>>>>> padding bits (exactly what __builtin_clear_padding wants to figure out) >>>>>>> and so fails the comparison check and so we ICE. >>>>>>> The patch fixes that case by moving that verification from >>>>>>> native_interpret_real to its caller, so that clear_padding_type can >>>>>>> call native_interpret_real and avoid that extra check. >>>>>>> >>>>>>> With this, the only thing that regresses in the testsuite is >>>>>>> +FAIL: gcc.target/i386/auto-init-4.c scan-assembler-times >>>>>>> long\\t-16843010 5 >>>>>>> because it decides to use a pattern that has non-zero bits in the >>>>>>> padding >>>>>>> bits of the long double, so the simplify-rtx.cc change prevents folding >>>>>>> a SUBREG into a constant. We emit (the testcase is -O0 but we emit >>>>>>> worse >>>>>>> code at all opt levels) something like: >>>>>>> movabsq $-72340172838076674, %rax >>>>>>> movabsq $-72340172838076674, %rdx >>>>>>> movq %rax, -48(%rbp) >>>>>>> movq %rdx, -40(%rbp) >>>>>>> fldt -48(%rbp) >>>>>>> fstpt -32(%rbp) >>>>>>> instead of >>>>>>> fldt .LC2(%rip) >>>>>>> fstpt -32(%rbp) >>>>>>> ... >>>>>>> .LC2: >>>>>>> .long -16843010 >>>>>>> .long -16843010 >>>>>>> .long 65278 >>>>>>> .long 0 >>>>>>> Note, neither of those sequences actually stores the padding bits, fstpt >>>>>>> simply doesn't touch them. >>>>>>> For vars with clear_padding_real_needs_padding_p types that are >>>>>>> allocated >>>>>>> to memory at expansion time, I'd say much better would be to do the >>>>>>> stores >>>>>>> using integral modes rather than XFmode, so do that: >>>>>>> movabsq $-72340172838076674, %rax >>>>>>> movq %rax, -32(%rbp) >>>>>>> movq %rax, -24(%rbp) >>>>>>> directly. That is the only way to ensure the padding bits are >>>>>>> initialized >>>>>>> (or expand __builtin_clear_padding, but then you initialize separately >>>>>>> the >>>>>>> value bits and padding bits). >>>>>>> >>>>>>> Bootstrapped/regtested on x86_64-linux and i686-linux, though as >>>>>>> mentioned >>>>>>> above, the gcc.target/i386/auto-init-4.c case is unresolved. >>>>>> >>>>>> Thanks, I will try to fix this testing case in a later patch. >>>>> >>>>> I've looked at this FAIL now and really wonder whether "pattern init" as >>>>> implemented makes any sense for non-integral types. >>>>> We end up with >>>>> initializing a register (SSA name) with >>>>> >>>>> VIEW_CONVERT_EXPR<long double>(0xfefefefefefefefefefefefefefefefe) >>>>> >>>>> as we go building a TImode constant (we verified we have a TImode SET!) >>>>> but then >>>>> >>>>> /* Pun the LHS to make sure its type has constant size >>>>> unless it is an SSA name where that's already known. */ >>>>> if (TREE_CODE (lhs) != SSA_NAME) >>>>> lhs = build1 (VIEW_CONVERT_EXPR, itype, lhs); >>>>> else >>>>> init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init); >>>>> ... >>>>> expand_assignment (lhs, init, false); >>>>> >>>>> and generally registers do not have any padding. This weird expansion >>>>> then causes us to spill the TImode constant and reload the XFmode value, >>>>> which is definitely not optimal here. >>>>> >>>>> One approach to avoid the worse code generation would be to use mode >>>>> specific patterns for registers (like using a NaN or a target specific >>>>> value that >>>>> can be loaded cheaply), >>>> >>>> You mean that using “mode specific patterns” ONLY for registers? >>>> Can we use “mode specific patterns” consistently for both registers and >>>> memory? >>> >>> The difference is that registers don't have padding while memory >>> possibly does, also >>> for aggregates using different patterns is too expensive IMHO. For >>> registers the extra >>> complication with generic patterns is that efficient initialization >>> (without going through memory) >>> should be a priority IMHO. >>> >>> And for stack memory I still think that initializing the full >>> allocated frame (including padding >>> between variables!) is the best approach. >>> >>>> LLVM use different patterns for different types (Integral, Float, pointer >>>> type, etc…) in order to >>>> Catch bugs easily for different types. >>>> >>>> The beginning versions of this patch tried to do similar as LLVM, but >>>> later we gave up this and >>>> Use the same pattern for all different types. >>>> >>>> If now, we want to use “mode specific patterns” for registers, I am >>>> wondering whether it’s >>>> Better to just use “mode specific patterns” consistently for both >>>> registers and memory? >>>> >>>>> another would be to simply fall back to zero >>>>> initialization >>>>> when we fail to constant fold the initializer like with >>>>> >>>>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc >>>>> index 8b1733e20c4..a4b995f71e4 100644 >>>>> --- a/gcc/internal-fn.cc >>>>> +++ b/gcc/internal-fn.cc >>>>> @@ -3125,7 +3125,11 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt) >>>>> if (TREE_CODE (lhs) != SSA_NAME) >>>>> lhs = build1 (VIEW_CONVERT_EXPR, itype, lhs); >>>>> else >>>>> - init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init); >>>>> + { >>>>> + init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), >>>>> init); >>>>> + if (!CONSTANT_CLASS_P (init)) >>>>> + init = build_zero_cst (TREE_TYPE (lhs)); >>>>> + } >>>>> } >>>>> else >>>>> /* Use zero-init also for variable-length sizes. */ >>>>> >>>>> note that still does not address the issue noted by Jakub that we do not >>>>> initialize padding this way - but of course that's because we expand a >>>>> special assignment from .DEFERRED_INIT and are not initializing >>>>> the storage allocated for the variable on the stack (at -O0) by RTL >>>>> expansion. Ideally .DEFERRED_INIT "expansion" for stack variables >>>>> would take place there, simply filling the allocated frame with the >>>>> pattern or zero. That would probably mean that RTL expansion >>>>> should scoop up .DEFERRED_INITs for variables it decides not to >>>>> expand to pseudos and not leave that to stmt expansion. >>>> >>>> Need to study this a little bit more... >> >> >> So, Is what you mean in the above a complete rewrite of the current >> “expand_DEFERRED_INIT”: >> >> Instead of simply using “expand_builtin_memset” for “variables that are not >> in register” and “expand_assignment” >> for “variables that are in register”, RTL expansion should directly expand >> this call in a lower level: >> >> i.e, >> >> tree lhs = gimple_call_lhs (stmt); >> … >> >> rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); >> >> If (MEM_P (target)) // the variable is allocated on stack >> { >> // filling the allocated frame with pattern or zero. How to do it?? >> } >> else // the variable is in pseudo register. >> { >> rtx init_rtx = …; >> emit_move_insn (target, init_rtx) >> } >> >> >> Is the above a correct understanding? > > No, I would simply expand the automatic in-memory var .DEFERRED_INIT > calls to nothing > but instead process their effect at the time we do > expand_one_stack_var, which means at > CFG expansion time scan for the .DEFERRED_INITs and record whether and how to > initialize the variables. We can then group the variables accordingly > and block-initialize > the stack space also covering padding inbetween variables.
I spent some time reading cfgexpand, especially the “expand_used_vars” part. I have the following questions: 1. Does “Group variables accordingly” mean: for all the auto variables that associate with .DEFERRED_INIT, during “expand_used_var” Phase, we should: first “defer the stack allocation” and marking them as “DEFERRED_INIT” , and later, during the stack grouping phase, in addition to the current “Phase 1”, “phase 2”, “phase 3” “expand_stack_vars”, add another phase in “expand_stack_vars” for the ones marked with “DEFERRED_INIT”, group them together, And then “block-initialize” this part of the stack space? 2. Looks like that “expand_used_vars” is done BEFORE “expand_gimple_basic_block”, So, if we insert the pattern-initialization RTLs for Block-initializing the stack space, how can we do it in the “expand_used_vars” phase before all the RTL statements are emitted? Qing > > Richard. > >> Thanks. >> >> Qing >> >>>> >>>>> >>>>> I'm going to simply XFAIL this testcase for GCC 12 and opened >>>>> PR105259 so we can revisit the above for GCC 13. >>>> >>>> I can further study this bug and try to come up with a good solution in >>>> gcc13. >>>> >>>> Thank you! >>>> >>>> Qing >>>>> >>>>> Richard. >>>>> >>>>>> Qing