Re: semantics of uninitialized values in GIMPLE

Richard Biener via Gcc Tue, 11 Jul 2023 02:11:22 -0700

On Tue, Jul 11, 2023 at 10:29 AM Krister Walfridsson
<krister.walfrids...@gmail.com> wrote:
>
> On Tue, 11 Jul 2023, Richard Biener wrote:
>
> >> With "plain copies", do you mean treating it as it is always defined? That
> >> would prevent optimizations such as transforming
> >>
> >>    int t;
> >>    if (1)
> >>      t = *p;
> >>    else
> >>      t = 0;
> >>    return t + 1;
> >>
> >> to the equivalent of
> >>
> >>    return *p + 1;
> >>
> >> because the latter is UB if *p is undefined, but the original is OK if phi
> >> nodes always give us a fully defined value.
> >
> > I meant the copy will simply transfer the state (uninitialized or 
> > initialized)
> > from RHS to LHS but in itself does not invoke undefined behavior.  And
> > "plain" copy would be
> >
> >   int t, u;
> >   t = u;
> >
> > where u is uninitialized and t will be as well but this copy in itself
> > isn't problematic.
>
> So I misunderstood what you meant. This is how I have implemented it! :)
>
>
> > Maybe we should treat these as undefined as well - the problem with PHIs is
> > that the implied copies are necessary because of SSA construction even when
> > they do not appear in the original program.  But since transforms like
> > jump threading
> > can cause those copies to become degenerate it's odd to only treat those as 
> > OK.
> > So you have to think about
> >
> > _1 = PHI <p_2(D)>
> >
> > vs.
> >
> > _1 = p_2(D);
> >
> >>
> >>>>   * selection: Instructions that are selecting an element (COND_EXPR,
> >>>>     VEC_PERM_EXPR, etc.) may select between values having uninitialized
> >>>>     bits, and the resulting value may have uninitialized bits. But the
> >>>>     condition/mask must not have uninitialized bits.
> >>>>   * Extraction: Instructions that are extracting bits (BIT_FIELD_REF 
> >>>> etc.)
> >>>>     may have uninitialized bits in both the input/output.
> >>>>   * Insertion: Instructions that are constructing/inserting values
> >>>>     (COMPLEX_EXPR, etc.) may have uninitialized bits in both the
> >>>>     input/output.
> >>>
> >>> Generally I think "moving" uninitialized bits in full or in part is OK.
> >>> Operating on them, including comparing them (with themselves)
> >>> is eventually asking for troubles.
> >>
> >> Makes sense.
> >>
> >> But I think we must restrict the definition of "move". For example, one
> >> can see x * 2 as moving the uninitialized bits one step. And x + x and
> >> x << 1 are the same as x * 2, so then they too would be defined? I would
> >> prefer if all of them were UB if x contains uninitialized bits...
> >
> > I guess you have to try ...
> >
> >>
> >>>> All other use of values having uninitialized bits are considered UB.
> >>>>
> >>>> Does this behavior make sense?
> >>>>
> >>>> The above seems to work fine so far, with one exception that can be seen
> >>>> in gcc.c-torture/execute/pr108498-1.c. The test has an uninitialized bit
> >>>> field
> >>>>
> >>>>    unsigned char c6:1, c7:1, c8:1, c9:3, c10:1;
> >>>>
> >>>> which is written to as
> >>>>
> >>>>    x.c6 = 0;
> >>>>    x.c7 = 0;
> >>>>    x.c8 = 0;
> >>>>    x.c9 = 7;
> >>>>
> >>>> The store merging pass changes this to
> >>>>
> >>>>    _71 = MEM <unsigned char> [(struct C *)&x + 8B];
> >>>>    _42 = _71 & 128;
> >>>>    _45 = _42 | 56;
> >>>>
> >>>> and the translation validator is now complaining that the pass has
> >>>> introduced UB that was not in the original IR (because the most
> >>>> significant bit in _71 is uninitialized when passed to BIT_AND_EXPR).
> >>>>
> >>>> I could solve this by allowing uninitialized bits in BIT_AND_EXPR and
> >>>> BIT_OR_EXP, and propagating each bit according to
> >>>>
> >>>>    * `0 & uninit` is an initialized `0`
> >>>>    * `1 & uninit` is uninitialized
> >>>>    * `0 | uninit` is uninitialized
> >>>>    * `1 | uninit` is an initialized `1`
> >>>>
> >>>> Is that the correct GIMPLE semantics?
> >>>
> >>> I think the above "moves" the uninitialized MSB from memory to _45 so
> >>> that's OK.
> >>>
> >>> Some "operations" like & 0 or & 1 give either defined values or
> >>> take the uninitialized bit unmodified (thus "move").  I think both
> >>> kinds are OK.  Likewise + 0 or * 0/1 would be OK.  What's not
> >>> OK is operations that use an (fully?) uninitialized value twice,
> >>> like x - x when x is uninitialized.
> >>
> >> + 0 and * 0/1 makes sense to me. One annoyance is that it will make my
> >> tool slower as it must track undefined bits in more cases. But that is
> >> not a valid reason for restricting the IR semantics...
> >
> > * 0 produces a defined value.  The difference with x * 1 is that we can
> > elide the operation so the result should better behave the same with
> > x itself.  So I stand corrected and * 1 should not be "OK", that is
> > the result is still uninitialized.  With '&' a (partly) undefined value
> > might become (fully) defined.
> >
> > A good history of us getting more "correct" here is visible in
> > tree-ssa-ccp.cc:likely_value which tells CCP when a lattice
> > value is UNDEFINED (CCP doesn't track undefinedness on
> > a bit level so it has to treat partly undefined values as defined).
>
> Thanks. I'll take a look at this.
>
>
> >>> I think we want that, as soon as the uninitialized value becomes
> >>> "part" of a then partly initialized value, it's value is "fixed".
> >>> With things like _Complex or vector the situation is a bit
> >>> of a grey area since we have ways to operate on elements.
> >>>
> >>> Note that when we for example use a BIT_FIELD_REF to extract
> >>> the MSB from _42 above the value would be again fully undefined.
> >>
> >> Do you mean that all operations on the values having "fixed" bits are
> >> valid? I do not think that will work. The frozen value may be any value,
> >> which mean that the program is nondeterministic. For example, if x has one
> >> "fixed" uninitialized bit with the other bits 0, then code such as
> >>    if (x)
> >> could "randomly" be either true or false, so the program will not really
> >> have any defined semantic.
> >
> > But if (x) inspects all bits so this operation itself would invoke undefined
> > behavior.
> >
> > Maybe a "useful" definition would be that if an operation could cause later
> > program behavior to differ when you wiggle the uninitialized bits then
> > that operation invokes undefined behavior.  Of course that definition
> > would mean a use in a PHI or in a copy invokes undefined behavior already,
> > so maybe that definition plus exceptions?
> >
> >> And if use of an extracted "fixed" uninitialized bit is UB, then the
> >> following may be UB (if x or y has a "fixed" uninitialized least
> >> significant bit)
> >>    bool b = (x & 1) != (y & 1)
> >> while the equivalent
> >>    bool b = (x ^ y) & 1
> >> is defined.
> >
> > I don't quite get the example but if there are any such transforms then
> > we of course have to make sure to not introduce undefined behavior
> > when it wasn't before.  For example IVOPTs used to add "zero" as
> > + x - x, but when 'x' is uninitialized x - x can in practice become 
> > non-zero.
> >
> > Because with PHIs we treat _1 = PHI<_2, undef> as _1 = _2 which
> > menas that undef == a && undef == b && a != b can hold true.
> > undefs are (not) funny.
> >
> >> And things may be even more fun when arithmetic is "smearing" the fixed
> >> uninitialized values over the variable -- it will then be impossible to
> >> determine if the extracted bits are OK or not when extracted...
> >
> > When taking advantage of undefinedness for optimization we have to
> > error on the safe (defined) side and when restricting what ops we produce
> > we have to error on the other safe (undefined) side.
> >
> > To give another code example, recently we've made more use of
> > tree-ssa.cc:mark_ssa_maybe_undefs which marks registers that
> > possibly take an undefined value in the case the point of the
> > definition would not already invoked undefined behavior (at
> > function boundary all incoming values are defined because (not) producing
> > them at the caller side would have invoked undefined behavior).
>
> I think my implementation is close to what you propose -- most of the
> confusion comes from our implied definitions of "frozen" and
> "uninitialized bit" are slightly different.
>
> I'll update my implementation, and will come back with a more detailed
> proposal in a few weeks when I have tried some more things.


Thanks!  I've also taken the opportunity given by your work at the recent
bugs to propose a talk at this years GNU Cauldron about undefined
behavior on GCC and hope to at least start on documenting the state of
art^WGCC in the internals manual for this.  If you have any pointers to
your work / research I'd be happy to point to it, learn from it (and maybe
steal a word or two ;)).

Thanks,
Richard.

>
>     /Krister

Re: semantics of uninitialized values in GIMPLE

Reply via email to