On Thu, Feb 11, 2021 at 11:26 PM Martin Sebor <mse...@gmail.com> wrote:
>
> On 2/11/21 1:09 AM, Richard Biener wrote:
> > On Wed, Feb 10, 2021 at 7:03 PM Martin Sebor <mse...@gmail.com> wrote:
> >>
> >> On 2/10/21 3:39 AM, Richard Biener wrote:
> >>> On Tue, Feb 9, 2021 at 4:37 PM Martin Sebor <mse...@gmail.com> wrote:
> >>>>
> >>>> On 2/9/21 12:41 AM, Richard Biener wrote:
> >>>>> On Tue, Feb 9, 2021 at 1:04 AM Martin Sebor via Gcc-patches
> >>>>> <gcc-patches@gcc.gnu.org> wrote:
> >>>>>>
> >>>>>> On 2/8/21 12:09 PM, Jeff Law wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On 2/3/21 3:45 PM, Martin Sebor wrote:
> >>>>>>>> On 2/3/21 2:57 PM, Jeff Law wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 1/28/21 4:03 PM, Martin Sebor wrote:
> >>>>>>>>>> The GCC 11 -Warray-bounds enhancement to diagnose accesses whose
> >>>>>>>>>> leading offset is in bounds but whose trailing offset is not has
> >>>>>>>>>> been causing some confusion.  When the warning is issued for
> >>>>>>>>>> an access to an in-bounds member via a pointer to a struct that's
> >>>>>>>>>> larger than the pointed-to object, phrasing this strictly invalid
> >>>>>>>>>> access in terms of array subscripts can be misleading, especially
> >>>>>>>>>> when the source code doesn't involve any arrays or indexing.
> >>>>>>>>>>
> >>>>>>>>>> Since the problem boils down to an aliasing violation much more
> >>>>>>>>>> so than an actual out-of-bounds access, the attached patch adjusts
> >>>>>>>>>> the code to issue a -Wstrict-aliasing warning in these cases 
> >>>>>>>>>> instead
> >>>>>>>>>> of -Warray-bounds.  In addition, since the aliasing assumptions in
> >>>>>>>>>> GCC can be disabled by -fno-strict-aliasing, the patch also makes
> >>>>>>>>>> these instances of the warning conditional on -fstrict-aliasing
> >>>>>>>>>> being in effect.
> >>>>>>>>>>
> >>>>>>>>>> Martin
> >>>>>>>>>>
> >>>>>>>>>> gcc-98503.diff
> >>>>>>>>>>
> >>>>>>>>>> PR middle-end/98503 -Warray-bounds when -Wstrict-aliasing would be
> >>>>>>>>>> more appropriate
> >>>>>>>>>>
> >>>>>>>>>> gcc/ChangeLog:
> >>>>>>>>>>
> >>>>>>>>>>         PR middle-end/98503
> >>>>>>>>>>         * gimple-array-bounds.cc 
> >>>>>>>>>> (array_bounds_checker::check_mem_ref):
> >>>>>>>>>>         Issue -Wstrict-aliasing for a subset of violations.
> >>>>>>>>>>         (array_bounds_checker::check_array_bounds):  Set new 
> >>>>>>>>>> member.
> >>>>>>>>>>         * gimple-array-bounds.h 
> >>>>>>>>>> (array_bounds_checker::cref_of_mref): New
> >>>>>>>>>>         data member.
> >>>>>>>>>>
> >>>>>>>>>> gcc/testsuite/ChangeLog:
> >>>>>>>>>>
> >>>>>>>>>>         PR middle-end/98503
> >>>>>>>>>>         * g++.dg/warn/Warray-bounds-10.C: Adjust text of expected 
> >>>>>>>>>> warnings.
> >>>>>>>>>>         * g++.dg/warn/Warray-bounds-11.C: Same.
> >>>>>>>>>>         * g++.dg/warn/Warray-bounds-12.C: Same.
> >>>>>>>>>>         * g++.dg/warn/Warray-bounds-13.C: Same.
> >>>>>>>>>>         * gcc.dg/Warray-bounds-63.c: Avoid -Wstrict-aliasing.  
> >>>>>>>>>> Adjust text
> >>>>>>>>>>         of expected warnings.
> >>>>>>>>>>         * gcc.dg/Warray-bounds-66.c: Adjust text of expected 
> >>>>>>>>>> warnings.
> >>>>>>>>>>         * gcc.dg/Wstrict-aliasing-2.c: New test.
> >>>>>>>>>>         * gcc.dg/Wstrict-aliasing-3.c: New test.
> >>>>>>>>> What I don't like here is the strict-aliasing warnings inside the 
> >>>>>>>>> file
> >>>>>>>>> and analysis phase for array bounds checking.
> >>>>>>>>>
> >>>>>>>>> ISTM that catching this at cast time would be better.  So perhaps in
> >>>>>>>>> build_c_cast and the C++ equivalent?
> >>>>>>>>>
> >>>>>>>>> It would mean we're warning at the cast site rather than the
> >>>>>>>>> dereference, but that may ultimately be better for the warning 
> >>>>>>>>> anyway.
> >>>>>>>>> I'm not sure.
> >>>>>>>>
> >>>>>>>> I had actually experimented with a this (in the middle end, and only
> >>>>>>>> for accesses) but even that caused so many warnings that I abandoned
> >>>>>>>> it, at least for now.  -Warray-bounds has the benefit of flow 
> >>>>>>>> analysis
> >>>>>>>> (and dead code elimination).  In the front end it would have neither
> >>>>>>>> and be both excessively noisy and ineffective at the same time.  For
> >>>>>>>> example:
> >>>>>>> I think we agree that this really is an aliasing issue and I would go
> >>>>>>> further and claim that it has nothing to do with array bounds 
> >>>>>>> checking.
> >>>>>>> Therefore warning for it within gimple-array-bounds just seems wrong.
> >>>>>>>
> >>>>>>> I realize that you like having DCE run and the ability to look at
> >>>>>>> offsets and such, but it really feels like the wrong place to do this.
> >>>>>>> Aliasing issues are also more of a front-end thing since the language
> >>>>>>> defines what is and what is not valid aliasing -- one might even argue
> >>>>>>> that any aliasing warning needs to be identified by the front-ends
> >>>>>>> exclusively (though possibly pruned away later).
> >>>>>>
> >>>>>> The aliasing restrictions are on accesses, which are [defined in
> >>>>>> C as] execution-time actions on objects.  Analyzing execution-time
> >>>>>> actions unavoidably depends on flow analysis which the front ends
> >>>>>> have only very limited support for (simple expressions only).
> >>>>>> I gave one example showing how the current -Wstrict-aliasing in
> >>>>>> the front end is ineffective against all but the most simplistic
> >>>>>> bugs, but there are many more.  For instance:
> >>>>>>
> >>>>>>       int i;
> >>>>>>       void *p = &i;    // valid
> >>>>>>       float *q = p;    // valid
> >>>>>>       *q = 0;          // aliasing violation
> >>>>>>
> >>>>>> This bug is easily detectable in the middle end but impossible
> >>>>>> to do in the front end (same as all other invalid accesses).
> >>>>>
> >>>>> But the code is valid in GIMPLE which allows to re-use the 'int i' 
> >>>>> storage
> >>>>> with storing using a different type.
> >>>>
> >>>> Presumably you're referring to using placement new?
> >>>
> >>> No, I'm refering to the rules the GIMPLE IL follows.  GIMPLE is no
> >>> longer C or C++ and thus what is invalid in C or C++ doesn't
> >>> necessarily have to be invalid in GIMPLE.
> >>>
> >>>>    The warning
> >>>> would have to be aware of it and either run before placement new
> >>>> is inlined.  Alternatively, the inlining could add some annotation
> >>>> into the IL that the warning would then use to differentiate valid
> >>>> code from invalid.
> >>>
> >>> At least in old versions of the C++ standard a simple "re-use" of
> >>> storage starts lifetime of a new object (for certain classes of types,
> >>> 'int' for example), so no placement new is needed here.
> >>
> >> I'm not familiar with this rule.  Can you point me to the section
> >> of the C++ that describes it?
> >
> > For example C++14 3.8 Object lifetime.  I can read 1) as
> >
> >    int i; // lifetime starts - storage is obtained
> >    i = 1;
> >    foo (i);
> >    float *p = (float *)&i; // lifetime of *p starts - storage is obtained
>
> At this point p is just a pointer that points to the int i.
> The only new object here is the pointer p.
>
> >    *p = 1.; // lifetime of i ends, storage is re-used
>
> I don't think that's the intended reading.  The storage of an object
> can be reused by constructing another object in it, and the only way
> to do that in C++ is by placement new.
>
> Even if the store to *p was valid, it doesn't change i's type (only
> placement new can do that), so there is no way to access that value.
> GCC relies on that by assuming that a store through a pointer to one
> type doesn't change the value stored in declared objects of another
> type.  It's a common bug to disregard this rule and that's I'm
> interested in detecting.  I.e., aliasing violations.
>
> > the lifetime start of an object upon obtaining storage is a bit vague
> > which is why GIMPLE starts the lifetime only upon the first _store_.
> >
> > Now I didn't find any restriction on how "storage with the proper
> > alignment and size for type T is obtained".  But of course restrictions
> > to the above may be scattered throughout the 1000+ pages of
> > the standard.
>
> Storage for an object is obtained either by its declaration or by
> an allocation call.
>
> >
> > For C similar behavior is required if you ever implement a
> > memory allocator that re-uses storage.  (yeah, I know that
> > there's the argument that C doesn't allow to implement a
> > memory allocator ... but that's not practical, not for GIMPLE
> > at least)
>
> Allocated storage is subject to slightly different rules that those
> that apply to declared objects and I'm focusing only on the latter.
>
> (The argument that C doesn't make it possible to implement a memory
> allocator is about the strictly conforming subset of C, i.e.,
> the subset that excludes unspecified and implementation-defined
> behavior.)
>
> But I'm still interested in:
>
>    What sort of a markup would you suggest to use and on what trees?
>    Would a bit on INDIRECT_REF do?
>
> and
>
>    ...unless there is some more straightforward way to change the type
>    of a declared object than placement new (I don't know of one), would
>    INDIRECT_REF alone, with no markup, be a sufficient indication that
>    the access doesn't modify the type of the accessed object?

I do not have definite answers here.  But it seems you're after the
fact that some languages do not allow changing the dynamic type
of declared objects (C++ does, kind-of, for "storage" portions!).  So
to me it seems that not the accesses but instead the objects need
to be marked (thus the DECL node in this case).

But again, beware of the funny ideas of the C++ commitee, making

struct { int n; char c[4]; } x, y;
new (x.c) int)();
*(int *)x.c = 1;
y = x;
assert (*(int *)y.c == 1);

valid.  So what you design today will break tomorrow :P

Richard.

> Thanks
> Martin
>
> >
> > Richard.
> >
> >> AFAIK, C and C++ share the same aliasing restrictions with
> >> the exception of placement operator new.  The intent, made explicit
> >> in Footnote 37 in C++ 98, is for the memory models of the two
> >> languages to be the same.  (The same footnote is in all published
> >> revisions of C++).
> >>
> >>>
> >>>> Likewise if there are other such constructs (are there?) they would
> >>>> need be marked up somehow by the front end.
> >>>
> >>> If the frontend requires that a store does not change the memorys
> >>> dynamic type (for diagnostic purposes) then it would need to mark
> >>> it in a special way.  By default any store in the GIMPLE IL alters
> >>> the dynamic type of the destination.
> >>
> >> What sort of a markup would you suggest to use and on what trees?
> >> Would a bit on INDIRECT_REF do?
> >>
> >> But unless there is some more straightforward way to change the type
> >> of a declared object than placement new, why would INDIRECT_REF alone,
> >> with no markup, not be a sufficient indication that the access doesn't
> >> modify the type of the accessed object?
> >>
> >> Martin
> >>
> >>>
> >>>> I speculate that's what Jeff was suggesting by having the FE mark
> >>>> up the code.
> >>>>
> >>>> Martin
> >>>>
> >>>>>
> >>>>>> Whether this is done in gimple-array-bounds or some other pass seems
> >>>>>> to me like a minor implementation detail.  It naturally came out of
> >>>>>> an enhancement I implemented there (which would detect the above
> >>>>>> with float replaced by any larger type as an out-of-bounds access)
> >>>>>> but I have no problem with moving this subset to some other pass
> >>>>>> (or duplicating it there).  In fact, as I said, I'd like to enhance
> >>>>>> -Wstrict-aliasing to detect more bugs at some point, so that might
> >>>>>> be a good time to move this instance of the warning there as well.
> >>>>>> But the enhancement I'm thinking of is in the middle end, not in
> >>>>>> the front end.
> >>>>>>
> >>>>>> In any event, the warning is valid, just the phrasing is misleading
> >>>>>> since there in the case of the struct member there isn't necessarily
> >>>>>> any subscripting involved or even an access to members beyond the end
> >>>>>> of the accessed object.  Issuing it under -Warray-bounds and with
> >>>>>> -fno-strict-aliasing compounds the problem.  I put together this
> >>>>>> patch in response to the feedback I got from you and from the reporter
> >>>>>> in PR 98503 where you both made this point, so I'm not sure why
> >>>>>> improving it as both of you suggested is an issue.
> >>>>>>
> >>>>>> Martin
> >>>>
> >>
>

Reply via email to