On Fri, Feb 16, 2018 at 8:07 PM, Martin Sebor <mse...@gmail.com> wrote:
> On 02/16/2018 04:22 AM, Richard Biener wrote:
>>
>> On Thu, Feb 15, 2018 at 6:28 PM, Martin Sebor <mse...@gmail.com> wrote:
>>>
>>> There are APIs to determine the base object and an offset
>>> into it from all sorts of expressions, including ARRAY_REF,
>>> COMPONENT_REF, and MEM_REF, but none of those I know about
>>> makes it also possible to discover the member being referred
>>> to.
>>>
>>> Is there an API that I'm missing or a combination of calls
>>> to some that would let me determine the (approximate) member
>>> and/or element of an aggregate from a MEM_REF expression,
>>> plus the offset from its beginning?
>>>
>>> Say, given
>>>
>>>   struct A
>>>   {
>>>      void *p;
>>>      char b[3][9];
>>>   } a[2];
>>>
>>> and an expression like
>>>
>>>   a[1].b[2] + 3
>>>
>>> represented as the expr
>>>
>>>   MEM_REF (char[9], a, 69)
>>
>>
>>  &MEM_REF (&a, 69)
>>
>> you probably mean.
>
>
> Yes.  I was using the notation from the Wiki
>   https://gcc.gnu.org/wiki/MemRef
>
>>> where the offset of a[1].b[2] from the start of a is 66
>>>
>>> I'd like to be able to determine that expr refers to the field
>>> b of struct A, and more specifically, b[2], plus 3.  It's not
>>> important what the index into the array a is, or any other
>>> arrays on the way to b.
>>
>>
>> There is code in initializer folding that searches for a field in
>> a CONSTRUCTOR by base and offset.  There's no existing
>> helper that gives you exactly what you want -- I guess you'd
>> ideally want to have a path to the referred object.  But it may
>> be possible to follow what fold_ctor_reference does and build
>> such a helper.
>
>
> Thanks.  I'll see what I can come up with if/when I get to it
> in stage 1.
>
>>
>>> I realize the reference can be ambiguous in some cases (arrays
>>> of structs with multiple array members) and so the result wouldn't
>>> be guaranteed to be 100% reliable.  It would only be used in
>>> diagnostics.  (I think with some effort the type of the MEM_REF
>>> could be used to disambiguate the majority (though not all) of
>>> these references in practice.)
>>
>>
>> Given you have the address of the MEM_REF in your example above
>> the type of the MEM_REF doesn't mean anything.
>
>
> You're right, it doesn't always correspond to the type of
> the member.  It does in some cases but those may be uncommon.
> Too bad.
>
>> I think ambiguity only happens with unions given MEM_REF offsets
>> are constant.
>>
>> Note that even the type of 'a' might not be correct as it may have had
>> a different dynamic type.
>>
>> So I'm not sure in what context you are trying to use this in diagnostics.
>
>
> Say I have a struct like this:
>
>   struct A {
>     char a[4], b[5];
>   };
>
> then in
>
>   extern struct A *a;
>
>   memset (&a[0].a[0] + 14, 0, 3);   // invalid
>
>   memset (&a[1].b[0] + 1, 0, 3);    // valid
>
> both references are the same:
>
>    &MEM_REF[char*, (void *)a + 14];
>
> and there's no way to unambiguously tell which member each refers
> to, or even to distinguish the valid one from the other.  MEM_REF
> makes the kind of analysis I'm interested in very difficult (or
> impossible) to do reliably.

Yes.  Similar issues exist for the objsz pass (aka fortify stuff).

> Being able to determine the member is useful in -Wrestrict where
> rather than printing the offsets from the base object I'd like
> to be able to print the offsets relative to the referenced
> member.  Beyond -Wrestrict, identifying the member is key in
> detecting writes that span multiple members (e.g., strcpy).
> Those could (for example) overwrite a member that's a pointer
> to a function and cause code injection.  As it is, GCC has no
> way to do that because __builtin_object_size considers the
> size of the entire enclosing object, not that of the member.
> For the same reason: MEM_REF makes it impossible.

We're first and foremost an optimizing compiler and not a
static analysis tool.  People seem to want some optimization
to make static analysis easier but then they have to live with
imperfect results.  There's no easy way around this kind of
issue.

Richard.

> Martin
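
The ambiguity in the memset example can be checked with plain offset
arithmetic.  The following is only an illustrative sketch in standard C,
not GCC internals; the function names are made up for this note, and it
assumes struct A has no padding (true on common ABIs, and checked by the
static_assert).

```c
#include <assert.h>
#include <stddef.h>

/* The struct from the memset example in the thread. */
struct A {
    char a[4], b[5];
};

/* The equal-offset claim relies on the struct having no padding. */
static_assert(sizeof(struct A) == 9, "expected struct A to be 9 bytes");

/* Byte offset of &a[0].a[0] + 14 from the pointer a (the invalid call). */
size_t offset_of_invalid_ref(void)
{
    return 0 * sizeof(struct A) + offsetof(struct A, a) + 14;
}

/* Byte offset of &a[1].b[0] + 1 from the pointer a (the valid call). */
size_t offset_of_valid_ref(void)
{
    return 1 * sizeof(struct A) + offsetof(struct A, b) + 1;
}
```

Both functions return 14, which is why the two source expressions end up
as the same base-plus-constant form and the member cannot be recovered
from the offset alone.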

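The "find the member from a base and offset" helper discussed for the
first example might look roughly like the sketch below.  It is a toy in
standard C that hard-codes the layout of struct A; it does not use GCC's
tree API or fold_ctor_reference, and locate and struct member_ref are
hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* The struct from the first example in the thread. */
struct A {
    void *p;
    char b[3][9];
};

static_assert(offsetof(struct A, b) >= sizeof(void *), "b must follow p");

/* Hypothetical result type, not a GCC data structure. */
struct member_ref {
    const char *member; /* "p", "b", or "" for padding */
    size_t index;       /* row of b; 0 for other members */
    size_t rest;        /* bytes past the start of the member or row */
};

/* Map a byte offset from the start of an array of struct A to a member. */
struct member_ref locate(size_t byte_offset)
{
    const size_t b_off = offsetof(struct A, b);
    const size_t b_size = sizeof(((struct A *)0)->b);
    const size_t row_size = sizeof(((struct A *)0)->b[0]);
    /* The index into the outer array is not needed, so drop it. */
    const size_t in_elt = byte_offset % sizeof(struct A);
    struct member_ref r = { "", 0, in_elt };

    if (in_elt < sizeof(void *)) {
        r.member = "p";
    } else if (in_elt >= b_off && in_elt < b_off + b_size) {
        r.member = "b";
        r.index = (in_elt - b_off) / row_size;
        r.rest = (in_elt - b_off) % row_size;
    }
    return r;
}
```

On a typical 64-bit target, where sizeof (struct A) is 40, locate (69)
reports member b, index 2, rest 3, i.e. b[2] plus 3.  As the second
example in the thread shows, an offset that runs past the end of one
member is indistinguishable from a valid reference into a later member,
so a helper like this can only give a best guess.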