Re: determining aggregate member from MEM_REF

Martin Sebor Mon, 26 Feb 2018 07:45:07 -0800

On 02/26/2018 05:08 AM, Richard Biener wrote:

On Fri, Feb 16, 2018 at 8:07 PM, Martin Sebor <mse...@gmail.com> wrote:

On 02/16/2018 04:22 AM, Richard Biener wrote:


On Thu, Feb 15, 2018 at 6:28 PM, Martin Sebor <mse...@gmail.com> wrote:


There are APIs to determine the base object and an offset
into it from all sorts of expressions, including ARRAY_REF,
COMPONENT_REF, and MEM_REF, but none of those I know about
makes it also possible to discover the member being referred
to.

Is there an API that I'm missing or a combination of calls
to some that would let me determine the (approximate) member
and/or element of an aggregate from a MEM_REF expression,
plus the offset from its beginning?

Say, given

  struct A
  {
     void *p;
     char b[3][9];
  } a[2];

and an expression like

  a[1].b[2] + 3

represented as the expr

  MEM_REF (char[9], a, 69)



 &MEM_REF (&a, 69)

you probably mean.



Yes.  I was using the notation from the Wiki
  https://gcc.gnu.org/wiki/MemRef

where the offset of a[1].b[2] from the start of a is 66 (with 8-byte pointers)

I'd like to be able to determine that expr refers to the field
b of struct A, and more specifically, b[2], plus 3.  It's not
important what the index into the array a is, or any other
arrays on the way to b.



There is code in initializer folding that searches for a field in
a CONSTRUCTOR by base and offset.  There's no existing
helper that gives you exactly what you want -- I guess you'd
ideally want to have a path to the referred object.  But it may
be possible to follow what fold_ctor_reference does and build
such a helper.



Thanks.  I'll see what I can come up with if/when I get to it
in stage 1.

I realize the reference can be ambiguous in some cases (arrays
of structs with multiple array members) and so the result wouldn't
be guaranteed to be 100% reliable.  It would only be used in
diagnostics.  (I think with some effort the type of the MEM_REF
could be used to disambiguate most, though not all, of these
references in practice.)



Given you have the address of the MEM_REF in your example above
the type of the MEM_REF doesn't mean anything.



You're right, it doesn't always correspond to the type of
the member.  It does in some cases but those may be uncommon.
Too bad.

I think ambiguity only happens with unions given MEM_REF offsets
are constant.

Note that even the type of 'a' might not be correct as it may have had
a different dynamic type.

So I'm not sure in which diagnostics you are trying to use this.



Say I have a struct like this:

  struct A {
    char a[4], b[5];
  };

then in

  extern struct A *a;

  memset (&a[0].a[0] + 14, 0, 3);   // invalid

  memset (&a[1].b[0] + 1, 0, 3);    // valid

both references are the same:

   &MEM_REF[char*, (void *)a + 14];

and there's no way to unambiguously tell which member each refers
to, or even to distinguish the valid one from the other.  MEM_REF
makes the kind of analysis I'm interested in very difficult (or
impossible) to do reliably.


Yes.  Similar issues exist for the objsz pass (aka fortify stuff).

Being able to determine the member is useful in -Wrestrict where
rather than printing the offsets from the base object I'd like
to be able to print the offsets relative to the referenced
member.  Beyond -Wrestrict, identifying the member is key in
detecting writes that span multiple members (e.g., strcpy).
Those could (for example) overwrite a member that's a pointer
to a function and cause code injection.  As it is, GCC has no
way to do that because __builtin_object_size considers the
size of the entire enclosing object, not that of the member.
The reason is the same: MEM_REF makes it impossible.


We're first and foremost an optimizing compiler and not a
static analysis tool.  People seem to want some optimization
to make static analysis easier but then they have to live with
imperfect results.  There's no easy way around this kind of
issues.


There certainly are limits, but I don't think the two need to
be mutually exclusive.  I believe MEM_REF was introduced mainly
as a solution to avoid the complexity (and bugs) of having to
traverse all the other XXX_REFs all over the place.  There is
no fundamental reason why MEM_REF couldn't be improved or even
replaced to preserve more of the original detail.

Folding things to MEM_REF (or rather, folding them too early)
makes all kinds of analysis harder: not just warnings but even
optimization.  I've raised a whole slew of bugs for the strlen
pass alone where folding string functions to MEM_REF defeats
useful downstream optimizations.  Making strlen (and all other
passes that might benefit from the original detail) work hard
to deal with MEM_REF isn't a good design solution.  It forces
the complexity that MEM_REF is meant to remove back into its
clients.  Worse, because of the loss of detail, the results
are unavoidably suboptimal (at least for certain kinds of
analyses).

Martin
