On 6/16/20 10:13 AM, Jeff Law wrote:
On Mon, 2020-06-15 at 11:10 -0600, Martin Sebor wrote:
That's fine. Since they are treated as equivalent it shouldn't
matter which of the equivalent alternatives is chosen (there
may be many). It's the particular choice of the smaller member
that makes it a problem: both in terms of triggering a false
positive and in terms of the note referencing a member the source
code doesn't use.
If PRE instead picked the bigger member, it wouldn't necessarily
trigger the warning. But if that member was also too small,
the note might still reference the wrong member.
But if PRE picked another equivalent representation not involving
any member at all but rather an offset from the base object (i.e.,
just a MEM_REF) there would be no problem either way: no false
positive, and if it overflowed, the warning wouldn't reference
any member but just the base object.
I'm not sure if this is actually viable from an implementation standpoint. I'd
have to dig deeply into PRE's implementation to know. But I suspect anything
you suggest in this space that continues relying on the type information in
MEM_REFs is going to be rejected as building on top of a fundamentally flawed
foundation.
I'm not sure I understand why you continue to mention type
information or MEM_REF in this context when the problem we're
dealing with is the member in the outer COMPONENT_REF.
I'm also not sure what you mean by fundamentally flawed. If
we want to continue to consider the member substitution a valid
transformation, then presumably you mean the assumption that
the member in a COMPONENT_REF in GIMPLE refers to the same member
in the source code. All the late warnings that check accesses
(including -Warray-bounds, -Wstringop-overflow, -Wformat-overflow,
and -Wrestrict) make this basic assumption.
I can also imagine other transformations in the same vein that
would be considered valid and that would cause false
positives in these contexts. For instance, given
struct S { char a[X], b[Y], c[Z]; };
transforming
p->b[K] and p->c[L]
into
p->a[X + K] and p->a[X + Y + L].
I'm not sure whether it was ARRAY_REF or also MEM_REF, but I'm
pretty sure I've seen something like this happen (was it store merging?).
Either way, if all of these transformations are valid then all
these late warnings are built on the same flawed foundation.
Instead of substituting one member for another in the COMPONENT_REF
when both happen to be accessed at the same offset, using a plain
MEM_REF into the enclosing struct or union plus the member's offset
would avoid the problem. Something like this:
Ultimately that's just a bandaid over a flawed implementation.
Fundamentally the problem is that the diagnostics should not be
depending on the type of those MEM expressions. As long as we
continue to do that we're going to run into problems.
Hence my suggestion we look at attaching suitable type information to the calls
early in the pipeline, possibly at AST generation time.
It's not the MEM_REF type that's a problem here. The general
issue with the reliability of late warnings like this one also
isn't isolated to just function calls. Any checked statement
is susceptible to them.
In this case, the problem is that the COMPONENT_REF member is
not the one referenced in the source code. So more reliable
type information for MEM_REFs wouldn't solve it: we would also
need more reliable COMPONENT_REFs.
I'm not suggesting we have more accurate type information on either of those
*_REF nodes. Instead I'm suggesting type information get attached to the
statement which uses them, i.e., the call. If that turns out to work well, we
could look to expand it to other cases.
It's not a new IL. Not even close. It's just attaching the original type of
arguments to the call before the optimizers muck it up.
Not type in this case. The COMPONENT_REF is what would need to
be attached to the statement, referencing the original member.
Or, for things like ARRAY_REF (MEM_REF (...)), the ARRAY_REF.
So every statement would need to maintain two sets of operands:
the original set before the PRE/FRE substitution (and all others
like it), and another after it. For function calls, that would
mean two sets of arguments. I imagine both sets of operands
would need to be updated by optimizations to stay in sync (or
be folded). Yes?
If so, it sounds very involved to me, almost on par with
maintaining a separate IL. Maybe it's not quite as bad as
that, but it does feel like a whole lot of work to implement
and maintain. If it isn't, let me know what I'm missing.
Martin