Re: [PATCH] handle MEM_REF with void* arguments (PR c++/95768)

Martin Sebor via Gcc-patches Thu, 09 Jul 2020 08:50:17 -0700

On 6/29/20 1:19 AM, Richard Biener wrote:

On Mon, Jun 29, 2020 at 1:08 AM Martin Sebor <mse...@gmail.com> wrote:


On 6/23/20 1:12 AM, Richard Biener wrote:

On Tue, Jun 23, 2020 at 12:22 AM Martin Sebor via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:


On 6/22/20 12:55 PM, Jason Merrill wrote:

On 6/22/20 1:25 PM, Martin Sebor wrote:

The attached fix parallels the one for the equivalent C bug 95580
where the pretty printers don't correctly handle MEM_REF arguments
with type void* or other pointers to an incomplete type.

The incorrect handling was exposed by the recent change to
-Wuninitialized which includes such expressions in diagnostics.

+        if (tree size = TYPE_SIZE_UNIT (TREE_TYPE (argtype)))
+          if (!integer_onep (size))
+            {
+              pp_cxx_left_paren (pp);
+              dump_type (pp, ptr_type_node, flags);
+              pp_cxx_right_paren (pp);
+            }


Don't we want to print the cast if the pointer target type is incomplete?


I suppose, yes, although after some more testing I think what should
be output is the type of the access.  The target pointer type isn't
meaningful (at least not in this case).

Here's what the warning looks like in C for the test case in
gcc.dg/pr95580.c:

     warning: ‘*((void *)(p)+1)’ may be used uninitialized

and like this in C++:

     warning: ‘*(p +1)’ may be used uninitialized

The +1 is a byte offset, which is correct given that incrementing
a void* in GCC is the same as adding 1 to the byte address, but
dereferencing a void* doesn't correspond to what's going on in
the source.

Even for a complete type (with size greater than 1), printing
the type of the argument plus a byte offset is wrong.  It ends
up with this for the C++ test case from 95768:

     warning: ‘*((int*)<unknown> +4)’ is used uninitialized

when the access is actually ‘*((int*)<unknown> +1)’.

So it seems to me for MEM_REF, to make the output meaningful,
it's the type of the access (i.e., the MEM_REF type) that should
be printed here, and the offset should either be in elements of
the accessed type, i.e.,

     warning: ‘*((int*)<unknown> +1)’ is used uninitialized

or, if the access is misaligned, the argument should first be
cast to char*, the offset added, and the result then cast to
the access type, like this:

     warning: ‘*(T*)((char*)<unknown> +1)’ is used uninitialized

The attached revised and less than fully tested patch implements
this for C++ only for now.  If we agree on this approach I'll see
about making the corresponding change in C.
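
As an illustration of the two output forms described above, here is
a minimal standalone sketch.  It is not the patch and not GCC code:
format_access, offset_text, and their parameters are hypothetical
names, and the access is modeled only by a type name, a size in bytes,
and a signed byte offset.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: render a signed offset as " +N" or " -N".
static std::string offset_text(std::int64_t off)
{
  return off < 0 ? " " + std::to_string(off) : " +" + std::to_string(off);
}

// Hypothetical model of the proposal (not the GCC pretty printer):
//   ptr      - how the pointer operand is printed, e.g. "<unknown>"
//   type     - name of the accessed type, e.g. "int"
//   size     - size of the accessed type in bytes
//   byte_off - signed byte offset of the access
std::string format_access(const std::string &ptr, const std::string &type,
                          std::int64_t size, std::int64_t byte_off)
{
  // No offset: just show the access type.
  if (byte_off == 0)
    return "*(" + type + "*)" + ptr;

  // Offset is a whole number of elements: print it in elements.
  if (size > 0 && byte_off % size == 0)
    return "*((" + type + "*)" + ptr + offset_text(byte_off / size) + ")";

  // Misaligned access: add the byte offset to a char*, then cast.
  return "*(" + type + "*)((char*)" + ptr + offset_text(byte_off) + ")";
}
```

With a 4-byte int, format_access ("<unknown>", "int", 4, 4) yields
*((int*)<unknown> +1), while a byte offset of 1 falls back to the
char* form.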


Note that there is no C/C++ way of fully expressing MEM_REF
semantics.  __MEM <int> ((T *)p + 1) is not actually
*(int *)((char *)p + 1) because that does not reflect that the
effective type of the lvalue as far as TBAA is concerned is 'T'
rather than 'int'.


What form would you say is closest to the C/C++ semantics, or
likely the most useful to users, that GCC could print instead?


Hmm, I'd try *(<pointer derived from `p'>) maybe?  Because there's
no C/C++ that can express what GIMPLE can do here.  Of course
pattern matching the exact cases we can handle like your patch
is an improvement (but as said the TBAA issue is still present).


"pointer derived from" would be misleading because of C++ class
derivation.  But more important, I think the output needs to
reflect what the warning actually is based on.  Leaving out
salient details like types or offsets from warnings about
out-of-bounds accesses makes analyzing them more difficult,
both for users and for us during initial triage.

Note for MEM_REF the offset is always
a constant byte offset but it indeed does not have to be a
multiple of the MEM_REF type size.

I wonder whether printing the MEM_REF in full provides
any real diagnostic value in the more "obfuscated" cases.


I'm not sure what obfuscated cases you're thinking of, or what
you mean by printing it in full.


I think that printing ‘*(T*)((char*)<unknown> +1)’ is likely going
to confuse users because they cannot match this to a source
location.  If we have a source location we should have caret
diagnostics.

I instrumented the code to
print every MEM_REF that comes up in warn_uninitialized_vars
and rebuilt GCC.  There are 17,456 distinct instances so I didn't
review them all but those I did look at all look reasonable.
Probably the least useful are those that mention <unknown> by
itself (i.e., <unknown> or *<unknown>).  Those with an offset
are more informative (e.g., *((access**)<unknown> +1)).  In
a few the offset is very large, such as *((unsigned int*)sp
+4611686018427387900), but that doesn't seem like a problem.
I'd be happy to share the result.


Here +4611686018427387900 should be printed as -4, MEM_REF
offsets are to be interpreted as signed.
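
The arithmetic behind this can be sketched as follows.  This is only
an illustration, not GCC code: as_signed_bytes and element_offset are
hypothetical names, and it assumes (the thread does not say so) that
the large value arose from dividing an unsigned 64-bit byte offset by
a 4-byte unsigned int.

```cpp
#include <cstdint>

// Hypothetical helper: reinterpret a 64-bit offset stored as an
// unsigned value as a signed (two's complement) byte offset, without
// relying on implementation-defined conversions.
std::int64_t as_signed_bytes(std::uint64_t raw)
{
  if (raw <= static_cast<std::uint64_t>(INT64_MAX))
    return static_cast<std::int64_t>(raw);
  // Top bit set: the signed value is -(~raw) - 1.
  return -static_cast<std::int64_t>(~raw) - 1;
}

// Hypothetical helper: offset in elements of the accessed type,
// computed only after the byte offset has been made signed.
std::int64_t element_offset(std::uint64_t raw_bytes, std::int64_t size)
{
  return as_signed_bytes(raw_bytes) / size;
}
```

Under that assumption the raw byte offset is 18446744073709551600.
Dividing it as unsigned by 4 gives 4611686018427387900, whereas
element_offset (18446744073709551600ull, 4) gives -4.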


Sure, converting the offset to signed sounds like a good idea.


I'd also not print <unknown> but <register>.


I also don't find <unknown> helpful, but I don't see <register>
as an improvement.  I think printing the SSA variable would be
more informative here since its name is usually related to
the variable it was derived from in the source.  But making that
change (or any other like it) feels like too much feature creep
for this fix.  I'd be happy to do it in a follow up if we agree
it's a good idea.

Either way, please let me know if the patch is okay as is or,
if not, what type it should mention.


+               if (tree size = TYPE_SIZE_UNIT (TREE_TYPE (argtype)))
+                 if (!integer_onep (size))
+                   {

this should be if (!TYPE_SIZE_UNIT (...) || !integer_onep (...)).
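
The difference between the two conditions can be sketched with
a small standalone model.  It is not GCC code: the function names are
hypothetical, and a null pointer stands in for a null TYPE_SIZE_UNIT,
i.e. an incomplete target type such as void.

```cpp
// Hypothetical model: `size` points to the target type's size in
// bytes, or is null when the type is incomplete.

// Shape of the condition in the original patch: the cast is printed
// only when a size exists and differs from 1.
bool print_cast_original(const unsigned long *size)
{
  if (size)
    if (*size != 1)
      return true;
  return false;
}

// Shape of the suggested condition: the cast is also printed when
// there is no size at all.
bool print_cast_suggested(const unsigned long *size)
{
  return !size || *size != 1;
}
```

The two agree for every complete type and differ only for an
incomplete one, where the suggested form prints the cast.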

Otherwise the original patch looks OK to me.  As said for your
second patch, printing *(int*)p only if p is offset is inconsistent
and misleading for TBAA reasons.  So I do not view it as
a general improvement.  If the type of the MEM_REF offset
and the access type agree, doing that would be fine, but then
MEM_REF<int>(p) and MEM_REF<int>(p+4) should be treated
the same.


Jason pointed out a problem with the original patch: it doesn't
mention any type so it ends up printing things like

   *(p +1)

or worse:

   *(<unknown> +1)

I agree that a type should be mentioned, but not that it should
be the type of the pointer because in the cases where it doesn't
match the type of the access like the one in the report it's
nonsensical.  It ends up looking like what the C front end prints:

   *((void *)(p)+1)

That said, can we fix the segfault first?


Sure.  I proposed a v2 patch and I'm happy to improve it further,
either before or after committing it.  I don't find the first patch
suitable anymore.

Also see c-pretty-print.c
for another "copy" of this functionality.  It feels like we should dispatch
to c-family/ code here.


I agree.  I can clean it up in a follow up, after we agree on what
the first fix should look like.

FWIW, I don't share (or understand) your concern with TBAA or see
a problem with '*(T*)((char*)<unknown> +1)'.  In the cases I've seen
it corresponds to the source.  Even if there are cases where it
doesn't faithfully reflect all aliasing subtleties it can't be worse
than printing '*((void *)(p)+1)'.  The middle end warnings are mostly
concerned with out-of-bounds accesses where it's the type of the access
and its offset into the destination that matter most.

Martin
