Re: [PATCH] - improve sprintf buffer overflow detection (middle-end/49905)

Martin Sebor Thu, 11 Aug 2016 16:37:27 -0700

On 07/22/2016 04:12 PM, Jeff Law wrote:

Working through the new pass...  Overall it looks pretty good.  There's
a certain level of trust I'll extend WRT getting the low level details
right -- a thorough testsuite obviously helps there.


In the latest patch where I add the return value optimization I also
add a new test that compares the return value computed by the pass
to the value returned by the library call.  That greatly increases
my confidence in the correctness of the pass (though the test still
could stand to be enhanced in a number of ways).

+const pass_data pass_data_sprintf_length = {
+  GIMPLE_PASS,        // pass type
+  "sprintf_length",   // pass name
+  OPTGROUP_NONE,      // optinfo_flags
+  TV_NONE,            // tv_id
+  PROP_cfg,           // properties_required
+  0,                 // properties_provided
+  0,                 // properties_destroyed
+  0,                 // properties_start
+  0,                 // properties_finish

So the worry here is that you need PROP_ssa in the late pass.  I'm not
sure if we have the infrastructure to allow you to distinguish the two.
I guess we could literally make it two passes with two distinct
pass_data structures.


I'm not sure I understand the concern.  It sounds like you are
suggesting to add another pass_data struct with PROP_ssa among
the required properties and specify it as an argument to
the gimple_opt pass constructor.  What will it do that what
I have doesn't (make sure the ssa pass runs before this pass?)
and how can I test it?


In handle_gimple_call, you don't handle anti ranges -- I haven't looked
closely at canonicalization rules in VRP, so I don't know if you're
likely to see a range like ![MININT, -1] which would be a synonym for
[0..MAXINT].  It might be worth doing some instrumentation to see if
you're getting useful anti-ranges with any kind of consistency.  ISTM
the most interesting ones are going to allow you to drop or insist on a
"-" sign.


I don't recall seeing any inverted ranges when I was developing
the range tests.  I had high hopes for the range information but
it turned out to be so poor that I decided not to spend time
completing this part.  Once Andrew's new VRP pass is available
I'll come back to it.  If it's as generic as it sounds this pass
will not only be a consumer but also be able to contribute to
improving it.

+      /* As a special case, bounded function allow the explicitly
+        specified destination size argument to be zero as a request
+        to determine the number of bytes on output without actually
+        writing any output.  Disable length checking for those.  */

This doesn't parse well.

+  /* Try to use __builtin_object_size although it rarely returns
+     a useful result even for straighforward cases.  */

Hopefully you've stashed away some tests for this so that we can work on
them independently of the sprintf checking itself?  ISTM that the
recursive handling of POINTER_PLUS_EXPR really ought to move into the
generic builtin_object_size handling itself as an independent change.


Bug 71831 tracks this limitation/enhancement.  I would expect to
be relatively easy to enhance the compute_builtin_object_size
function to return more useful results for some basic expressions
even without optimization, without running the whole objsize pass.
It seems that a good number of other features in GCC (not just
warnings) would benefit from it.

+  /* Bail early if format length checking is disabled, either
+     via a command line option, or as a result of the size of
+     the destination object not being available, or due to
+     format directives or arguments encountered during processing
+     that prevent length tracking with any reliability.  */
+
+  if (HOST_WIDE_INT_MAX <= info.objsize)
+      return;

I think the return is indented too deep.

+      if (*pf == 0)
+       {
+         /* Incomplete directive.  */
+         return;
+       }

For this and the other early returns from compute_format_length, do we
know if -Wformat will catch these errors?  Might that be a low priority
follow-up project if it doesn't?


Being invoked by the front end, -Wformat is limited to checking
constant strings, so it misses problems that a pass that runs
later could detect.  With a bit of effort (to avoid duplicate
warnings) it would be possible to enhance this pass to catch some
these as well.  I chose not to spend time on it in this version
since non-constant format strings are rare but it I agree that
it's something to look into after this patch is in.

In format_floating, we end up using actual host floating point
arithmetic.  That's generally been frowned upon.  We're not doing
anything in terms of code generation here, so ultimately that might be
what allows us to still be safe from a cross compilation standpoint.

It's also not clear if that routine handles the other target floating
point formats.  For things like VAX FP, I'm comfortable issuing a
warning that this stuff isn't supported on that target.   We have other
targets where a double might be the same size as a float.   Though I
guess in that case the worst we get is false positives, so that may not
be a huge issue either.  I'm not sure how to check for this stuff
without bringing in aspects of the target though, which we'd like to avoid.

In format_integer you have target specific ifdefs (solaris and aix).
First we want to avoid conditionally compiled code.  Second, from a
modularity something like TARGET_AIX and TARGET_SOLARIS really aren't
appropriate in here. I suspect what you're going to want to do is add
something to the targetm structure that you can query and/or call here
is the best we can do.

Again, overall it looks good.  My biggest concern is format_integer and
format_float and the bleeding of target dependencies into this code.  To
some degree it may be unavoidable and we can mitigate the damage with
stuff in targetm -- I'd like to hear your thoughts.


I've reworked the patch to avoid making assumptions about the floating
point format and to remove the #ifdefs and use target hooks instead.
I decided to use MPFR for the floating point stuff.  I tried a couple
of other approaches but in the end I think the MPFR route will yield
the most reliable results.

I'm still running tests but once I'm done I'll post the updated patch
for review.

Thanks
Martin

Re: [PATCH] - improve sprintf buffer overflow detection (middle-end/49905)

Reply via email to