On 09/14/2018 12:17 PM, David Malcolm wrote:
Martin: on the way back from Cauldron I had a go at implementing new
output formats for the warnings from gimple-ssa-sprintf.c, to try to
better indicate to the end user what the problem is.  My plan was to
implement some of the ASCII art ideas we've been discussing.  However,
to try to understand what the pass is doing, I tried visualizing the
existing data structures, and I ended up liking the output of that so much,
that I think that it is what we ought to show the end-user.

Consider the examples from PR middle-end/77696:

char d[4];

typedef __SIZE_TYPE__ size_t;

extern int sprintf (char*, const char*, ...);
extern int snprintf (char*, size_t, const char*, ...);


void f (int i)
{
  // bounded, definite truncation in a directve
  snprintf (d, sizeof d, "%i", 1235);

  // bounded, definite truncation copying format string
  snprintf (d, sizeof d, "%iAB", 123);

  // unbounded, definite overflow in a directve
  sprintf (d, "%i", 1235);

  // unbounded, definite overflow copying format string
  sprintf (d, "%iAB", 123);

  // bounded, possible truncation a directve
  snprintf (d, sizeof d, "%i", i);

  // bounded, possible overflow copying format string
  snprintf (d, sizeof d, "%iAB", i);

  // unbounded, possible overflow in a directve
  sprintf (d, "%i", i);

  // unbounded, possible overflow copying format string
  sprintf (d, "%iAB", 123);

  // unbounded, possible overflow copying format string
  const char *s = i ? "123" : "1234";
  sprintf (d, "%sAB", s);
}

extern char buf_10[80];
extern char tmpdir[80];
extern char fname[8];

void g (int num)
{
  sprintf (buf_10, "/%s/%s-%i.tmp", tmpdir, fname, num);
}

trunk currently emits:

zz.c: In function ‘f’:
zzz.c:12:29: warning: ‘snprintf’ output truncated before the last format 
character [-Wformat-truncation=]
12 |   snprintf (d, sizeof d, "%i", 1235);
   |                             ^
zzz.c:12:3: note: ‘snprintf’ output 5 bytes into a destination of size 4
12 |   snprintf (d, sizeof d, "%i", 1235);
   |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
zzz.c:15:30: warning: ‘AB’ directive output truncated writing 2 bytes into a 
region of size 1 [-Wformat-truncation=]
15 |   snprintf (d, sizeof d, "%iAB", 123);
   |                             ~^
zzz.c:15:3: note: ‘snprintf’ output 6 bytes into a destination of size 4
15 |   snprintf (d, sizeof d, "%iAB", 123);
   |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
zzz.c:18:18: warning: ‘sprintf’ writing a terminating nul past the end of the 
destination [-Wformat-overflow=]
18 |   sprintf (d, "%i", 1235);
   |                  ^
zzz.c:18:3: note: ‘sprintf’ output 5 bytes into a destination of size 4
18 |   sprintf (d, "%i", 1235);
   |   ^~~~~~~~~~~~~~~~~~~~~~~
zzz.c:21:19: warning: ‘AB’ directive writing 2 bytes into a region of size 1 
[-Wformat-overflow=]
21 |   sprintf (d, "%iAB", 123);
   |                  ~^
zzz.c:21:3: note: ‘sprintf’ output 6 bytes into a destination of size 4
21 |   sprintf (d, "%iAB", 123);
   |   ^~~~~~~~~~~~~~~~~~~~~~~~
zzz.c:33:19: warning: ‘AB’ directive writing 2 bytes into a region of size 1 
[-Wformat-overflow=]
33 |   sprintf (d, "%iAB", 123);
   |                  ~^
zzz.c:33:3: note: ‘sprintf’ output 6 bytes into a destination of size 4
33 |   sprintf (d, "%iAB", 123);
   |   ^~~~~~~~~~~~~~~~~~~~~~~~
zzz.c:37:19: warning: ‘AB’ directive writing 2 bytes into a region of size 
between 0 and 1 [-Wformat-overflow=]
37 |   sprintf (d, "%sAB", s);
   |                  ~^
zzz.c:37:3: note: ‘sprintf’ output between 6 and 7 bytes into a destination of 
size 4
37 |   sprintf (d, "%sAB", s);
   |   ^~~~~~~~~~~~~~~~~~~~~~
zzz.c: In function ‘g’:
zzz.c:46:24: warning: ‘/’ directive writing 1 byte into a region of size 
between 0 and 79 [-Wformat-overflow=]
46 |   sprintf (buf_10, "/%s/%s-%i.tmp", tmpdir, fname, num);
   |                        ^
zzz.c:46:3: note: ‘sprintf’ output between 9 and 105 bytes into a destination 
of size 80
46 |   sprintf (buf_10, "/%s/%s-%i.tmp", tmpdir, fname, num);
   |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


With this patch kit, gcc emits:

zzz.c: In function ‘f’:
zzz.c:12:13: warning: ‘snprintf’ will truncate the output (5 bytes) to size 4 
[-Wformat-truncation=]
12 |   snprintf (d, sizeof d, "%i", 1235);
   |             ^             ^~^
   |             |             | |
   |             |             | 1 byte (for NUL terminator)
   |             |             4 bytes
   |             capacity: 4 bytes
zzz.c:1:6: note: destination declared here
1 | char d[4];
  |      ^
zzz.c:15:13: warning: ‘snprintf’ will truncate the output (6 bytes) to size 4 
[-Wformat-truncation=]
15 |   snprintf (d, sizeof d, "%iAB", 123);
   |             ^             ^~^~^
   |             |             | | |
   |             |             | | 1 byte (for NUL terminator)
   |             |             | 2 bytes
   |             |             3 bytes
   |             capacity: 4 bytes
zzz.c:1:6: note: destination declared here
1 | char d[4];
  |      ^
zzz.c:18:12: warning: buffer overflow: ‘sprintf’ will write 5 bytes into a 
destination of size 4 [-Wformat-overflow=]
18 |   sprintf (d, "%i", 1235);
   |            ^   ^~^
   |            |   | |
   |            |   | 1 byte (for NUL terminator)
   |            |   4 bytes
   |            capacity: 4 bytes
zzz.c:1:6: note: destination declared here
1 | char d[4];
  |      ^
zzz.c:21:12: warning: buffer overflow: ‘sprintf’ will write 6 bytes into a 
destination of size 4 [-Wformat-overflow=]
21 |   sprintf (d, "%iAB", 123);
   |            ^   ^~^~^
   |            |   | | |
   |            |   | | 1 byte (for NUL terminator)
   |            |   | 2 bytes
   |            |   3 bytes
   |            capacity: 4 bytes
zzz.c:1:6: note: destination declared here
1 | char d[4];
  |      ^
zzz.c:33:12: warning: buffer overflow: ‘sprintf’ will write 6 bytes into a 
destination of size 4 [-Wformat-overflow=]
33 |   sprintf (d, "%iAB", 123);
   |            ^   ^~^~^
   |            |   | | |
   |            |   | | 1 byte (for NUL terminator)
   |            |   | 2 bytes
   |            |   3 bytes
   |            capacity: 4 bytes
zzz.c:1:6: note: destination declared here
1 | char d[4];
  |      ^
zzz.c:37:12: warning: buffer overflow: ‘sprintf’ will write between 6 and 7 
bytes into a destination of size 4 [-Wformat-overflow=]
37 |   sprintf (d, "%sAB", s);
   |            ^   ^~^~^
   |            |   | | |
   |            |   | | 1 byte (for NUL terminator)
   |            |   | 2 bytes
   |            |   3...4 bytes
   |            capacity: 4 bytes
zzz.c:1:6: note: destination declared here
1 | char d[4];
  |      ^
zzz.c: In function ‘g’:
zzz.c:46:12: warning: buffer overflow: ‘sprintf’ will write between 9 and 105 
bytes into a destination of size 80 [-Wformat-overflow=]
46 |   sprintf (buf_10, "/%s/%s-%i.tmp", tmpdir, fname, num);
   |            ^~~~~~   ^^~^^~^^~^~~~^
   |            |        || || || |   |
   |            |        || || || |   1 byte (for NUL terminator)
   |            |        || || || 4 bytes
   |            |        || || |1...11 bytes
   |            |        || || 1 byte
   |            |        || |0...7 bytes
   |            |        || 1 byte
   |            |        |0...79 bytes
   |            |        1 byte
   |            capacity: 80 bytes
zzz.c:40:13: note: destination declared here
40 | extern char buf_10[80];
   |             ^~~~~~

The above is missing colorization, which can be seen at:

  https://dmalcolm.fedorapeople.org/gcc/2018-09-14/zzz.html

where the secondary locations within the format string alternate
colors between odd and even secondary locations.

The above output format is rather verbose, but I think that it is
appropriate for this warning.

Rationale: What is the action that an end-user will want to take when
encountering a warning?

I think they will want to:
(a) review it and decide:
  (a.1) "Is this a "true" result?"
  (a.2) "Do I care?"
(b) if they decide it's genuine: "How do I fix this?"

(i.e. we should aim for our diagnostics to be "actionable")

Consider the sprintf/snprintf warnings.  For the user to easily take the
above actions, the most helpful thing we can do is to provide the user
with clear information on how much data the format directives will write
when the code runs, and to compare that with the size of the destination buffer.

I think the typical fix the user will apply will be to increase the size
of the buffer, or to convert sprintf to snprintf: a buffer overflow is
typically very bad, whereas truncating a string can often be much less
serious.  A less common fix will be to add or tweak width specifiers to
specific format directives.

To help the end-user to make these kinds of decisions, these warnings
need to clearly summarize the sizes of *all* of the various writes,
and compare them to the size of the destination buffer.

As I understand it, the current implementation iterates through the
format directives, tracking the possible ranges of data that could have
been written so far.  It emits a warning at the first directive that can
have an overflow, and then stops analyzing the directives at that callsite.

The existing approach thus seems to focus on where the first overflow
can happen.

My proposed new approach analyzes all of the directives up-front, and if
an overflow can occur, then the warning that's emitted shows all of the
various sizes of writes that will happen.

Hence the reimplementation doesn't bother showing the point where the
overflow happens; instead it summarizes all of what will be written, to
help the user in deciding how to fix their code.
(Doing so reduces some of the combinatorial explosion in possible output
messages).

The patch also puts the words "buffer overflow" at the front of the
message where appropriate (to better grab the user's attention).
It also calls out NUL terminators, since people tend to make mistakes
with them.

There's a lot of FIXMEs in the resulting patch, but I wanted to run the
idea past you before fixing all the FIXMEs.  In particular, I haven't
attempted to bootstrap and regression-test the new implementation.
(I suspect many of the existing tests are broken by this...).  Consider
this a prototype.

[the interesting part of the patch kit is in patch 5/5]

Thoughts?

I haven't had a chance to review the patch itself or install
and play with it but going by the example diagnostics you've
provided (thanks, that was helpful) I agree there are some
nice elements there (e.g, referencing the declaration of
the destination object in a note, or showing the size of
the output of each directive).

At the same time, I still have the same concerns as before:
the cost of maintaining the new output and the tests for it
seems considerable.  And while I do like some of
the improvements quite a bit, others seem subjective and
less like a clear win to me.  On balance, I'm not sure
the overall costs and liabilities justify the benefits.

A big concern I have is also about losing some subtle but
important distinctions that the current warnings make and that
the tests rely on to verify the warning works correctly (this
is both the phrasing and the data in the warnings).  If these
are lost or moved into the notes, we'd either lose that aspect
of the tests or will have to replicate it via
the dg-multiline-output directives.  I find those very hard
to work with (and hence my concern about maintenance).

An example is the amount of output printed by the overflowing
directive. Besides testing, I think it's important also for
users in the "may write" kind of warnings -- users need to
see clearly which directive caused the overflow, not just
that the whole call may have overflowed and be expected to
do the math (add up the ranges of output of prior directives)
to determine the problem directive.

Beyond that, when introducing the sprintf warnings I tried
to make them consistent with other similar warnings (e.g.,
-Wstringop-overflow, -Warray-bounds, -Wrestrict, etc), both
in phrasing and in style.  The rewording/reformatting would
make the printf kind entirely different.  E.g., the notation
used to represent ranges ([MIN, MAX] vs MIN...MAX).  I'm not
opposed to tweaking the wording or the style but I think
there is value in consistency and so if we change sprintf
I think we should adopt the same style across the board.

It's definitely useful to be able to tell how much output each
directive produced (that's why it's in the debugging dumps) but
the "ASCII art" feels so different from the style of all other
GCC diagnostics that I'm having trouble convincing myself it's
the right (or even a good) choice.

I understand why you chose different colors to distinguish
adjacent directives -- but so much colorization feels more
distracting than helpful  I suppose it could be mitigated
somewhat by only printing it for the %- kinds of directives
and for plain characters in the format string.

With all that said, in case you haven't seen it, there is
additional detail about the pass in
the -fdump-tree-printf-return-value output, but it isn't
nearly as nicely presented.  Perhaps enhancing the dump format
would be a way to make these improvements accessible to users
without running into the problems above.  It might also be
a good way to experiment with different styles of output until
we have decided on one to converge all these diagnostics on.
Another alternative might be make this output available under
an option, but that would have to be done without excessively
increasing the complexity of the pass.

Martin

PS I'm not sure I see if/where the patch retains
the distinction between certain and possible overflow and
truncation ("writing" vs "may write").

Dave

David Malcolm (5):
  substring-locations: add class format_string_diagnostic_t
  Add range_idx param to range_label::get_text
  gimple-ssa-sprintf.c: move struct call_info out of the dom_walker subclass
  Add pp_humanized_limit and pp_humanized_range
  gimple-ssa-sprintf.c: rewrite overflow/truncation diagnostic (PR 
middle-end/77696)

 gcc/c-family/c-format.c                      |  32 +-
 gcc/c/c-objc-common.c                        |   2 +-
 gcc/c/c-typeck.c                             |   4 +-
 gcc/cp/error.c                               |   2 +-
 gcc/diagnostic-show-locus.c                  |  19 +-
 gcc/gcc-rich-location.h                      |   4 +-
 gcc/gimple-ssa-sprintf.c                     | 670 ++++++++++++---------------
 gcc/pretty-print.c                           | 181 ++++++++
 gcc/pretty-print.h                           |   4 +
 gcc/substring-locations.c                    | 113 +++--
 gcc/substring-locations.h                    |  74 +--
 gcc/testsuite/gcc.dg/sprintf-diagnostics-1.c | 252 ++++++++++
 gcc/testsuite/gcc.dg/sprintf-diagnostics-2.c |  57 +++
 libcpp/include/line-map.h                    |   6 +-
 14 files changed, 927 insertions(+), 493 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/sprintf-diagnostics-1.c
 create mode 100644 gcc/testsuite/gcc.dg/sprintf-diagnostics-2.c


Reply via email to