On 11/22/2017 05:03 PM, Jeff Law wrote:
On 11/21/2017 12:07 PM, Martin Sebor wrote:
On 11/21/2017 09:55 AM, Jeff Law wrote:
On 11/19/2017 04:28 PM, Martin Sebor wrote:
On 11/18/2017 12:53 AM, Jeff Law wrote:
On 11/17/2017 12:36 PM, Martin Sebor wrote:
The attached patch enhances -Wstringop-overflow to detect more
instances of buffer overflow at compile time by handling non-
constant offsets into the destination object that are known to
be in some range. The solution could be improved by handling
even more cases (e.g., anti-ranges or offsets relative to
pointers beyond the beginning of an object) but it's a start.
In addition to bootsrapping/regtesting GCC, also tested with
Binutils/GDB, Glibc, and the Linux kernel on x86_64 with no
regressions.
Martin
The top of GDB fails to compile at the moment so the validation
there was incomplete.
gcc-77608.diff
PR middle-end/77608 - missing protection on trivially detectable
runtime buffer overflow
gcc/ChangeLog:
PR middle-end/77608
* builtins.c (compute_objsize): Handle non-constant offsets.
gcc/testsuite/ChangeLog:
PR middle-end/77608
* gcc.dg/Wstringop-overflow.c: New test.
The recursive call into compute_objsize passing in the ostype avoids
having to think about the whole object vs nearest containing object
issues. Right?
What's left to worry about is maximum or minimum remaining bytes in the
object. At least that's my understanding of how ostype works here.
So we get the amount remaining, ignoring the variable offset, from the
recursive call (SIZE). The space left after we account for the
variable
offset is [SIZE - MAX, SIZE - MIN]. So ISTM for type 0/1 you have to
return SIZE-MIN (which you do) and for type 2/3 you have to return
SIZE-MAX which I think you get wrong (and you have to account for the
possibility that MAX or MIN is greater than SIZE and thus there's
nothing left).
Subtracting the upper bound of the offset from the size instead
of the lower bound when the caller is asking for the minimum
object size would make the result essentially meaningless in
common cases where the offset is smaller than size_t, as in:
char a[7];
void f (const char *s, unsigned i)
{
__builtin_strcpy (a + i, s);
}
Here, i's range is [0, UINT_MAX].
IMO, it's only useful to use the lower bound here, otherwise
the result would only rarely be non-zero.
But when we're asking for the minimum left, aren't we essentially asking
for "how much space am I absolutely sure I can write"? And if that is
the question, then the only conservatively correct answer is to subtract
the high bound.
I suppose you could look at it that way but IME with this work
(now, and also last year when I submitted a patch actually
changing the built-in), using the upper bound is just not that
useful because it's too often way too big. There's no way to
distinguish an out-of-range upper bound that's the result of
an inadequate attempt to constrain a value from an out-of-range
upper bound that is sufficiently constrained but in a way GCC
doesn't see.
Understood.
So while it's reasonable to not warn in those cases where we just have
crap range information (that's always going to be the case for some code
regardless of how good my work or Andrew/Aldy's work is), we have to be
very careful and make sure that nobody acts on this information for
optimization purposes because what we're returning is not conservatively
correct.
There are no clients of this API that would be affected by
the decision one way or the other (unless the user specifies
a -Wstringop-overflow= argument greater than the default 2)
so I don't think what we do now matters much, if at all.
Right, but what's to stop someone without knowledge of the
implementation and its quirk of not returning the conservatively safe
result from using the results in other ways.
Presumably they would find out by testing their code. But this
is a hypothetical scenario. I added the function for warnings.
I wasn't expecting it to be used for optimization, no such uses
have emerged, and I don't have the impression that anyone is
contemplating adding them (certainly not in stage 3). If you
think the function could be useful for optimization then we
should certainly consider changing it as we gain experience
with it under those conditions.
What would the impact be of simply not supporting those queries,
essentially returning "I don't know" rather than returning something
that isn't conservatively correct?
Except for false negatives with -Wstringop-overflow=3 and =4
(i.e., Object Size type 2 and 3) I don't think there would be
any impact. As I said, the function isn't used for optimization
and I don't think the option is used in these modes either, so
in my mind it matters very little. I don't even think there
are any tests for it in these modes, either.
If we have to support both queries and we're going to return something
that is not conservatively correct, then it really needs to be
documented to avoid folks using the code in ways that are not safe.
Perhaps in the future with some of the range improvements
that you, Aldy and Andrew have been working on.
I think they'll help, but there's always going to be codes that can't be
analyzed, it's just inherent in the languages we use.
That said, if it helps us move forward with this enhancement
I'll use the upper bound -- let me know. In the future, when
it is actually used, we'll adjust it as necessary.
Understood. Let's answer the questions above first. Namely, do we have
to support the other mode? I thought I saw something that indicated it
wasn't ever used that way by the current clients. So if we can just
return "don't know" for those ostype queries that's a reasonable place
to go. Else we should look at a clear comment about the limitations of
the code to help prevent (ab)use of the analysis in contexts were it's
not appropriate.
The -Wstringop-overflow= option supports all four object size
types. I suspect only types 0 and 1 are used but I also don't
see a pressing need to disable or compromise the others. If
you do I'm fine with changing the option either to only support
modes 0 and 1, or to remove the argument altogether (and using
the same default as today, i.e., mode 0 for memxxx functions
and mode 1 for strxxx functions). But I see no benefit in
changing the function to give up in modes 2 and 3 and have it
cause false negatives. It simply isn't necessary today.
For now, I've added comments to clarify the return value and
the purpose of the function. Let me know if that's good enough
or if you want me to make any other changes.
Martin
PR middle-end/77608 - missing protection on trivially detectable runtime buffer overflow
gcc/ChangeLog:
PR middle-end/77608
* builtins.c (compute_objsize): Handle non-constant offsets.
gcc/testsuite/ChangeLog:
PR middle-end/77608
* gcc.dg/Wstringop-overflow.c: New test.
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 650de0d..e35c514 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3262,8 +3262,12 @@ check_sizes (int opt, tree exp, tree size, tree maxlen, tree src, tree objsize)
/* Helper to compute the size of the object referenced by the DEST
expression which must have pointer type, using Object Size type
OSTYPE (only the least significant 2 bits are used). Return
- the size of the object if successful or NULL when the size cannot
- be determined. */
+ an estimate of the size of the object if successful or NULL when
+ the size cannot be determined. When the referenced object involves
+ a non-constant offset in some range the returned value represents
+ the largest size given the smallest non-negative offset in the
+ range. The function is intended for diagnostics and is not
+ suitable for optimization. */
tree
compute_objsize (tree dest, int ostype)
@@ -3276,24 +3280,57 @@ compute_objsize (tree dest, int ostype)
if (compute_builtin_object_size (dest, ostype, &size))
return build_int_cst (sizetype, size);
- /* Unless computing the largest size (for memcpy and other raw memory
- functions), try to determine the size of the object from its type. */
- if (!ostype)
- return NULL_TREE;
-
if (TREE_CODE (dest) == SSA_NAME)
{
gimple *stmt = SSA_NAME_DEF_STMT (dest);
if (!is_gimple_assign (stmt))
return NULL_TREE;
+ dest = gimple_assign_rhs1 (stmt);
+
tree_code code = gimple_assign_rhs_code (stmt);
- if (code != ADDR_EXPR && code != POINTER_PLUS_EXPR)
- return NULL_TREE;
+ if (code == POINTER_PLUS_EXPR)
+ {
+ /* compute_builtin_object_size fails for addresses with
+ non-constant offsets. Try to determine the range of
+ such an offset here and use it to adjus the constant
+ size. */
+ tree off = gimple_assign_rhs2 (stmt);
+ if (TREE_CODE (off) == SSA_NAME
+ && INTEGRAL_TYPE_P (TREE_TYPE (off)))
+ {
+ wide_int min, max;
+ enum value_range_type rng = get_range_info (off, &min, &max);
- dest = gimple_assign_rhs1 (stmt);
+ if (rng == VR_RANGE)
+ {
+ if (tree size = compute_objsize (dest, ostype))
+ {
+ wide_int wisiz = wi::to_wide (size);
+
+ /* Ignore negative offsets for now. For others,
+ use the lower bound as the most optimistic
+ estimate of the (remaining)size. */
+ if (wi::sign_mask (min))
+ ;
+ else if (wi::ltu_p (min, wisiz))
+ return wide_int_to_tree (TREE_TYPE (size),
+ wi::sub (wisiz, min));
+ else
+ return size_zero_node;
+ }
+ }
+ }
+ }
+ else if (code != ADDR_EXPR)
+ return NULL_TREE;
}
+ /* Unless computing the largest size (for memcpy and other raw memory
+ functions), try to determine the size of the object from its type. */
+ if (!ostype)
+ return NULL_TREE;
+
if (TREE_CODE (dest) != ADDR_EXPR)
return NULL_TREE;
diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow.c b/gcc/testsuite/gcc.dg/Wstringop-overflow.c
new file mode 100644
index 0000000..b5bd40e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow.c
@@ -0,0 +1,132 @@
+/* PR middle-end/77608 - missing protection on trivially detectable runtime
+ buffer overflow
+ { dg-do compile }
+ { dg-options "-O2 -Wstringop-overflow -ftrack-macro-expansion=0" } */
+
+#define SIZE_MAX __SIZE_MAX__
+#define DIFF_MAX __PTRDIFF_MAX__
+#define DIFF_MIN (-DIFF_MAX - 1)
+
+typedef __SIZE_TYPE__ size_t;
+
+extern void* memcpy (void*, const void*, size_t);
+extern char* strcpy (char*, const char*);
+extern char* strncpy (char*, const char*, size_t);
+
+void sink (void*);
+
+extern size_t unsigned_value (void)
+{
+ extern volatile size_t unsigned_value_source;
+ return unsigned_value_source;
+}
+
+size_t unsigned_range (size_t min, size_t max)
+{
+ size_t val = unsigned_value ();
+ return val < min || max < val ? min : val;
+}
+
+#define UR(min, max) unsigned_range (min, max)
+
+
+char a7[7];
+
+struct MemArray { char a9[9]; char a1[1]; };
+
+void test_memcpy_array (const void *s)
+{
+#define T(d, s, n) (memcpy ((d), (s), (n)), sink (d))
+
+ T (a7 + UR (0, 1), s, 7);
+ T (a7 + UR (0, 7), s, 7);
+ T (a7 + UR (0, 8), s, 7);
+ T (a7 + UR (0, DIFF_MAX), s, 7);
+ T (a7 + UR (0, SIZE_MAX), s, 7);
+
+ T (a7 + UR (1, 2), s, 7); /* { dg-warning "writing 7 bytes into a region of size 6" } */
+ T (a7 + UR (2, 3), s, 7); /* { dg-warning "writing 7 bytes into a region of size 5" } */
+ T (a7 + UR (6, 9), s, 7); /* { dg-warning "writing 7 bytes into a region of size 1" } */
+ T (a7 + UR (7, 9), s, 7); /* { dg-warning "writing 7 bytes into a region of size 0" } */
+ T (a7 + UR (8, 9), s, 7); /* { dg-warning "writing 7 bytes into a region of size 0" } */
+
+ T (a7 + UR (9, 10), s, 7); /* { dg-warning "writing 7 bytes into a region of size 0" } */
+ T (a7 + UR (DIFF_MAX, DIFF_MAX + (size_t)1), s, 7); /* { dg-warning "writing 7 bytes into a region of size 0" } */
+ T (a7 + UR (DIFF_MAX, SIZE_MAX), s, 7); /* { dg-warning "writing 7 bytes into a region of size 0" } */
+
+ /* This is valid. */
+ char *d = a7 + 7;
+ T (d + UR (-8, -7), s, 7);
+}
+
+/* Verify the absence of warnings for memcpy writing beyond object
+ boundaries. */
+
+void test_memcpy_memarray (struct MemArray *p, const void *s)
+{
+#undef T
+#define T(d, s, n) (memcpy ((d), (s), (n)), sink (d))
+
+ /* The following are valid. */
+ T (p->a9 + UR (0, 1), s, 9);
+ T (p->a9 + UR (0, 7), s, 9);
+ T (p->a9 + UR (0, 8), s, 9);
+ T (p->a9 + UR (0, DIFF_MAX), s, 9);
+ T (p->a9 + UR (0, SIZE_MAX), s, 9);
+
+ /* The following are invalid. Unfortunately, there is apparently enough
+ code out there that abuses memcpy to write past the end of one member
+ and into the members that follow so the following are not diagnosed
+ by design. It sure would be nice not to have to cater to hacks like
+ these... */
+ T (p->a9 + UR (1, 2), s, 9);
+ T (p->a9 + UR (1, 2), s, 123);
+}
+
+
+void test_strcpy_array (void)
+{
+#undef T
+#define T(d, s) (strcpy ((d), (s)), sink (d))
+
+ T (a7 + UR (0, 1), "012345");
+ T (a7 + UR (0, 7), "012345");
+ T (a7 + UR (0, 8), "012345");
+ T (a7 + UR (0, DIFF_MAX), "012345");
+ T (a7 + UR (0, SIZE_MAX), "012345");
+
+ T (a7 + UR (1, 2), "012345"); /* { dg-warning "writing 7 bytes into a region of size 6" } */
+ T (a7 + UR (2, 3), "012345"); /* { dg-warning "writing 7 bytes into a region of size 5" } */
+ T (a7 + UR (6, 9), "012345"); /* { dg-warning "writing 7 bytes into a region of size 1" } */
+ T (a7 + UR (7, 9), "012345"); /* { dg-warning "writing 7 bytes into a region of size 0" } */
+ T (a7 + UR (8, 9), "012345"); /* { dg-warning "writing 7 bytes into a region of size 0" } */
+
+ T (a7 + UR (9, 10), "012345"); /* { dg-warning "writing 7 bytes into a region of size 0" } */
+ T (a7 + UR (DIFF_MAX, DIFF_MAX + (size_t)1), "012345"); /* { dg-warning "writing 7 bytes into a region of size 0" } */
+ T (a7 + UR (DIFF_MAX, SIZE_MAX), "012345"); /* { dg-warning "writing 7 bytes into a region of size 0" } */
+
+ char *d = a7 + 7;
+
+ T (d + UR (-8, -7), "012345");
+}
+
+void test_strncpy_memarray (struct MemArray *p, const void *s)
+{
+#undef T
+#define T(d, s, n) (strncpy ((d), (s), (n)), sink (d))
+
+ T (p->a9 + UR (0, 1), s, 9);
+ T (p->a9 + UR (0, 7), s, 9);
+ T (p->a9 + UR (0, 8), s, 9);
+ T (p->a9 + UR (0, DIFF_MAX), s, 9);
+ T (p->a9 + UR (0, SIZE_MAX), s, 9);
+
+ T (p->a9 + UR (1, 2), s, 9); /* { dg-warning "writing 9 bytes into a region of size 8" } */
+ T (p->a9 + UR (2, 3), s, 9); /* { dg-warning "writing 9 bytes into a region of size 7" } */
+ T (p->a9 + UR (6, 9), s, 9); /* { dg-warning "writing 9 bytes into a region of size 3" } */
+ T (p->a9 + UR (9, 10), s, 9); /* { dg-warning "writing 9 bytes into a region of size 0" } */
+ T (p->a9 + UR (10, 11), s, 9); /* { dg-warning "writing 9 bytes into a region of size 0" } */
+
+ T (p->a9 + UR (DIFF_MAX, DIFF_MAX + (size_t)1), s, 1); /* { dg-warning "writing 1 byte into a region of size 0" } */
+ T (p->a9 + UR (DIFF_MAX, SIZE_MAX), s, 3); /* { dg-warning "writing 3 bytes into a region of size 0" } */
+}