On Tue, 2022-08-09 at 23:19 +0200, Tim Lange wrote:
> This patch adds an experimental out-of-bounds checker to the
> analyzer.
>
> The checker was tested on coreutils, curl, httpd and openssh. It is mostly
> accurate but does produce false-positives on yacc-generated files and
> sometimes when the analyzer misses an invariant. These cases will be
> documented in bugzilla.
> (Regrtests still running with the latest changes, will report back
> later.)

Hi Tim, thanks for the patch, and for all the testing you've done on it.

We've already had several rounds of review of this off-list, and this patch looks very close to ready.

Some nits below...

> diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
> index 5021376b6fb..8e73af60ceb 100644
> --- a/gcc/analyzer/analyzer.opt
> +++ b/gcc/analyzer/analyzer.opt
> @@ -158,6 +158,10 @@ Wanalyzer-tainted-size
> Common Var(warn_analyzer_tainted_size) Init(1) Warning
> Warn about code paths in which an unsanitized value is used as a size.
>
> +Wanalyzer-out-of-bounds
> +Common Var(warn_analyzer_out_of_bounds) Init(1) Warning
> +Warn about code paths in which a write or read to a buffer is out-of-bounds.
> +

Please keep the list alphabetized; I think this needs to be between Wanalyzer-mismatching-deallocation and Wanalyzer-possible-null-argument.

> Wanalyzer-use-after-free
> Common Var(warn_analyzer_use_after_free) Init(1) Warning
> Warn about code paths in which a freed value is used.
> diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
> index f7df2fca245..2f9382ed96c 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -1268,6 +1268,402 @@ region_model::on_stmt_pre (const gimple *stmt,
>     }
> }
>
> +/* Abstract base class for all out-of-bounds warnings.  */
> +
> +class out_of_bounds : public pending_diagnostic_subclass<out_of_bounds>
> +{
> +public:
> +  out_of_bounds (const region *reg, tree diag_arg, byte_range range)
> +  : m_reg (reg), m_diag_arg (diag_arg), m_range (range)
> +  {}
> +
> +  const char *get_kind () const final override
> +  {
> +    return "out_of_bounds_diagnostic";
> +  }
> +
> +  bool operator== (const out_of_bounds &other) const
> +  {
> +    return m_reg == other.m_reg
> +           && m_range == other.m_range
> +           && pending_diagnostic::same_tree_p (m_diag_arg, other.m_diag_arg);
> +  }
> +
> +  int get_controlling_option () const final override
> +  {
> +    return OPT_Wanalyzer_out_of_bounds;
> +  }
> +
> +  void mark_interesting_stuff (interesting_t *interest) final override
> +  {
> +    interest->add_region_creation (m_reg);
> +  }
> +
> +protected:
> +  const region *m_reg;
> +  tree m_diag_arg;
> +  byte_range m_range;

Please add a comment clarifying what the meaning of m_range is here. Is it (a) the range of all bytes that are accessed, (b) the range of bytes that are accessed out-of-bounds, (c) etc?

From my reading of the patch I think it's (b).

> +};
> +
> +/* Abstract subclass to complaing about out-of-bounds
> +   past the end of the buffer.  */
> +
> +class past_the_end : public out_of_bounds
> +{
> +public:
> +  past_the_end (const region *reg, tree diag_arg, byte_range range,
> +                tree byte_bound)
> +  : out_of_bounds (reg, diag_arg, range), m_byte_bound (byte_bound)
> +  {}
> +
> +  bool operator== (const past_the_end &other) const
> +  {
> +    return m_reg == other.m_reg
> +           && m_range == other.m_range
> +           && pending_diagnostic::same_tree_p (m_diag_arg, other.m_diag_arg)

Is it possible to call out_of_bounds::operator== for the first three fields, rather than a copy-and-paste of the logic?
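For illustration, here is a minimal self-contained sketch of the delegation suggested above. It uses simplified stand-in types rather than the analyzer's own (const void * in place of const region *, int in place of tree, plain == in place of pending_diagnostic::same_tree_p, and a two-field byte_range), so treat it as a sketch of the idea, not as a drop-in change. The comment on m_range assumes reading (b) is the intended meaning.

```cpp
// Sketch only: simplified stand-ins for the analyzer's types
// (assumptions, not the real GCC declarations).

// Half-open byte interval [m_start_byte_offset,
//                          m_start_byte_offset + m_size_in_bytes).
struct byte_range
{
  unsigned long long m_start_byte_offset;
  unsigned long long m_size_in_bytes;

  bool operator== (const byte_range &other) const
  {
    return (m_start_byte_offset == other.m_start_byte_offset
            && m_size_in_bytes == other.m_size_in_bytes);
  }
};

class out_of_bounds
{
public:
  out_of_bounds (const void *reg, int diag_arg, byte_range range)
  : m_reg (reg), m_diag_arg (diag_arg), m_range (range)
  {}

  bool operator== (const out_of_bounds &other) const
  {
    return (m_reg == other.m_reg
            && m_range == other.m_range
            && m_diag_arg == other.m_diag_arg);
  }

protected:
  const void *m_reg;   // stand-in for "const region *"
  int m_diag_arg;      // stand-in for "tree"

  /* The bytes that are accessed out-of-bounds, i.e. only the part of
     the access that lies outside the buffer, rather than the whole
     access (reading (b) above; an assumption in this sketch).  */
  byte_range m_range;
};

class past_the_end : public out_of_bounds
{
public:
  past_the_end (const void *reg, int diag_arg, byte_range range,
                int byte_bound)
  : out_of_bounds (reg, diag_arg, range), m_byte_bound (byte_bound)
  {}

  bool operator== (const past_the_end &other) const
  {
    /* Reuse the base class's comparison of m_reg, m_range and
       m_diag_arg, then compare only the field this subclass adds.  */
    return (out_of_bounds::operator== (other)
            && m_byte_bound == other.m_byte_bound);
  }

protected:
  int m_byte_bound;    // stand-in for "tree"
};
```

With this shape, a field later added to out_of_bounds only needs handling in one operator==, and the subclasses pick it up automatically.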
> +           && pending_diagnostic::same_tree_p (m_byte_bound,
> +                                               other.m_byte_bound);
> +  }
> +
> +  label_text
> +  describe_region_creation_event (const evdesc::region_creation &ev) final
> +  override
> +  {
> +    if (m_byte_bound && TREE_CODE (m_byte_bound) == INTEGER_CST)
> +      return ev.formatted_print ("capacity is %E bytes", m_byte_bound);
> +
> +    return label_text ();
> +  }
> +
> +protected:
> +  tree m_byte_bound;
> +};

[...snip the concrete subclasses...]

We went through several rounds of review off-list, and I have lots of ideas for wording tweaks to the patch, but rather than being a "backseat driver" (or bikeshedding), I think that aspect of the patch is good enough as-is, and I'll make the wording changes myself once the patch is in trunk.

[...snip...]

> +
> +      if (warned)
> +        {
> +          char num_bytes_past_buf[WIDE_INT_PRINT_BUFFER_SIZE];
> +          print_dec (m_range.m_size_in_bytes, num_bytes_past_buf, UNSIGNED);

I think we can use %wu for this, but I can fix this up in a followup.

[...snip...]

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index fa23fbeaaaa..5ab834af780 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -459,6 +459,7 @@ Objective-C and Objective-C++ Dialects}.
> -Wno-analyzer-null-dereference @gol
> -Wno-analyzer-possible-null-argument @gol
> -Wno-analyzer-possible-null-dereference @gol
> +-Wno-analyzer-out-of-bounds @gol

Please move between -Wno-analyzer-null-dereference @gol and -Wno-analyzer-possible-null-argument @gol for alphabetization.

> -Wno-analyzer-shift-count-negative @gol
> -Wno-analyzer-shift-count-overflow @gol
> -Wno-analyzer-stale-setjmp-buffer @gol
> @@ -9991,6 +9992,17 @@ This warning requires @option{-fanalyzer}, which enables it; use
> This diagnostic warns for paths through the code in which a
> value known to be NULL is dereferenced.
>
> +@item -Wno-analyzer-out-of-bounds
> +@opindex Wanalyzer-out-of-bounds
> +@opindex Wno-analyzer-out-of-bounds
> +This warning requires @option{-fanalyzer} to enable it; use
> +@option{-Wno-analyzer-out-of-bounds} to disable it.
> +
> +This diagnostic warns for path through the code in which a buffer is
> +accessed or written out-of-bounds.

It would be good to clarify the limitations. As I understand them:

"The diagnostic only applies for cases where the analyzer is able to determine a constant size for the buffer. It warns when any part of a read or write is definitely before the start of the buffer, or definitely after the end."

...or somesuch wording.

> +
> +See @url{https://cwe.mitre.org/data/definitions/119.html, CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer}.

Also, please move the new entry into position to keep things alphabetized.

> +
> @item -Wno-analyzer-shift-count-negative
> @opindex Wanalyzer-shift-count-negative
> @opindex Wno-analyzer-shift-count-negative

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-1.c b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-1.c
> new file mode 100644
> index 00000000000..715c8b7460f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-1.c
> @@ -0,0 +1,119 @@
> +#include <stdlib.h>
> +#include <string.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +
> +/* Wanalyzer-out-of-bounds tests for buffer overflows.  */
> +
> +/* Avoid folding of memcpy.  */
> +typedef void * (*memcpy_t) (void *dst, const void *src, size_t n);
> +
> +static memcpy_t __attribute__((noinline))
> +get_memcpy (void)
> +{
> +  return memcpy;
> +}
> +
> +
> +/* Taken from CWE-787.  */
> +void test1 (void)
> +{
> +  int id_sequence[3];
> +
> +  id_sequence[0] = 123;
> +  id_sequence[1] = 234;
> +  id_sequence[2] = 345;
> +  id_sequence[3] = 456; /* { dg-line test1 } */
> +
> +  /* { dg-warning "overflow" "warning" { target *-*-* } test1 } */
> +  /* { dg-message "" "note" { target *-*-* } test1 } */

I see that you've left the regexes mostly blank in the various DejaGnu directives in these new tests. Normally I'd want these to be less vague, but given that I plan to change the wordings in a followup anyway, this is OK.

[...snip lots of great testcases...]

With the above nits fixed, the patch is OK for trunk (assuming that your testing doesn't show any problems).

Thanks again for the patch; this feels like a major new feature.

Dave
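Regarding the suggested wording for the invoke.texi entry: here is a rough, hypothetical model of that rule, for a buffer whose capacity is a known constant and an access that is a concrete byte range. The names (access_range, oob_kind, oob_result, classify_access) are made up for this sketch and are not part of the analyzer. It also reports just the out-of-bounds part of the access, matching reading (b) of m_range. For test1, on a target with 4-byte int, id_sequence[3] = 456 is a 4-byte write at byte offset 12 into a 12-byte buffer, so all four bytes are past the end.

```cpp
// Hypothetical model of the documented rule (not analyzer code).
// An access is the half-open byte range [offset, offset + size);
// offsets are signed so that accesses before the buffer can be
// expressed.
struct access_range
{
  long long offset;
  long long size;
};

enum class oob_kind { in_bounds, before_start, past_end, both };

struct oob_result
{
  oob_kind kind;
  access_range under;  // bytes before the start (size 0 if none)
  access_range over;   // bytes past the end (size 0 if none)
};

// Classify an access to a buffer of constant size CAPACITY, and
// report which bytes of it are out-of-bounds.
oob_result classify_access (access_range acc, long long capacity)
{
  long long start = acc.offset;
  long long end = acc.offset + acc.size;  // one past the last byte
  oob_result r = { oob_kind::in_bounds, {0, 0}, {0, 0} };

  if (start < 0)
    {
      // Bytes [start, min (end, 0)) lie before the buffer.
      long long under_end = end < 0 ? end : 0;
      r.under = { start, under_end - start };
    }
  if (end > capacity)
    {
      // Bytes [max (start, capacity), end) lie past the end.
      long long over_start = start > capacity ? start : capacity;
      r.over = { over_start, end - over_start };
    }

  bool under = r.under.size > 0;
  bool over = r.over.size > 0;
  if (under && over)
    r.kind = oob_kind::both;
  else if (under)
    r.kind = oob_kind::before_start;
  else if (over)
    r.kind = oob_kind::past_end;
  return r;
}
```

This sketch only handles concrete offsets and sizes; it makes no attempt to model the symbolic cases where the analyzer cannot decide whether an access is definitely out-of-bounds.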