On Mon, 2022-09-05 at 23:16 +0200, Tim Lange wrote: > Hi, > > below is my patch, adding support for reasoning about buffer > overflows and > overreads with symbolic offsets and capacities.
Thanks for the updated patch. Various comments inline below... > > I've already had one off-list feedback from Dave after sending him my > preliminary work. Below, I'll be also answering some of the questions > that > came up during the first round. > > To reason about out-of-bounds with symbolic values, I have decided to > do > some simplifications: > * Only reason in bytes, i.e. loosing some bits on bit-field accesses > and > bit ranges in the conversion. > * Not trying to handle commutativeness [0]. > * Require similar structure of the offset and the capacity svalue to > complain about symbolic out-of-bounds. > * A symbolic leaf operand [1] is positive if its type is unsigned. I > do > ignore that the operand could be zero. It wouldn't make much sense > to have an offset that is always zero, but is not inferable > statically > such that the approxmiation here would yield a false-positive. In > order > to fully prevent the false-positive, we would have to give up many > true > positives. Dave also thinks that is a reasonable approximation. (nods) > > > Whats the slowdown of this patch? > I found another optimization that tries to skip the structural > equality > check by trying referential equality (aka comparing pointers) first. > That > seems to do the trick and at least in my single run of compiling > coreutils, curl, httpd and openssh with the current master and my > patch > enabled, I've found little to no overhead, at most ~5s CPU time [2]. > > > Why was the realloc implementation changed? > With the patch, the analyzer can reason about simple inequalities of > svalues. The previous way of getting the smaller of the current > buffer > size and the new buffer size was less accurate and lead to a false > positive because it chose the wrong svalue. The change fixes that by > using the same comparison function as the out-of-bounds checker. > Also, I > changed it to return the OLD_SIZE_SVAL by default, because I think I > had > a thinking error in my previous patch: Realloc implementations > probably > only move the buffer if the buffer grows. That means the old buffer > is > copied to the new one, only touching as many bytes as the old buffer > had. > The rest of the bytes is left uninitialized. > > I added [WIP?], because the regrtests are not yet finished but a > similar > version did pass them, so I assume thats okay to post it now for > review > and hand in the regrtests later [3]. > > - Tim > > [0] I have tried that earlier but it turned out to be too slow. > [1] leaf == conjured, inital or constant svalue. > [2] Note that I didn't run multiple tests and the compile farm is not > isolated and I haven't done anything about caching etc. But at > least > the results show that there is no heavy slowdown. That's fine. > [3] The analyzer and analyzer-torture tests pass on my machine for > C/C++. > > This patch adds support for reasoning about the inequality of two > symbolic > values in the special case specifically suited for reasoning about > out-of-bounds past the end of the buffer. With this patch, the > analyzer > catches off-by-one errors and more even when the offset and capacity > is > symbolic. > > Tested on coreutils, curl, httpd and openssh. [...snip...] > + label_text > + describe_final_event (const evdesc::final_event &ev) final > override > + { > + const char *byte_str; > + if (pending_diagnostic::same_tree_p (m_num_bytes, > integer_one_node)) > + byte_str = "byte"; > + else > + byte_str = "bytes"; > + As noted in the offlist review, I want to do a rewording of all of these diagnostics, but that can be a followup (for PR 106626), either by me or you, so I'm not going to be fussy about this part of the patch (e.g. the i18n issues). [...snip...] > + > +/* Check whether an access is past the end of the BASE_REG. */ > + > +void region_model::check_symbolic_bounds (const region *base_reg, > + const svalue > *sym_byte_offset, > + const svalue > *num_bytes_sval, > + const svalue *capacity, > + enum access_direction dir, > + region_model_context *ctxt) > const > +{ > + gcc_assert (ctxt); > + > + const svalue *next_byte > + = m_mgr->get_or_create_binop (num_bytes_sval->get_type (), > PLUS_EXPR, > + sym_byte_offset, num_bytes_sval); > + > + if (eval_condition_without_cm (next_byte, GT_EXPR, > capacity).is_true ()) It occurs to me that we could use region_model::eval_condition here, rather than region_model::eval_condition_without_cm, which would allow the comparison to also take account of known constraints - this perhaps could catch cases where the user got the sense of a comparison wrong, e.g. if they write if (idx > size) arr[idx] = val; when they meant: if (idx < size) arr[idx] = val; But that can be saved for a (possible) followup, given that it might be slower, and that all of your testing has been with the code as written, and I think the above mistake is relatively unlikely compared to the kinds of off-by-one mistakes that the patch tests for. [...snip...] > +/* Return true if A is definitely larger than B. > + > + Limitation: does not account for integer overflows and does not > try to > + return false, so it can not be used negated. */ > + > +tristate > +region_model::symbolic_greater_than (const binop_svalue *bin_a, > + const svalue *b) const > +{ > + /* Eliminate the right-hand side of both svalues. */ > + if (const binop_svalue *bin_b = dyn_cast <const binop_svalue *> > (b)) > + if (bin_a->get_op () == bin_b->get_op () > + && eval_condition_without_cm (bin_a->get_arg1 (), > + GT_EXPR, > + bin_b->get_arg1 ()).is_true () > + && eval_condition_without_cm (bin_a->get_arg0 (), > + GE_EXPR, > + bin_b->get_arg0 ()).is_true ()) > + return tristate (tristate::TS_TRUE); > + > + /* Otherwise, try to remove a positive offset or factor only from > BIN_A. */ > + if ((bin_a->get_op () == PLUS_EXPR || bin_a->get_op () == > MULT_EXPR) > + && is_positive_svalue (bin_a->get_arg1 ())) > + if (eval_condition_without_cm (bin_a->get_arg0 (),GE_EXPR, > b).is_true ()) > + return tristate (tristate::TS_TRUE); > + > + return tristate::unknown (); > +} Again, I wonder if this should be using region_model::eval_condition rather than region_model::eval_condition_without_cm - but investigating that can be saved for a followup. [...snip...] The posted patch is OK to push to trunk (assuming that your testing passes). Thanks - this looks like a nice improvement to the existing warning. Dave