On Thu, Sep 7, 2023 at 1:28 PM David Malcolm <dmalc...@redhat.com> wrote:
> On Mon, 2023-09-04 at 22:13 -0400, Eric Feng wrote: > > > Hi Dave, > > Hi Eric, thanks for the patch. > > > > > Recently I've been working on symbolic value support for the reference > > count checker. I've attached a patch for it below; let me know it looks > > OK for trunk. Thanks! > > > > Best, > > Eric > > > > --- > > > > This patch enhances the reference count checker in the CPython plugin by > > adding support for symbolic values. Whereas previously we were only able > > to check the reference count of PyObject* objects created in the scope > > of the function; we are now able to emit diagnostics on reference count > > mismatch of objects that were, for example, passed in as a function > > parameter. > > > > rc6.c:6:10: warning: expected ‘obj’ to have reference count: N + ‘1’ but > ob_refcnt field is N + ‘2’ > > 6 | return obj; > > | ^~~ > > [...snip...] > > > create mode 100644 > gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c > > > > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > > index bf1982e79c3..d7ecd7fce09 100644 > > --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > > +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > > @@ -314,17 +314,20 @@ public: > > { > > diagnostic_metadata m; > > bool warned; > > - // just assuming constants for now > > - auto actual_refcnt > > - = m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant (); > > - auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue > ()->get_constant (); > > - warned = warning_meta (rich_loc, m, get_controlling_option (), > > - "expected %qE to have " > > - "reference count: %qE but ob_refcnt field is: > %qE", > > - m_reg_tree, actual_refcnt, ob_refcnt); > > - > > - // location_t loc = rich_loc->get_loc (); > > - // foo (loc); > > + > > + const auto *actual_refcnt_constant > > + = m_actual_refcnt->dyn_cast_constant_svalue (); > > + const auto *ob_refcnt_constant = > m_ob_refcnt->dyn_cast_constant_svalue (); > > + if (!actual_refcnt_constant || !ob_refcnt_constant) > > + return false; > > + > > + auto actual_refcnt = actual_refcnt_constant->get_constant (); > > + auto ob_refcnt = ob_refcnt_constant->get_constant (); > > + warned = warning_meta ( > > + rich_loc, m, get_controlling_option (), > > + "expected %qE to have " > > + "reference count: N + %qE but ob_refcnt field is N + %qE", > > + m_reg_tree, actual_refcnt, ob_refcnt); > > return warned; > > I know you're emulating the old behavior I implemented way back in > cpychecker, but I don't like that behavior :( > > Specifically, although the patch improves the behavior for symbolic > values, it regresses the precision of wording for the concrete values > case. If we have e.g. a concrete ob_refcnt of 2, whereas we only have > 1 pointer, then it's more readable to say: > > warning: expected ‘obj’ to have reference count: ‘1’ but ob_refcnt > field is ‘2’ > > than: > > warning: expected ‘obj’ to have reference count: N + ‘1’ but ob_refcnt > field is N + ‘2’ > > ...and we shouldn't quote concrete numbers, the message should be: > > warning: expected ‘obj’ to have reference count of 1 but ob_refcnt field > is 2 > or better: > > warning: ‘*obj’ is pointed to by 1 pointer but 'ob_refcnt' field is 2 > > > Can you move the unwrapping of the svalue from the tests below into the > emit vfunc? That way the m_actual_refcnt doesn't have to be a > constant_svalue; you could have logic in the emit vfunc to print > readable messages based on what kind of svalue it is. > > Rather than 'N', it might be better to say 'initial'; how about: > > warning: ‘*obj’ is pointed to by 0 additional pointers but 'ob_refcnt' > field has increased by 1 > warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt' > field has increased by 2 > warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt' > field is unchanged > warning: ‘*obj’ is pointed to by 2 additional pointers but 'ob_refcnt' > field has decreased by 1 > warning: ‘*obj’ is pointed to by 1 fewer pointers but 'ob_refcnt' field > is unchanged > > and similar? > That makes sense to me as well (indeed I was just emulating the old behavior)! Will experiment and keep you posted on a revised patch with this in mind. This is somewhat of a minor detail but can we emit ‘*obj’ as bolded text in the diagnostic message? Currently, I can emit this (including the asterisk) like so: '*%E'. But unlike using %qE, it doesn't bold the body of the single quotations. Is this possible? > > Maybe have a flag that tracks whether we're talking about a concrete > value that's absolute versus a concrete value that's relative to the > initial value? > > > [...snip...] > > > > @@ -369,6 +368,19 @@ increment_region_refcnt (hash_map<const region *, > int> &map, const region *key) > > refcnt = existed ? refcnt + 1 : 1; > > } > > > > +static const region * > > +get_region_from_svalue (const svalue *sval, region_model_manager *mgr) > > +{ > > + const auto *region_sval = sval->dyn_cast_region_svalue (); > > + if (region_sval) > > + return region_sval->get_pointee (); > > + > > + const auto *initial_sval = sval->dyn_cast_initial_svalue (); > > + if (initial_sval) > > + return mgr->get_symbolic_region (initial_sval); > > + > > + return nullptr; > > +} > > This is dereferencing a pointer, right? > > Can the caller use region_model::deref_rvalue instead? > > > [...snip...] > > > +static void > > +unwrap_any_ob_refcnt_sval (const svalue *&ob_refcnt_sval) > > +{ > > + if (ob_refcnt_sval->get_kind () != SK_CONSTANT) > > + { > > + auto unwrap_cast = ob_refcnt_sval->maybe_undo_cast (); > > + if (!unwrap_cast) > > + unwrap_cast = ob_refcnt_sval; > > + > > + if (unwrap_cast->get_kind () == SK_BINOP) > > + ob_refcnt_sval = unwrap_cast->dyn_cast_binop_svalue ()->get_arg1 > (); > > This would be better spelled as: > > if (const binop_svalue *binop_sval = > unwrap_cast->dyn_cast_binop_svalue ()) > ob_refcnt_sval = binop_sval->get_arg1 (); > > [...snip...] > > > /* Compare ob_refcnt field vs the actual reference count of a region */ > > static void > > check_refcnt (const region_model *model, > > const region_model *old_model, > > region_model_context *ctxt, > > const hash_map<const ana::region *, > > - int>::iterator::reference_pair region_refcnt) > > + int>::iterator::reference_pair ®ion_refcnt) > > Could really use a typedef for > const hash_map<const ana::region *, int> > to simplify this code. > > > { > > region_model_manager *mgr = model->get_manager (); > > const auto &curr_region = region_refcnt.first; > > const auto &actual_refcnt = region_refcnt.second; > > + > > const svalue *ob_refcnt_sval > > = retrieve_ob_refcnt_sval (curr_region, model, ctxt); > > + if (!ob_refcnt_sval) > > + return; > > + > > + unwrap_any_ob_refcnt_sval (ob_refcnt_sval); > > As noted above, can the diagnostic store the pre-unwrapped > ob_refcnt_sval? Might mean you have to do the unwrapping both here, > and later when displaying the diagnostic. Or (probably best) track > both the original and unwrapped ob_refcnt_sval, and store both in the > pending_diagnostic. > > > + > > const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst ( > > ob_refcnt_sval->get_type (), actual_refcnt); > > - > > if (ob_refcnt_sval != actual_refcnt_sval) > > - { > > - const svalue *curr_reg_sval > > - = mgr->get_ptr_svalue (pyobj_ptr_tree, curr_region); > > - tree reg_tree = old_model->get_representative_tree > (curr_reg_sval); > > - if (!reg_tree) > > - return; > > - > > - const auto &eg = ctxt->get_eg (); > > - refcnt_stmt_finder finder (*eg, reg_tree); > > - auto pd = make_unique<refcnt_mismatch> (curr_region, > ob_refcnt_sval, > > - actual_refcnt_sval, > reg_tree); > > - if (pd && eg) > > - ctxt->warn (std::move (pd), &finder); > > - } > > + handle_refcnt_mismatch (old_model, curr_region, ob_refcnt_sval, > > + actual_refcnt_sval, ctxt); > > } > > > > static void > > @@ -493,8 +520,6 @@ count_all_references (const region_model *model, > > for (const auto &cluster : *model->get_store ()) > > { > > auto curr_region = cluster.first; > > - if (curr_region->get_kind () != RK_HEAP_ALLOCATED) > > - continue; > > > > increment_region_refcnt (region_to_refcnt, curr_region); > > > > @@ -505,8 +530,8 @@ count_all_references (const region_model *model, > > > > const svalue *unwrapped_sval > > = binding_sval->unwrap_any_unmergeable (); > > - // if (unwrapped_sval->get_type () != pyobj_ptr_tree) > > - // continue; > > + if (unwrapped_sval->get_type () != pyobj_ptr_tree) > > + continue; > > We'll probably want a smarter test for this, that the type "inherits" > C-style from PyObject (e.g. PyLongObject). > > > > > > const region *pointee = unwrapped_sval->maybe_get_region (); > > if (!pointee || pointee->get_kind () != RK_HEAP_ALLOCATED) > > diff --git > a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c > b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c > > index e1efd9efda5..46daf2f8975 100644 > > --- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c > > +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c > > @@ -75,4 +75,23 @@ test_PyListAppend_6 () > > PyObject *list = NULL; > > PyList_Append(list, item); > > return list; > > -} > > \ No newline at end of file > > +} > > + > > +PyObject * > > +test_PyListAppend_7 (PyObject *item) > > +{ > > + PyObject *list = PyList_New (0); > > + Py_INCREF(item); > > + PyList_Append(list, item); > > + return list; > > + /* { dg-warning "expected 'item' to have reference count" "" { target > *-*-* } .-1 } */ > > It would be good if these dg-warning directives regexp contained the > actual and expected counts; I find I can't easily tell what the > intended output is meant to be. > > > > +} > > + > > +PyObject * > > +test_PyListAppend_8 (PyObject *item, PyObject *list) > > +{ > > + Py_INCREF(item); > > + Py_INCREF(item); > > + PyList_Append(list, item); > > + return list; > > +} > > Should we complain here about item->ob_refcnt being too high? > > > diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c > b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c > > new file mode 100644 > > index 00000000000..a7f39509d6d > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c > > @@ -0,0 +1,18 @@ > > +/* { dg-do compile } */ > > +/* { dg-require-effective-target analyzer } */ > > +/* { dg-options "-fanalyzer" } */ > > +/* { dg-require-python-h "" } */ > > + > > + > > +#define PY_SSIZE_T_CLEAN > > +#include <Python.h> > > +#include "../analyzer/analyzer-decls.h" > > + > > +PyObject * > > +test_refcnt_1 (PyObject *obj) > > +{ > > + Py_INCREF(obj); > > + Py_INCREF(obj); > > + return obj; > > + /* { dg-warning "expected 'obj' to have reference count" "" { target > *-*-* } .-1 } */ > > Likewise, it would be better for the dg-warning directive's expressed > the expected "actual vs expected" values. > > [...snip...] > > Thanks again for the patch, hope this is constructive > > Dave > >