Re: daily report on extending static analyzer project [GSoC]

David Malcolm via Gcc Wed, 07 Jul 2021 07:38:21 -0700

On Wed, 2021-07-07 at 19:22 +0530, Ankur Saini wrote:
> 
> 
> > On 07-Jul-2021, at 4:16 AM, David Malcolm <dmalc...@redhat.com>
> > wrote:
> > 
> > On Sat, 2021-07-03 at 20:07 +0530, Ankur Saini wrote:
> > > AIM for today : 
> > > 
> > > - update the call_stack to track something else other than
> > > supergraph
> > > edges
> > > 
> > > —
> > > 
> > > PROGRESS :
> > > 
> > > - After some brainstorming about tracking the callstack, I think
> > > one
> > > better way to track the call stack is to keep track of source and
> > > destination of the call instead of supperedges representing them.
> > > 
> > > - so I branched again and updated the call-string to now capture
> > > a pair
> > > of supernodes ( one representing callee and other representing
> > > caller
> > > ), like this I was not only able to easily port the entire code
> > > to
> > > adapt it without much difficulty, but now can also push calls to
> > > functions that doesn’t possess a superedge.
> > > 
> > > - changes done can be seen on the "
> > > refs/users/arsenic/heads/call_string_update “ branch. ( currently
> > > this
> > > branch doesn’t contain other changes I have done till now, as I
> > > wanted
> > > to test the new call-string representation exclusively and make
> > > sure it
> > > doesn’t affect the functionality of the base analyser )
> > 
> > I'm not an expert at git, so it took me a while to figure out how
> > to
> > access your branch.
> > 
> > It's easier for me if you can also use "git format-patch" to
> > generate a
> > patch and "git send-email" to send it to this list (and me,
> > please), so
> > that the patch content is going to the list.
> > 
> > The approach in the patch seems reasonable.
> > 
> > I think you may have a memory leak, though: you're changing
> > call_string
> > from:
> >  auto_vec<const return_superedge *> m_return_edges;
> > to:
> >  auto_vec<const std::pair<const supernode *,const supernode *>*>
> > m_supernodes;
> > 
> > and the std:pairs are being dynamically allocated on the heap.
> > Ownership gets transferred by call_string::operator=, but if I'm
> > reading the patch correctly never get deleted.  This is OK for
> > prototyping, but will need fixing before the code can be merged.
> 
> > 
> > It's probably simplest to get rid of the indirection and allocation
> > in
> > m_supernodes and have the std::pair be stored by value, rather than
> > by
> > pointer, i.e.:
> >  auto_vec<std::pair<const supernode *, const supernode *> >
> > m_supernodes;
> > 
> > Does that work? (or is there a problem I'm not thinking of).
> 
> yes, I noticed that while creating, was thinking to empty the vector
> and delete the all the memory in the destructor of the call-string (
> or make them unique pointers and let them destroy themselves ) but
> looks like storing the values of the pairs would make more sense.


Yes, just storing the std::pair rather than new/delete is much simpler.

There's also an auto_delete_vec<T> which stores (T *) as the elements
and deletes all of the elements in its dtor, but the assignment
operator/copy-ctor/move-assign/move-ctor probably don't work properly,
and the overhead of new/delete probably isn't needed.

> 
> > 
> > If that's a problem, I think you might be able to get away with
> > dropping the "first" from the pair, and simply storing the
> > supernode to
> > return to; I think the only places that "first" gets used are in
> > dumps
> > and in validation.  But "first" is probably helpful for debugging,
> > so
> > given that you've got it working with that field, better to keep
> > it.
> 
> yes, I see that too, but my idea is to keep it as is for now ( maybe
> it might turn out to be helpful in future ). I will change it back if
> it proves to be useless and we get time at the end.

Yes; my suggestion was just in case there were issues with fixing the
auto_vec.   It's better for debugging to have both of the pointers in
the element.

[...snip...]

> > > 
> > > I think you may have a memory leak, though: you're changing
> > > call_string
> > > from:
> > >   auto_vec<const return_superedge *> m_return_edges;
> > > to:
> > >   auto_vec<const std::pair<const supernode *,const supernode *>*>
> > > m_supernodes;
> > 
> > One other, unrelated idea that just occurred to me: those lines get
> > very long, so maybe introduce a typedef e.g. 
> >  typedef std::pair<const supernode *,const supernode *> element_t;
> > so that you can refer to the pairs as call_string::element_t, and
> > just
> > element_t when you're in call_string scope, and just have a:
> > 
> >  auto_vec<element_t> m_supernodes;
> > 
> > or
> > 
> >  auto_vec<element_t> m_elements; 
> > 
> > within the call_string, if that makes sense.  Does that simplify
> > things?
> 
> Yes, this is a nice idea, I will update the call-stack with next
> update to the analyzer, or should I update it and send a patch to the
> mailing list with this call_string changes for review first and then
> work on the other changes ?

I prefer reviewing code via emails to the mailing list, rather than
looking at it in the repository.  One benefit is that other list
subscribers (and archive readers) can easily see the code we're
discussing; this will become more significant as we go into the ipa-
devirt code which wasn't written by me.

That said, one benefit of having your own branch is so you don't have
to wait for review - you can prototype things and press on without
waiting, and not have to worry about everything being perfect
immediately.  Hopefully we can review the code frequently enough to
allow you to "course-correct" without delaying you.

> 
> also regarding the ChangeLog 
> 
> how exactly does one update the changelog, does it get updated with
> the commit messages or do one have to update it manually ? 
> 
> I found and read this doc (  
> https://gcc.gnu.org/codingconventions.html#ChangeLogs <  
> https://gcc.gnu.org/codingconventions.html#ChangeLogs>) about the
> same which says that “ ChangeLog entries are part of git commit
> messages and are automatically put into a corresponding ChangeLog
> file “. But when I pushed the commits which looks properly formatted
> to me, I was not able to see the changes reflecting back in 
> gcc/analyzer/changelog .

There's a script that runs every 24 hours on the main branches which
looks at the commits that have happened since the script last run,
tries to parse the commit messages to find ChangeLog entries, and adds
a commit adding those entries to the ChangeLog files.  I don't think it
runs on user branches.

We require ChangeLog entries when merging code to trunk and the release
branches, but it may be overkill for a personal development branch. 
When I'm working on a new feature, I only bother writing the ChangeLog
when a patch is nearly ready for trunk, since often I find that patches
need a fair amount of reworking as I test them.

The call_string patch looks nearly ready for trunk, and thus probably
needs a ChangeLog, but the ipa-devirt work is going to be more
experimental for now, so I wouldn't bother with ChangeLogs on it yet
until it's clearer what the changes will be and how to land them into
trunk.

I'm currently working on implementing detection of uses of
uninitialized values in -fanalyzer for GCC 12, and so I'm making my own
changes (mostly to region-model/store/constraint-manager).  Hopefully
we won't get too many clashes between our changes.

Hope this makes sense
Dave

Re: daily report on extending static analyzer project [GSoC]

Reply via email to