On Tue, 2023-03-21 at 09:21 +0100, Pierrick Philippe wrote:
> On 21/03/2023 00:30, David Malcolm wrote:
> > On Mon, 2023-03-20 at 13:28 +0100, Pierrick Philippe wrote:
> > > Hi everyone,
> > >
> > > I'm still playing around with the analyzer, and wanted to have a
> > > look at loop handling.
> > > I'm using a build from /trunk/ branch (/20230309/).
> > >
> > > Here is my analyzed code:
> > >
> > > '''
> > > 1| #include <stdlib.h>
> > > 2| int main(void) {
> > > 3|     void * ptr = malloc(sizeof(int));
> > > 4|     for (int i = 0; i < 10; i++) {
> > > 5|         if (i == 5) free(ptr);
> > > 6|     }
> > > 7| }
> > > '''
> [stripping]
> > > So, I'm guessing that this false positive is due to how the
> > > analyzer is handling loops.
> > > Which lead to my question: how are loops handled by the analyzer?
> >
> > Sadly, the answer is currently "not very well" :/
> >
> > I implemented my own approach, with a "widening_svalue" subclass of
> > symbolic value.  This is widening in the Abstract Interpretation
> > sense (as opposed to the bitwise operations sense): if I see
> > multiple values on successive iterations, the widening_svalue tries
> > to simulate that we know the start value and the direction the
> > variable is moving in.
> >
> > This doesn't work well; arguably I should rewrite it, perhaps with
> > an iterator_svalue, though I'm not sure how it ought to work.  Some
> > ideas:
> >
> > * reuse gcc's existing SSA-based loop analysis, which I believe can
> >   identify SSA names that are iterator variables, figure out their
> >   bounds, and their per-iteration increments, etc.
> >
> > * rework the program_point or supergraph code to have a notion of
> >   "1st iteration of loop", "2nd iteration of loop", "subsequent
> >   iterations", or similar, so that the analyzer can explore those
> >   cases differently (on the assumption that such iterations
> >   hopefully catch the most interesting bugs)
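To illustrate what I meant above by "widening" in the Abstract
Interpretation sense, here's a rough sketch of the idea; the names
(widened_value, widen, etc.) are made up for the example, and this
isn't the actual widening_svalue code:

  #include <stdio.h>

  /* After seeing two different concrete values for a variable on
     successive iterations, stop tracking the exact value and instead
     record just a start value plus the direction it's moving in.  */

  enum direction { DIR_UNKNOWN, DIR_ASCENDING, DIR_DESCENDING };

  struct widened_value
  {
    long start;          /* value seen on the first iteration */
    enum direction dir;  /* which way later iterations moved */
  };

  static struct widened_value
  widen (long first_value, long second_value)
  {
    struct widened_value wv;
    wv.start = first_value;
    if (second_value > first_value)
      wv.dir = DIR_ASCENDING;
    else if (second_value < first_value)
      wv.dir = DIR_DESCENDING;
    else
      wv.dir = DIR_UNKNOWN;
    return wv;
  }

  int
  main (void)
  {
    /* e.g. "i" was 0 on the first iteration and 1 on the second:
       widen to "starts at 0, ascending", losing the exact value.  */
    struct widened_value wv = widen (0, 1);
    printf ("start=%ld dir=%d\n", wv.start, (int) wv.dir);
    return 0;
  }

For the loop in your testcase, "i" ends up being modelled as something
like "starts at 0, ascending" rather than as a known concrete value on
each iteration, so the analyzer can no longer tell exactly which
iteration the "free" happens on.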
I've filed an RFE discussing some of the problems with -fanalyzer's
loop-handling here:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109252
including the idea of making use of GCC's existing SSA-based loop
analysis (which discovers a tree of loops within each function's CFG).

> I see, I don't know if you ever considered allowing state machines to
> deal with loops on their own.
> Such as having an API to allow to register a callback to handle
> loops, but not in a mandatory way.
> Or having a set of APIs to optionally implement for the analyzer to
> call.

I hadn't thought of that, but it sounds like a reasonable idea.

> It would allow state machines to analyze loops with the meaning of
> their inner analysis.
>
> Which could allow them to try to find a fixed point in the loop
> execution which doesn't have any impact on the program state for that
> state machine.  Kind of like a custom loop invariant.
> Because depending of the analysis goal of the state machine, you
> might need to symbolically execute the loop only a few times before
> reentering the loop and having the entry state being the same as the
> end-of-loop state.

The analyzer performs symbolic execution; it tries to achieve a
reasonable balance between:
* precision of state tracking, versus
* achieving decent coverage of code and data flow, and
* ensuring termination via various heuristics.

Its current loop implementation uses widening_svalue and the
complexity limits on svalues/regions to attempt to have the symbolic
execution terminate, either by hitting already-visited nodes in the
exploded_graph, or else by hitting per-program-point limits.
Unfortunately this often doesn't work well.

GCC's optimization code has both GIMPLE and RTL loop analysis code.
The RTL code runs too late for the analyzer, but the GIMPLE loop
analysis code is in cfgloop.{h,cc}, and thus we would have access to
information about loops, at least for well-behaved cases - though
possibly only when optimization is enabled.

> In fact, this could be done directly by the analyzer, and only
> calling state machine APIs for loop handling which still has not
> reached such a fixed point in their program state for the analyzed
> loop, with a maximum number of execution fixed by the analyzer to
> limit execution time.
>
> Does what I'm saying make sense?

I think so, though I'm not sure how it would work in practice.
Consider e.g.:

  for (int i = 0; i < n; i++)
    head = prepend_node (head, i);

which builds a chain of N dynamically-allocated nodes in a linked
list.

> In terms of implementation, loop detection can be done by looking for
> strongly connected components (SCCs) in a function graph having more
> than one node.
> I don't know if this is how it is already done within the analyzer or
> not?

It isn't yet done in the analyzer, but as noted above there is code in
GCC that already does that (in cfgloop.{h,cc}).

Dave
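P.S. In case a concrete testcase helps: here's a self-contained
version of the prepend_node example above.  The struct and the
prepend_node implementation are just one plausible way of fleshing it
out for illustration, not code from the analyzer or from an existing
testcase:

  #include <stdlib.h>

  struct node
  {
    struct node *next;
    int val;
  };

  static struct node *
  prepend_node (struct node *head, int val)
  {
    struct node *n = malloc (sizeof (struct node));
    if (!n)
      return head;  /* on allocation failure, keep the existing list */
    n->next = head;
    n->val = val;
    return n;
  }

  int
  main (void)
  {
    int n = 10;
    struct node *head = NULL;

    for (int i = 0; i < n; i++)
      head = prepend_node (head, i);

    /* At this point "head" is a chain of up to "n" heap-allocated
       nodes; every iteration of the loop grows the store by one more
       allocation.  */
    return 0;
  }

The difficulty, I think, is that even if a particular state machine's
states stabilize after a couple of iterations, the underlying store
keeps acquiring a new heap region on every iteration, so a naive
"entry state equals end-of-loop state" check on the full program state
would never fire for a loop like this.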