Re: Static Analyzer: correlate state for different region/svalue for reporting

David Malcolm via Gcc Wed, 15 Mar 2023 09:27:17 -0700

On Wed, 2023-03-15 at 16:24 +0100, Pierrick Philippe wrote:
> Hi everyone,


Hi Pierrick

> 
> I have some question regarding the analyzer.
> I'm currently working on an fully out-of-tree static analyzer plugin.
> I started development on commit tagged as /basepoints/gcc-13/, but 
> recently moved my code to a more recent /trunk/ branch (build
> /20230309/).
> 
>  From my different experiments and analyzer's log readings, I have
> the 
> intuition that '/ana::state_machine::state/' class is mostly (if not 
> only) linked with '/ana::svalue/', 

Yes: a program_state (which describes the state at a particular node
along an execution path) contains (among other things) an sm_state_map
per state_machine, and the sm_state_map maps from svalues to
state_machine::states.

> i.e. the lvalue of a memory region.
> In my analysis case, I would like to also be able to track state for 
> some rvalue of a memory region, i.e. '/ana::region/' objects.

You're using the terms "lvalue" and "rvalue", but if I'm reading your
email right, you're using them in exactly the opposite way that I
understand them.

I think of an "lvalue" as something that can be on the left-hand side
of an assignment: something that can be written to (an ana::region),
whereas an "rvalue" is something that can be on the right-hand side of
an assignment: a pattern of bits (an ana:svalue) that be written into
an lvalue.

An ana::svalue is a pattern of bits (possibly symbolic, such as "the
constant 42" or "the initial value of param x")

An ana::region is a description of a memory location for reads/writes
(possibly symbolic, such as "local var x within this frame", or "the
block reached by dereferencing the initial value of ptr in the frame
above").

Sorry if I've misread things; I'll try to answer the rest of the email
as best as I can...

> So, first question: is there any way to associate and track the state
> of 
> a rvalue, independently of its lvalue?
> 
> To try to clarify the question, here's an example:
> 
> '''
> int __attribute__("__some_attribute__") x = 42;
> /* STEP-1
>  From now on, we consider the state of x as being marked by
> some_attribute.
> But in fact, in the log, we can observe that we'll have something
> like 
> this in the new '/ana::program_state/':
> {0x4b955b0: (int)42: marked_state (‘x’)} */

Yes: sm-state is associated within a program_state with an svalue, in
this case with the constant 42.

There isn't yet a way to instead give sm-state to "x" itself.  I
suppose you could give sm-state to &x (the pointer that points to x is
an instance of region_svalue, which is an svalue), but I haven't tried
this approach myself.

> 
> int *y = &x;
> /* STEP-2
> For analysis purpose, you want to know that from now on, y is
> pointing 
> to marked data.
> So you set state of the LHS of the GIMPLE statement (i.e. some
> ssa_name 
> instance of y) accordingly, with a state named 'points-
> to_marked_data'
> and setting 'x' as the origin of the state (in the sense of the
> argument 
> /origin/ from '/ana::sm_context::on_transition/'.
> What we now have in the new '/ana::program_state/' is this:
> {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
> (int)42 
> (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */

Yes: you've set the state on the svalue "&x", not on "y".

> 
> int z = *y;
> /* STEP-3
> Now you have implicit copy of marked data, and you want to report it.
> So you state the LHS of the GIMPLE statement (i.e. some ssa_name 
> instance of z) as being marked, with 'y' as the origin.
> What we now have in the new '/ana::program_state/' is this:
> {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
> (int)42 
> (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */
> '''

Presumably the program_state also shows that you have a binding for the
region "z", containing the svalue 42 (this is represented within the
"store", within the region_model within the program_state).

> 
> In STEP-2:
> 
> We lost the information saying that the rvalue of y (i.e. y), is 
> pointing to marked data.
> Only remain the information that its lvalue (i.e. &x), is pointing to
> marked data, which is of course true.

Note that the analyzer by default attempts to purge state that it
doesn't think will be needed anymore, so it may have purged the state
of "y" if y isn't going to be read from anymore.  You could try -fno-
analyzer-state-purge to avoid this purging.

> 
> In STEP-3:
> 
> No information is added regarding z in the new program_state.
> In fact, if one have a closer look at the log, we see that the LHS of
> the GIMPLE statement (the ssa_name instance of z), is already in the 
> state of 'marked_data'.
> Through the log to the call to '/sm_context::on_transition/' this can
> be 
> seen:
>      marked_sm: state transition of ‘z_5’: marked_data -> marked_data
> 
> All of this step is somehow leading to confusing diagnostic.
> For example, you miss the fact that 'z' is being marked, because no 
> event happen as it is somehow aliasing 'x' svalue.
> Though it might not be true in case of missed optimization.
> 
> Of course, if anyone wants to have a look at my code, I can provide
> it 
> to you (it is not yet publicly available at the moment).

I think I'd have to look at the code to be of more help; I confess that
I stopped understanding somewhere around step 3, sorry.

Are you able to post a simple example of the code you'd like to
analyze, with an idea of the behavior you'd like from the analyzer?
(e.g. the state transitions)

> Also, let me know if there is any imprecise information.
> 
> Thank you for your time,

Thanks; hope the above is helpful
Dave

Re: Static Analyzer: correlate state for different region/svalue for reporting

Reply via email to