On Thu, 2023-03-16 at 09:54 +0100, Pierrick Philippe wrote:
> On 15/03/2023 17:26, David Malcolm wrote:
> > On Wed, 2023-03-15 at 16:24 +0100, Pierrick Philippe wrote:

[...snip...]

> > 
> > 
> > An ana::svalue is a pattern of bits (possibly symbolic, such as
> > "the
> > constant 42" or "the initial value of param x")
> > 
> > An ana::region is a description of a memory location for
> > reads/writes
> > (possibly symbolic, such as "local var x within this frame", or
> > "the
> > block reached by dereferencing the initial value of ptr in the
> > frame
> > above").
> > 
> > Sorry if I've misread things; I'll try to answer the rest of the
> > email
> > as best as I can...
> > 
> > > So, first question: is there any way to associate and track the
> > > state
> > > of
> > > a rvalue, independently of its lvalue?
> > > 
> > > To try to clarify the question, here's an example:
> > > 
> > > '''
> > > int __attribute__("__some_attribute__") x = 42;
> > > /* STEP-1
> > >   From now on, we consider the state of x as being marked by
> > > some_attribute.
> > > But in fact, in the log, we can observe that we'll have something
> > > like
> > > this in the new '/ana::program_state/':
> > > {0x4b955b0: (int)42: marked_state (‘x’)} */
> > Yes: sm-state is associated within a program_state with an svalue,
> > in
> > this case with the constant 42.
> > 
> > There isn't yet a way to instead give sm-state to "x" itself.  I
> > suppose you could give sm-state to &x (the pointer that points to x
> > is
> > an instance of region_svalue, which is an svalue), but I haven't
> > tried
> > this approach myself.
> > 
> > > int *y = &x;
> > > /* STEP-2
> > > For analysis purpose, you want to know that from now on, y is
> > > pointing
> > > to marked data.
> > > So you set state of the LHS of the GIMPLE statement (i.e. some
> > > ssa_name
> > > instance of y) accordingly, with a state named 'points-
> > > to_marked_data'
> > > and setting 'x' as the origin of the state (in the sense of the
> > > argument
> > > /origin/ from '/ana::sm_context::on_transition/'.
> > > What we now have in the new '/ana::program_state/' is this:
> > > {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
> > > (int)42
> > > (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */
> > Yes: you've set the state on the svalue "&x", not on "y".
> > 
> > > int z = *y;
> > > /* STEP-3
> > > Now you have implicit copy of marked data, and you want to report
> > > it.
> > > So you state the LHS of the GIMPLE statement (i.e. some ssa_name
> > > instance of z) as being marked, with 'y' as the origin.
> > > What we now have in the new '/ana::program_state/' is this:
> > > {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
> > > (int)42
> > > (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */
> > > '''
> > Presumably the program_state also shows that you have a binding for
> > the
> > region "z", containing the svalue 42 (this is represented within
> > the
> > "store", within the region_model within the program_state).
> 
> Indeed, there is a binding for region "z" to the svalue 42 within the
> new program state.
> 
> > > In STEP-2:
> > > 
> > > We lost the information saying that the rvalue of y (i.e. y), is
> > > pointing to marked data.
> > > Only remain the information that its lvalue (i.e. &x), is
> > > pointing to
> > > marked data, which is of course true.
> > Note that the analyzer by default attempts to purge state that it
> > doesn't think will be needed anymore, so it may have purged the
> > state
> > of "y" if y isn't going to be read from anymore.  You could try -
> > fno-
> > analyzer-state-purge to avoid this purging.
> 
> Nothing changes when I run it with the -fno-anlyzer-state-purge,
> it is still the state of &x which is tracked.
> 
> > > In STEP-3:
> > > 
> > > No information is added regarding z in the new program_state.
> > > In fact, if one have a closer look at the log, we see that the
> > > LHS of
> > > the GIMPLE statement (the ssa_name instance of z), is already in
> > > the
> > > state of 'marked_data'.
> > > Through the log to the call to '/sm_context::on_transition/' this
> > > can
> > > be
> > > seen:
> > >       marked_sm: state transition of ‘z_5’: marked_data ->
> > > marked_data
> > > 
> > > All of this step is somehow leading to confusing diagnostic.
> > > For example, you miss the fact that 'z' is being marked, because
> > > no
> > > event happen as it is somehow aliasing 'x' svalue.
> > > Though it might not be true in case of missed optimization.
> > > 
> > > Of course, if anyone wants to have a look at my code, I can
> > > provide
> > > it
> > > to you (it is not yet publicly available at the moment).
> > I think I'd have to look at the code to be of more help; I confess
> > that
> > I stopped understanding somewhere around step 3, sorry.
> No worries, the idea is that regarding the state_map, z_3 is already 
> considered as marked.
> > Are you able to post a simple example of the code you'd like to
> > analyze, with an idea of the behavior you'd like from the analyzer?
> > (e.g. the state transitions)
> 
> Sure here is an exemple:
> 
> '''
> int __attribute__("__some_attribute__") x = 42;
> // Transition state for x: start -> marked
> int *y = &x;
> // Transition state for y: start -> points-to_marked
> int z = *y;
> // Transition state for z: start -> marked
> if (z);
> // Diagnostic is issued because of the use of a marked data in a 
> condition, for analysis purpose.
> '''
> 
> The issued diagnostic I do have in mind looks like:
> 
> '''
> 1 | int __attribute__("__some_attribute__") x = 42;
>     | ^
>     | |
>     | (1) ‘x’ gets marked here
> 2 | int *y = &x;
>     |       ~
>     |        |
>     |      (2) ‘y’ points-to marked data here
> 3 | int z = *y;
>     |     ~
>     |      |
>     |    (3) ‘z’ gets marked here
> 4 | if (z);
>     |     ~
>     |      |
>     |    (4) used of a marked value 'z' in a condition
> '''
> 
> What I get from now on is this:
> 
> '''
> 1 |     int __attribute__((__taint__)) x = 42;
>     |                                                 ^
>     |                                                  |
>     |                                                (1) ‘x’ gets 
> tainted here
>     |......
> 4 |     if(z) return 666;
>     |        ~
>     |         |
>     |       (2) use of tainted value ‘z’ in a condition
> '''

If I'm reading things right, it looks like the analyzer is correctly
complaining that the marked svalue is being used in a condition, but
the problem is that the events in the diagnostic_path aren't very good:
it's missing events (2) and (3) from the "what you had in mind"
diagnostic.

If I'm right, then the issue here seems to be more about how we
generate the diagnostic_path (the list of event messages displayed for
the diagnostic) when we emit the saved_diagnostic, rather than about
the state-tracking.

Unfortunately, when I wrote that code I was primarily looking at state
machines for tracking allocation state (e.g. double-free), and it
doesn't work well with other kinds of state machine.  I've experimented
with other approaches that might work better.  I think ultimately we
want to be able to walk backwards from a saved_diagnostic and be able
to answer the question "why did it get into this state?", which might
involve following assignments backwards.  I have a semi-working
prototype of this, but it would be a lot of work to get ready for trunk
(it tries to record a set of reversible "change" objects, so that we
can walk backwards from the diagnostic, undoing changes to see where a
value came from).

> 
> And for some reason, if I change the analyzed code to this:
> 
> '''
> int __attribute__("__some_attribute__") x = 42;
> int *y = &x;
> x = 6; // New line
> int z = *y;
> if (z);
> '''
> 
> This is what I get:
> 
> '''
> 2 |     int __attribute__((__taint__)) x = 42;
>     |                                                 ^
>     |                                                  |
>     |                                                (1) ‘x’ gets
> marked 
> here
> 3 |     int * y = &x;
>     |            ~
>     |             |
>     |           (2) ‘&x’ points to tainted data here
> 4 |     x = 6;
> 5 |     int z = *y;
>     |          ~
>     |           |
>     |         (3) ‘x’ gets tainted here
> 6 |     if(z) return 666;
>     |        ~
>     |         |
>     |       (4) use of tainted value ‘z’ in a condition

That event (3) looks bogus; looks like a bug in how we generate events
(the existing version of the code I was talking about above).

> '''
> 
> Sorry for the long mail here,

No worries; sorry about the bugs in the analyzer :/

> Let me know if you want to have a look at the exploded graph or at
> the 
> analyzer's log.

Does the above help?

Dave

Reply via email to