On Thu, 2023-03-16 at 09:54 +0100, Pierrick Philippe wrote: > On 15/03/2023 17:26, David Malcolm wrote: > > On Wed, 2023-03-15 at 16:24 +0100, Pierrick Philippe wrote:
[...snip...] > > > > > > An ana::svalue is a pattern of bits (possibly symbolic, such as > > "the > > constant 42" or "the initial value of param x") > > > > An ana::region is a description of a memory location for > > reads/writes > > (possibly symbolic, such as "local var x within this frame", or > > "the > > block reached by dereferencing the initial value of ptr in the > > frame > > above"). > > > > Sorry if I've misread things; I'll try to answer the rest of the > > email > > as best as I can... > > > > > So, first question: is there any way to associate and track the > > > state > > > of > > > a rvalue, independently of its lvalue? > > > > > > To try to clarify the question, here's an example: > > > > > > ''' > > > int __attribute__("__some_attribute__") x = 42; > > > /* STEP-1 > > > From now on, we consider the state of x as being marked by > > > some_attribute. > > > But in fact, in the log, we can observe that we'll have something > > > like > > > this in the new '/ana::program_state/': > > > {0x4b955b0: (int)42: marked_state (‘x’)} */ > > Yes: sm-state is associated within a program_state with an svalue, > > in > > this case with the constant 42. > > > > There isn't yet a way to instead give sm-state to "x" itself. I > > suppose you could give sm-state to &x (the pointer that points to x > > is > > an instance of region_svalue, which is an svalue), but I haven't > > tried > > this approach myself. > > > > > int *y = &x; > > > /* STEP-2 > > > For analysis purpose, you want to know that from now on, y is > > > pointing > > > to marked data. > > > So you set state of the LHS of the GIMPLE statement (i.e. some > > > ssa_name > > > instance of y) accordingly, with a state named 'points- > > > to_marked_data' > > > and setting 'x' as the origin of the state (in the sense of the > > > argument > > > /origin/ from '/ana::sm_context::on_transition/'. > > > What we now have in the new '/ana::program_state/' is this: > > > {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: > > > (int)42 > > > (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ > > Yes: you've set the state on the svalue "&x", not on "y". > > > > > int z = *y; > > > /* STEP-3 > > > Now you have implicit copy of marked data, and you want to report > > > it. > > > So you state the LHS of the GIMPLE statement (i.e. some ssa_name > > > instance of z) as being marked, with 'y' as the origin. > > > What we now have in the new '/ana::program_state/' is this: > > > {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: > > > (int)42 > > > (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ > > > ''' > > Presumably the program_state also shows that you have a binding for > > the > > region "z", containing the svalue 42 (this is represented within > > the > > "store", within the region_model within the program_state). > > Indeed, there is a binding for region "z" to the svalue 42 within the > new program state. > > > > In STEP-2: > > > > > > We lost the information saying that the rvalue of y (i.e. y), is > > > pointing to marked data. > > > Only remain the information that its lvalue (i.e. &x), is > > > pointing to > > > marked data, which is of course true. > > Note that the analyzer by default attempts to purge state that it > > doesn't think will be needed anymore, so it may have purged the > > state > > of "y" if y isn't going to be read from anymore. You could try - > > fno- > > analyzer-state-purge to avoid this purging. > > Nothing changes when I run it with the -fno-anlyzer-state-purge, > it is still the state of &x which is tracked. > > > > In STEP-3: > > > > > > No information is added regarding z in the new program_state. > > > In fact, if one have a closer look at the log, we see that the > > > LHS of > > > the GIMPLE statement (the ssa_name instance of z), is already in > > > the > > > state of 'marked_data'. > > > Through the log to the call to '/sm_context::on_transition/' this > > > can > > > be > > > seen: > > > marked_sm: state transition of ‘z_5’: marked_data -> > > > marked_data > > > > > > All of this step is somehow leading to confusing diagnostic. > > > For example, you miss the fact that 'z' is being marked, because > > > no > > > event happen as it is somehow aliasing 'x' svalue. > > > Though it might not be true in case of missed optimization. > > > > > > Of course, if anyone wants to have a look at my code, I can > > > provide > > > it > > > to you (it is not yet publicly available at the moment). > > I think I'd have to look at the code to be of more help; I confess > > that > > I stopped understanding somewhere around step 3, sorry. > No worries, the idea is that regarding the state_map, z_3 is already > considered as marked. > > Are you able to post a simple example of the code you'd like to > > analyze, with an idea of the behavior you'd like from the analyzer? > > (e.g. the state transitions) > > Sure here is an exemple: > > ''' > int __attribute__("__some_attribute__") x = 42; > // Transition state for x: start -> marked > int *y = &x; > // Transition state for y: start -> points-to_marked > int z = *y; > // Transition state for z: start -> marked > if (z); > // Diagnostic is issued because of the use of a marked data in a > condition, for analysis purpose. > ''' > > The issued diagnostic I do have in mind looks like: > > ''' > 1 | int __attribute__("__some_attribute__") x = 42; > | ^ > | | > | (1) ‘x’ gets marked here > 2 | int *y = &x; > | ~ > | | > | (2) ‘y’ points-to marked data here > 3 | int z = *y; > | ~ > | | > | (3) ‘z’ gets marked here > 4 | if (z); > | ~ > | | > | (4) used of a marked value 'z' in a condition > ''' > > What I get from now on is this: > > ''' > 1 | int __attribute__((__taint__)) x = 42; > | ^ > | | > | (1) ‘x’ gets > tainted here > |...... > 4 | if(z) return 666; > | ~ > | | > | (2) use of tainted value ‘z’ in a condition > ''' If I'm reading things right, it looks like the analyzer is correctly complaining that the marked svalue is being used in a condition, but the problem is that the events in the diagnostic_path aren't very good: it's missing events (2) and (3) from the "what you had in mind" diagnostic. If I'm right, then the issue here seems to be more about how we generate the diagnostic_path (the list of event messages displayed for the diagnostic) when we emit the saved_diagnostic, rather than about the state-tracking. Unfortunately, when I wrote that code I was primarily looking at state machines for tracking allocation state (e.g. double-free), and it doesn't work well with other kinds of state machine. I've experimented with other approaches that might work better. I think ultimately we want to be able to walk backwards from a saved_diagnostic and be able to answer the question "why did it get into this state?", which might involve following assignments backwards. I have a semi-working prototype of this, but it would be a lot of work to get ready for trunk (it tries to record a set of reversible "change" objects, so that we can walk backwards from the diagnostic, undoing changes to see where a value came from). > > And for some reason, if I change the analyzed code to this: > > ''' > int __attribute__("__some_attribute__") x = 42; > int *y = &x; > x = 6; // New line > int z = *y; > if (z); > ''' > > This is what I get: > > ''' > 2 | int __attribute__((__taint__)) x = 42; > | ^ > | | > | (1) ‘x’ gets > marked > here > 3 | int * y = &x; > | ~ > | | > | (2) ‘&x’ points to tainted data here > 4 | x = 6; > 5 | int z = *y; > | ~ > | | > | (3) ‘x’ gets tainted here > 6 | if(z) return 666; > | ~ > | | > | (4) use of tainted value ‘z’ in a condition That event (3) looks bogus; looks like a bug in how we generate events (the existing version of the code I was talking about above). > ''' > > Sorry for the long mail here, No worries; sorry about the bugs in the analyzer :/ > Let me know if you want to have a look at the exploded graph or at > the > analyzer's log. Does the above help? Dave