steakhal added a comment.
First and foremost, I want to deeply apologize about my rushed response. When I
say that the `Taint originated here` note remained, I **wrongly** draw
conclusions. Won't happen again.
__Taint-flow IDs__:
I would challenge that we need a flow ID (number) because for me the tainted
symbol already has a unique ID which should be suitable for this purpose.
What I can see is that it would be useful to know directly what was the first
tainted symbols of any given tainted symbol. (`SymbolData -> SymbolData`
mapping)
This is pretty much analog with what you implemented by piggybacking on the
taint kind. Although, I'd argue that having an explicit mapping would be
cleaner, but I can see that it's more on personal preference.
In addition to that, I think there is value in minimizing the number of places
where we introduce such static counters, so I would adwise against that unless
we have a good reason for doing so.
I'm thinking that although it's handy to have the originally tainted symbol
directly at the error-node at hand, I'm still not sure if that couldn't be
calculated and tracked back to the place where we introduced taint.
int n;
scanf("%d", &n); // binds $1, not important
int v = mytaintsource(n); // taint originated here, returns $2
int z = taintprop(v); // taint propagated here, returns $3
int x = 42 / z; // 42 div $3
__Soundness of the back-propagation__:
The back-propagation is always in-sync **given that** the state-transition of
introducing taint **also attaches** the `NoteTag` explaining what it did and
why.
This basically means that after we call `addTaint()` we must also add a
`NoteTag` when we call `addTransition()`. Under these circumstances I find it
easier to argue that the back-propagation is consistent (sound). If a specific
checker (other than the `GenericTaintChecker`) models taint, it should stick to
the rules described and emit the right `NoteTag`. That way even downstream
checkers would play nicely with the taint-tracking and benefit from it.
__Interestingness__:
The concerns seem valid about that the set of interesting symbols could be
larger than the actually required (and desired) set. However, I wouldn't
worried about this much unless we have an example on which we can continue
discussing that this is a real concern.
What I can see is that as-of-now we use the interestingness notion for this,
and deviating or introducing something else would introduce complexity. So, I'm
not strictly against having something else, but I would lean towards using
interestingness here as well, unless we have clear-cut examples demonstrating
the need for something more sophisticated.
Finally, I'd like to thank you for investing your time into this subject. We
really should have done it much earlier. Without these similar improvements
like this one, it's just half way done. Thank you.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D144269/new/
https://reviews.llvm.org/D144269
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits