Re: Proposal for deviations in static analyser findings

Bertrand Marquis Tue, 18 Oct 2022 09:12:35 -0700

Hi,

> On 18 Oct 2022, at 16:29, Jan Beulich <jbeul...@suse.com> wrote:
> 
> On 18.10.2022 17:17, Luca Fancellu wrote:
>>> On 13 Oct 2022, at 12:34, Jan Beulich <jbeul...@suse.com> wrote:
>>> On 13.10.2022 12:11, Luca Fancellu wrote:
>>>>> On 13 Oct 2022, at 08:50, Jan Beulich <jbeul...@suse.com> wrote:
>>>>> On 12.10.2022 18:00, Luca Fancellu wrote:
>>>>>> Entries in the database should never be removed, even if they are not 
>>>>>> used
>>>>>> anymore in the code (if a patch is removing or modifying the faulty 
>>>>>> line).
>>>>>> This is to make sure that numbers are not reused which could lead to 
>>>>>> conflicts
>>>>>> with old branches or misleading justifications.
>>>>> 
>>>>> Can we add provisions for shrinking such entries to e.g. just their "id"
>>>>> line? Or is the intention to be able to re-use such an entry if a matching
>>>>> instance appears again later?
>>>> 
>>>> I prefer to don’t shrink it, the name itself is not very long, even using 
>>>> many digits of the incremental
>>>> number, it removes also the dependency on the file name.
>>> 
>>> Name length isn't relevant here, and I have no idea what dependency on a
>>> file name you're thinking of. My question is a scalability one: Over time
>>> the table will grow large. If all entries remain there in full forever,
>>> table size may become unwieldy.
>> 
>> Ok I misunderstood your question, now I understand what you are asking, we 
>> could remove the content
>> of the “analyser” dictionary for sure, because if there is not anymore a 
>> link with the current code.
>> 
>> Regarding removing the “name” and “text”, could it be that at some point we 
>> can introduce in the code
>> a violation that requires the same justification provided by the “orphan” 
>> entry?
>> In that case we could reuse that entry without creating a new one that will 
>> only waste space.
>> What is the opinion on this?
> 
> Well, yes, this is the one case that I, too, was wondering about. It's not
> clear to me whether it wouldn't be better to allocate a fresh ID in such a
> case.


For traceability and release handling I think it is important that:
- we never reuse the same ID in any case
- we keep IDs in the database even if there is no occurrence in xen code as 
this will prevent adding a new ID if an existing one can be reused

In a certification context, when a justification has been validated and agreed 
it will make life a lot easier to not modify it and reuse it in the future if 
it is needed.
From our point of view, having a clear and simple way of handling those will 
make backports a lot easier.


>>>>>> Here a brief explanation of the field inside an object of the "content" 
>>>>>> array:
>>>>>> - id: it is a unique string that is used to refer to the finding, many 
>>>>>> finding
>>>>>> can be tagged with the same id, if the justification holds for any 
>>>>>> applied
>>>>>> case.
>>>>>> It tells the tool to substitute a Xen in-code comment having this 
>>>>>> structure:
>>>>>> /* SAF-0-safe [...] \*/
>>>>>> - analyser: it is an object containing pair of key-value strings, the 
>>>>>> key is
>>>>>> the analyser, so it can be cppcheck, coverity or eclair. The value is the
>>>>>> proprietary id corresponding on the finding, for example when coverity is
>>>>>> used as analyser, the tool will translate the Xen in-code coment in this 
>>>>>> way:
>>>>>> /* SAF-0-safe [...] \*/ -> /* coverity[coverity-id] \*/
>>>>> 
>>>>> In here, where would coverity-id come from? And how does the 
>>>>> transformation
>>>>> here match up with the value of the "coverity": field in the table?
>>>> 
>>>> I can put an example of that, as you pointed out it could be difficult to 
>>>> get where
>>>> this proprietary tool ID comes from.
>>>> 
>>>> The proprietary ID (Coverity in this case) comes from the report it 
>>>> produces:
>>>> 
>>>> […]
>>>> <file path>:<line number>:
>>>> 1. proprietary_ID: […]
>>>> […]
>>>> 
>>>> after we see the finding, we take that ID, we put it in the “analyser” 
>>>> dictionary as:
>>>> 
>>>> […]
>>>> "id":”SAF-2-safe",
>>>> “analyser”: {
>>>>    “coverity”: “proprietary_ID"
>>>> },
>>>> […]
>>>> 
>>>> So in the source code we will have:
>>>> 
>>>> /* SAF-2-safe [optional text] */
>>>> C code affected line;
>>>> 
>>>> And when the analysis will be performed, the tool (coverity for example) 
>>>> will run on this source code:
>>>> 
>>>> /* coverity[proprietary_ID] */
>>>> C code affected line;
>>>> 
>>>> The tool will write a report and will suppress the finding with 
>>>> “proprietary_ID” that comes in the C code
>>>> line after the comment.
>>> 
>>> Let me add some background to my earlier comment:
>>> 
>>> If we wanted to add such IDs to the table, then I guess this would result in
>>> a proliferation of entries. If my observations haven't misguided me,
>>> Coverity might re-use the same ID for multiple similar new issues found in a
>>> single run, but it would not re-use them across runs. Hence irrespective of
>>> their similarity, multiple table entries would be needed just because of the
>>> different Coverity IDs.
>> 
>> Coverity will use every run the same id for the same class of violation, for 
>> example
>> misra_c_2012_rule_8_6_violation for violation of rule 8.6.
> 
> Hmm, I've never seen such. I always saw it use numeric IDs, and we've
> actually been putting these in commits when addressing their findings.

Here I think you are mixing a finding inside the code with the type associated.
We might have several findings of the same type but with different 
justifications.

> 
>>>> After the analysis, the source code will return as the original (with the 
>>>> SAF-* tag).
>>> 
>>> While you mention something similar also as step 3 in the original document
>>> near the top, I'm afraid I don't understand what this "return as the 
>>> original"
>>> means. If you want to run the tool on an altered (comments modified) source
>>> tree, what I'd expect you to do is clone the sources into a throw-away tree,
>>> massage the comments, run the tool, and delete the massaged tree.
>>>>>> if the object doesn't have a key-value, then the corresponding in-code
>>>>>> comment won't be translated.
>>>>> 
>>>>> Iirc at least Coverity ignores certain instances of what it might consider
>>>>> violations (fall-through in switch() statements in particular) in case
>>>>> _any_ comment is present. Therefore may I suggest that such comments be
>>>>> deleted (really: replaced by a blank line, to maintain correct line
>>>>> numbering) if there's no matching key-value pair?
>>>> 
>>>> Yes the line won’t be altered if there is no match. This to ensure the 
>>>> correct line
>>>> numbering is not affected.
>>> 
>>> "won't be altered" is the opposite of what I've been asking to consider:
>>> Observing that comments _regardless_ of their contents may silence findings,
>>> the suggestion is to remove comments (leaving a blank line) when there's no
>>> entry for the targeted tool in the table entry.
>> 
>> Why? The tag comment won’t do anything, it would act as a blank line from 
>> the analyser
>> perspective.
> 
> The _tag_ won't do anything, but as said any _comment_ may have an effect.

I am not sure I follow this one but in any case we can choose to anyway 
substitute the tag with something like /* Not applicable */.

Cheers
Bertrand

> 
> Jan

Re: Proposal for deviations in static analyser findings

Reply via email to