Re: [PATCH 11/49] Add diagnostic_metadata and CWE support

David Malcolm Fri, 13 Dec 2019 09:32:35 -0800

On Fri, 2019-12-13 at 09:43 -0700, Martin Sebor wrote:
> On 12/12/19 7:59 PM, David Malcolm wrote:
> > On Wed, 2019-12-04 at 10:36 -0700, Martin Sebor wrote:
> > > On 11/15/19 6:22 PM, David Malcolm wrote:
> > > > This patch adds support for associating a diagnostic with an
> > > > optional
> > > > diagnostic_metadata object, so that plugins can add extra data
> > > > to
> > > > their
> > > > diagnostics (e.g. mapping a diagnostic to a taxonomy or coding
> > > > standard
> > > > such as from CERT or MISRA).
> > > > 
> > > > Currently this only supports associating a CWE identifier with
> > > > a
> > > > diagnostic (which is what I'm using for the analyzer warnings
> > > > later
> > > > in
> > > > the patch kit), but adding a diagnostic_metadata class allows
> > > > for
> > > > future
> > > > growth in this area without an explosion of further
> > > > "warning_at"
> > > > overloads
> > > > for all of the different kinds of custom data that a plugin
> > > > might
> > > > want to
> > > > add.
> > > 
> > > We discussed this in the past so I'll be repeating some of my
> > > comments from that thread.  I like the feature at a high level,
> > > but I'm not sure I see how to make it work in a robust way.
> > > 
> > > There is no one-to-one mapping between GCC warnings and any of
> > > these classifications.  A single GCC warning might map to several
> > > CERT or MISRA guidelines, and a single guideline might map to two
> > > or more different GCC warnings.  This is an M to N mapping times
> > > the number of coding standards/classifications.  I haven't looked
> > > at the patch kit in enough detail to understand how it tries to
> > > handle it.  Can you explain?
> > 
> > The patch kit gives the ability to (optionally) specify a single
> > CWE
> > identifier when emitting a diagnostic.
> > 
> > The patch kit uses this in various places for its new warnings for
> > the
> > cases where there is a well-defined CWE weakness identifier (and
> > doesn't otherwise).
> > 
> > It doesn't attempt to classify any existing warnings.
> > 
> > It doesn't attempt to support other classifications in this patch,
> > but
> > it does introduce a diagnostic_metadata object that could gain this
> > functionality at some later time, if we come up with a scheme we're
> > happy with (a slight violation of YAGNI, but it's such a pain
> > touching
> > all the diagnostic fns that I prefer to do it once).
> 
> I wasn't necessarily asking about any classifications in particular,
> or even about pre-existing warnings.  I think it would be nice to be
> able to eventually make it work for them but I obviously realize that
> will take quite a bit retrofitting.
> 
> What I'd like to understand how the warning X classification-id
> mapping for the new analyzer warnings is supported in the design
> of the (new) diagnostic infrastructure (or how you envision it
> will be when it's more complete).
> 
> You said the analyzer currently only supports associating a CWE
> identifier with a diagnostic.  By "diagnostic" do you mean
> a particular instance of a given warning or all its instances?
> 
> Picking an uninitialized use as an example, there are three CWE's
> that discuss it:
> 
>    CWE-457: Use of Uninitialized Variable
>    CWE-824: Access of Uninitialized Pointer
>    CWE-908: Use of Uninitialized Resource
> 
> but only one -Wanalyzer-use-of-uninitialized-value.  I'd like to
> see how the mapping works (or will work) between these three CWE's
> and instances of this analyzer option/warning.
> 
> I see on Goldbolt that for either the use of an uninitialized
> pointer (CWE-457) or for its dereference (CWE-824) the analyzer
> issues the same warning with the same CWE ID:
> 
>    use of uninitialized value 'p' [CWE-457] 
> [-Wanalyzer-use-of-uninitialized-value]
> 
> It doesn't differentiate between the two CWE's.


That warning is implemented in analyzer/region-model.cc (see:
  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01554.html )

The code in question is in poisoned_value_diagnostic::emit

      case POISON_KIND_UNINIT:
        {
          diagnostic_metadata m;
          m.add_cwe (457); /* "CWE-457: Use of Uninitialized Variable".  */
          return warning_at (rich_loc, m,
                             OPT_Wanalyzer_use_of_uninitialized_value,
                             "use of uninitialized value %qE",
                             m_expr);
        }
        break;

and currently that's code is only called when detecting a dereference
of an uninitialized pointer.

So, yes, this one probably should be CWE-824, but that's merely a bug
in my code, not a problem with the "infrastructure".

It looks like I'll need an extra attribute for that class to
distinguish the type of access (read from local seems to be CWE-457,
deref pointer seems to be CWE-824; CWE-908 would be emitted from a
different place, if I'm understanding the definition correctly).


> If this limitation is baked into the design of the infrastructure
> then that's an obvious flaw that needs to be fixed.  There is no
> point in issuing warnings with the wrong CWE's.  Anyone who cares
> about CWE's would view those as bugs, and those who don't care
> don't need to see them.
> 
> > > My other concern here is that these guidelines evolve.  New ones
> > > are being added, existing ones moved or broken up, etc.  Does
> > > the infrastructure cope with this evolution (e.g., by reading
> > > the mapping from a file)?
> > 
> > Let's cross that bridge when we come to it.  If it becomes a
> > problem,
> > then we can come up with a fix (e.g. a command-line option to
> > specify
> > the CWE version).
> I agree that dealing with evolving classifications is a secondary
> goal that can probably be fairly easily met.  I wanted to know if
> there already was a solution for it now.
> 
> But referencing the right IDs in the diagnostics seems like a basic
> design requirement.  

If you find that I'm using the wrong IDs, then tell me, and I'll fix
them.

> Unless the above example is due to the mapping
> not being fully populated yet it seems clear that issuing a warning
> with one CWE ID for a bug described under another CWE ID is
> a problem.

I'm not sure what you mean by a "mapping" being "populated".  It's
effectively just an optional int value that we can supply at diagnostic
calls.  It's different from the option ID, so yes, we can support
different CWEs for the same option ID if need be.  There isn't support
for multiple CWEs at the same instance of a warning (that could be
added if need be, but I wasn't planning to).

Dave

Re: [PATCH 11/49] Add diagnostic_metadata and CWE support

Reply via email to