Let's focus on the specific features (or fixes) that are needed and
on how we could implement each of them:

A) Printing the input expression instead of re-constructing it. As
   Joseph explained, this will fix the problems that Aldy mentioned
   (PR3544[123] and PR35742) and this requires:

  1) For non-preprocessed expr we need at least two locations per expr
     (beg/end). This will require changes on the build_* functions to
     handle multiple locations.

  1b) For each preprocessed token, we would need to keep two locations:
      one for the preprocessed location and another for the original
      location. As Joseph pointed out, ideally we should be able to
      find a way to track this with a single location_t object so we do
      not need 4 locations per expr.

  2) Changes in the parser to pass down the correct locations to the
     build_* functions.

  3) A location(s) -> source strings interface and machinery. Ideally,
     this should be more or less independent of CPP, so CPP (through
     the diagnostics machinery) calls into this when needed and not
     the other way around. This can be implemented in several ways:

     a) Keeping the CPP buffers in memory and having in line-maps
        pointers directly into the buffers contents. This is easy and
        fast but potentially memory consuming. Care must be taken to
        handle charsets, tabs, etc. Factoring out anything useful from
        libcpp would help to implement this.

     b) Re-open the file and fseek. This is not trivial since we need
        to do it fast but still do all character conversions that we
        did when libcpp opened it the first time. This is
        approximately what Clang (LLVM) does and it seems they can do
        it very fast by keeping a cache of the buffers already reopened.
        I think that, thanks to our line-maps implementation, we can do
        the seeking considerably more efficiently in terms of computation
        time.  However, opening files is deeply embedded into CPP, so
        that would need to be factored out so we can avoid any
        unnecessary CPP stuff when reopening but still do it
        *properly* and *efficiently*.

  4) Changes in the diagnostics machinery to extract locations from
     expr and print a string from a source file instead of
     re-constructing things.

  5) Handle locations during folding or avoid aggressive folding in
     the front-ends.

  6) Handle locations during optimisation or update middle-end
     diagnostics to not rely on perfect location information. This
     probably means not using %qE, no column info, and similar
     limitations. Some trade-offs must be investigated.
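
To make (A.1) and (A.3a) a bit more concrete, here is a minimal sketch
of what a beg/end range and a "range -> source string" lookup could
look like when the buffer is kept in memory. Everything in it is
hypothetical: src_pos, src_range, line_start, pos_ptr and range_text
are names invented for illustration, not existing GCC or libcpp
interfaces, and columns are counted in bytes, so charsets and tabs are
deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types, NOT GCC's location_t or line-map structures.  */
struct src_pos { unsigned line; unsigned col; };   /* both 1-based */
struct src_range { struct src_pos beg, end; };     /* end is inclusive */

/* Return a pointer to the first byte of 1-based LINE in BUF, or NULL
   if BUF has fewer lines.  A real implementation would use a table of
   line offsets instead of this linear scan.  */
static const char *
line_start (const char *buf, unsigned line)
{
  assert (line >= 1);
  while (line > 1)
    {
      buf = strchr (buf, '\n');
      if (buf == NULL)
        return NULL;
      buf++;
      line--;
    }
  return buf;
}

/* Return a pointer to the byte at position P, or NULL if P does not
   name an existing byte.  Columns are byte offsets (an assumption).  */
static const char *
pos_ptr (const char *buf, struct src_pos p)
{
  const char *s = line_start (buf, p.line);
  if (s == NULL || p.col < 1)
    return NULL;
  if (p.col > strcspn (s, "\n"))
    return NULL;
  return s + (p.col - 1);
}

/* Copy the text covered by R into OUT (NUL-terminated).  Return the
   number of bytes copied, or -1 if R is invalid or OUT is too small.  */
static long
range_text (const char *buf, struct src_range r, char *out, size_t out_size)
{
  const char *b = pos_ptr (buf, r.beg);
  const char *e = pos_ptr (buf, r.end);
  size_t n;

  if (b == NULL || e == NULL || e < b)
    return -1;
  n = (size_t) (e - b) + 1;
  if (n >= out_size)
    return -1;
  memcpy (out, b, n);
  out[n] = '\0';
  return (long) n;
}
```

With the buffer "int x;\nfoo = a + b * c;\n" and the range
(2,7)-(2,15), range_text copies exactly the bytes the user wrote for
that expression, which is the point of (A): no re-construction
from the tree is involved.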


B) Printing accurate column information. This requires:

   *) Preprocessed/original locations in a single location_t. Similar
      to (A.1b) above.

   *) Changes in the parser to pass down the correct locations to the
      diagnostics machinery. Similar to (A.2) above.

   B.1) Changes in the testsuite to enable testing column numbers.


C) Consistent diagnostics. This requires:

   C.1) Make CPP use the diagnostics machinery. This will fix part of
        PR7263 and other similar bugs where there is a mismatch
        between the diagnostics machinery and CPP's own diagnostics
        machinery.

   *) Preprocessed/original locations in a single location_t.  This
      will avoid different behaviour when a token comes from a macro
      expansion. Similar to (A.1b) above.
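
One possible way to get the "single location_t" of (A.1b) is to make
the location an index into side tables, with one bit saying whether the
token came from a macro expansion. The sketch below only illustrates
that idea under my own assumptions: loc_id, file_pos, macro_pos and the
functions are hypothetical names, the fixed-size tables are a
simplification, and none of this is the current line-map code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for location_t: a plain 32-bit handle.  */
typedef uint32_t loc_id;

struct file_pos { const char *file; unsigned line; unsigned col; };

/* A token produced by a macro has two positions: where its text is
   spelled (the macro definition) and where the macro was expanded.  */
struct macro_pos { struct file_pos spelling; struct file_pos expansion; };

#define MACRO_FLAG 0x80000000u
enum { MAX_LOCS = 1024 };

static struct file_pos plain_tab[MAX_LOCS];
static struct macro_pos macro_tab[MAX_LOCS];
static uint32_t n_plain, n_macro;

/* Register a location for a token that did not come from a macro.  */
static loc_id
make_plain_loc (struct file_pos p)
{
  assert (n_plain < MAX_LOCS);
  plain_tab[n_plain] = p;
  return n_plain++;
}

/* Register a location for a token that came from a macro expansion.  */
static loc_id
make_macro_loc (struct file_pos spelling, struct file_pos expansion)
{
  assert (n_macro < MAX_LOCS);
  macro_tab[n_macro].spelling = spelling;
  macro_tab[n_macro].expansion = expansion;
  return MACRO_FLAG | n_macro++;
}

static int
loc_from_macro_p (loc_id l)
{
  return (l & MACRO_FLAG) != 0;
}

/* Position a diagnostic would normally report: the expansion point.  */
static struct file_pos
loc_expansion_point (loc_id l)
{
  if (loc_from_macro_p (l))
    {
      assert ((l & ~MACRO_FLAG) < n_macro);
      return macro_tab[l & ~MACRO_FLAG].expansion;
    }
  assert (l < n_plain);
  return plain_tab[l];
}

/* Position where the token's text was actually written.  */
static struct file_pos
loc_spelling (loc_id l)
{
  if (loc_from_macro_p (l))
    {
      assert ((l & ~MACRO_FLAG) < n_macro);
      return macro_tab[l & ~MACRO_FLAG].spelling;
    }
  assert (l < n_plain);
  return plain_tab[l];
}
```

The expr then carries the same number of loc_id values whether or not
its tokens come from a macro, and the diagnostics machinery decides
which of the two positions to print.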


D) Printing Ranges. This requires:

   *) Printing accurate column information. Similar to (B) above.

   *) A location(s) -> source strings interface and machinery. Similar
      to (A.3) above.

   *) Changes in the parser to pass down ranges. Similar to (A.2) above.

   D.1) Changes in the testsuite to enable testing ranges.

   D.2) Changes in the diagnostics machinery to handle ranges.


E) Caret diagnostics. This requires:

   *) Printing accurate column information. Similar to (B) above.

   *) A location(s) -> source strings interface and machinery. Similar
      to (A.3) above.

   E.1) Changes in the diagnostics machinery to print the source line
        and a caret.
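
As a rough illustration of (E.1), once we can get the source line and
an accurate column, the printing part itself is small. The following
is only a sketch under stated assumptions: format_caret is a
hypothetical helper, not part of the diagnostics machinery, and the
column is a 1-based byte offset. Tabs in the source line are copied
into the padding so that the caret lines up whatever the tab width of
the terminal is; multibyte charsets are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the first line of LINE followed by a second line with a caret
   under 1-based byte column COL into OUT.  Return 0 on success, or -1
   if COL is out of range or OUT is too small.  Hypothetical helper.  */
static int
format_caret (const char *line, unsigned col, char *out, size_t out_size)
{
  size_t len = strcspn (line, "\n");
  size_t i;
  char *p;

  /* Allow COL == LEN + 1 so the caret can point just past the line.  */
  if (col < 1 || col > len + 1)
    return -1;
  /* line + '\n' + (col - 1) padding + '^' + '\n' + NUL.  */
  if (len + col + 3 > out_size)
    return -1;

  memcpy (out, line, len);
  out[len] = '\n';
  p = out + len + 1;
  for (i = 0; i + 1 < col; i++)
    *p++ = (line[i] == '\t') ? '\t' : ' ';
  *p++ = '^';
  *p++ = '\n';
  *p = '\0';
  return 0;
}
```

For the line "x = y +;" and column 8 this produces the source line
followed by seven spaces and a caret under the semicolon.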

I have copied this to the wiki so anyone can update it or add
comments: http://gcc.gnu.org/wiki/Better_Diagnostics

I have some patches to make the diagnostic functions take explicit
locations and I hope to send them soon. Apart from those, I personally
don't have any specific plans to address any of the points above in
the near future because of lack of free time, and I still have a long
queue of trivial patches that I would like to get rid of before
we enter regression-only mode.

Cheers,

Manuel.
