Let's try to focus on what needs to be done, looking at specific features (or fixes) and how we could implement them:

A) Printing the input expression instead of re-constructing it. As Joseph explained, this will fix the problems that Aldy mentioned (PR3544[123] and PR35742). This requires:

  1) For non-preprocessed expr we need at least two locations per expr (beg/end). This will require changes to the build_* functions to handle multiple locations.

  1b) For each preprocessed token, we would need to keep two locations: one for the preprocessed location and another for the original location. As Joseph pointed out, ideally we should find a way to track this with a single location_t object so we do not need 4 locations per expr.

  2) Changes in the parser to pass down the correct locations to the build_* functions.

  3) A location(s) -> source strings interface and machinery. Ideally, this should be more or less independent of CPP, so that CPP (through the diagnostics machinery) calls into this when needed and not the other way around. This can be implemented in several ways:

    a) Keeping the CPP buffers in memory and having line-maps hold pointers directly into the buffers' contents. This is easy and fast but potentially memory consuming. Care must be taken to handle charsets, tabs, etc. Factoring out anything useful from libcpp would help to implement this.

    b) Re-open the file and fseek. This is not trivial, since we need to do it fast but still perform all the character conversions that libcpp did when it opened the file the first time. This is approximately what Clang (LLVM) does, and it seems they can do it very fast by keeping a cache of the buffers they have reopened. I think that, thanks to our line-maps implementation, we can do the seeking considerably more efficiently in terms of computation time. However, opening files is quite embedded into CPP, so that would need to be factored out so we can avoid any unnecessary CPP stuff when reopening but still do it *properly* and *efficiently*.

  4) Changes in the diagnostics machinery to extract locations from expr and print a string from a source file instead of re-constructing things.

  5) Handle locations during folding or avoid aggressive folding in the front-ends.

  6) Handle locations during optimisation or update middle-end diagnostics so they do not rely on perfect location information. This probably means not using %qE, no column info, and similar limitations. Some trade-off must be investigated.

B) Printing accurate column information. This requires:

  *) Preprocessed/original locations in a single location_t. Similar to (A.1b) above.

  *) Changes in the parser to pass down the correct locations to the diagnostics machinery. Similar to (A.2) above.

  B.1) Changes in the testsuite to enable testing column numbers.

C) Consistent diagnostics. This requires:

  C.1) Make CPP use the diagnostics machinery. This will fix part of PR7263 and other similar bugs where there is a mismatch between the diagnostics machinery and CPP's own diagnostics machinery.

  *) Preprocessed/original locations in a single location_t. This will avoid different behaviour when a token comes from a macro expansion. Similar to (A.1b) above.

D) Printing ranges. This requires:

  *) Printing accurate column information. Similar to (B) above.

  *) A location(s) -> source strings interface and machinery. Similar to (A.3) above.

  *) Changes in the parser to pass down ranges. Similar to (A.2) above.

  D.1) Changes in the testsuite to enable testing ranges.

  D.2) Changes in the diagnostics machinery to handle ranges.

E) Caret diagnostics. This requires:

  *) Printing accurate column information. Similar to (B) above.

  *) A location(s) -> source strings interface and machinery. Similar to (A.3) above.

  E.1) Changes in the diagnostics machinery to print the source line and a caret.

I have copied this to the wiki so anyone can update it or add comments:

http://gcc.gnu.org/wiki/Better_Diagnostics

I have some patches to make the diagnostic functions take explicit locations and I hope to send them soon. Apart from those, I personally don't have any specific plans to address any of the points above in the near future, because of lack of free time and because I still have a long queue of trivial patches that I would like to get rid of before we enter regression-only mode.

Cheers,

Manuel.