2008/8/15 Ian Lance Taylor <[EMAIL PROTECTED]>:
> "Manuel López-Ibáñez" <[EMAIL PROTECTED]> writes:
>
>> A) Printing the input expression instead of re-constructing it. As
>>    Joseph explained, this will fix the problems that Aldy mentioned
>>    (PR3544[123] and PR35742) and this requires:
>>
>>   1) For non-preprocessed expr we need at least two locations per expr
>>      (beg/end). This will require changes on the build_* functions to
>>      handle multiple locations.
>
> This is probably obvious, but can you outline why we need two
> locations for each expression?  The tools with which I am familiar
> only print a single caret.  What would use the two locations for?

This has nothing to do with caret diagnostics; it is an orthogonal
issue that merely shares some infrastructure, as Joseph explained. If
you do

warning("called object %qE is not a function", expr);

for

({break;})();

we currently try to re-construct expr and that fails in some cases
(see the PRs referenced).

#'goto_expr' not supported by pp_c_expression#'bug.c: In function 'foo':
bug.c:4: error: called object  is not a function

The alternative is to print whatever we parsed when building expr. To
do that we would need begin/end locations for expr, translate each
location_t to a const char * pointer into the source buffer, and print
whatever lies between those two pointers:

bug.c:4: error: called object '({break;})' is not a function


Is it clear now? If so, I will update the wiki to include this example.

>>      b) Re-open the file and fseek. This is not trivial since we need
>>         to do it fast but still do all character conversions that we
>>         did when libcpp opened it the first time. This is
>>         approximately what Clang (LLVM) does and it seems they can do
>>         it very fast by keeping a cache of buffers ever reopened. I
>>         think that thanks to our line-maps implementation, we can do
>>         the seeking quite more efficiently in terms of computation
>>         time.  However, opening files is quite embedded into CPP, so
>>         that would need to be factored out so we can avoid any
>>         unnecessary CPP stuff when reopening but still do it
>>         *properly* and *efficiently*.
>
> If we are going to reopen the file, then why do we need to record the
> locations in the preprocessed token stream?

Because for some diagnostics we want to issue the warning at the macro
expansion point, not at the macro definition point. Moreover, this is
what we currently do, so if we do not want to change the current
behaviour, we need to track both locations.

Example

/*header.h*/
#pragma GCC system_header
#define BIG  0x1b27da572ef3cd86ULL

/* file.c */
#include "header.h"
__extension__ unsigned long long
bar ()
{
  return BIG;
}

We print a diagnostic in file.c for the expansion of BIG. However,
since we do not have the original location, we cannot tell that the
token comes from a system header, and we fail to suppress the warning.
There are more subtle bugs that arise from not having the original
location available. See PR36478.

BTW, Clang takes into account both locations when printing diagnostics.

> If we keep, for each source line, the file offset in the file of the
> start of that source line, then I think that printing the line from
> the source file would be pretty fast.  That would not be free but it
> would be much cheaper than keeping the entire input file.  Various

Cheaper in terms of memory, yes. But it cannot be cheaper in terms of
compilation time than keeping, for each line-map, a direct pointer
into the already-open buffer.

> optimizations are possible--e.g., keep the file offset for every 16th
> line. Conversely, perhaps we could record the line number and the
> charset conversion state at each 4096th byte in the file; that would
> let us quickly find the page in the file which contains the line.

I am not sure how you plan for such an approach to interact with
mapped locations. I think that having an offset for each line-map and
then seeking until you find the correct position would be fine for an
initial implementation. More complex setups could then be tested
against this baseline. And any optimization done here could also be
done with the buffer already open, so yes, perhaps cheaper in terms of
memory, but not cheaper in terms of compilation time.

If this is abstracted enough, both approaches could perhaps coexist
and share the optimizations: while the front-end is working (where
most of the diagnostics come from), keep the buffers around; when
going into the middle-end, free them; and if we need to issue a
diagnostic from the middle-end, reopen and seek. But all this relies
on someone first factoring file opening and charset conversion out of
CPP. Once that is done, we could pursue either strategy, or both, or
something else.

Cheers,

Manuel.
