On Fri, 2020-05-08 at 15:35 -0400, Lewis Hyatt wrote: > On Fri, Jan 31, 2020 at 03:31:59PM -0500, David Malcolm wrote: > > On Fri, 2020-01-31 at 14:31 -0500, Lewis Hyatt wrote: > > > Hello- > > > > > > Here is the second patch that I mentioned when I submitted the > > > other > > > related > > > patch (which is awaiting review): > > > https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01626.html. > > > > Sorry about that; I'm v. busy with analyzer bugs right now. > > > > > This second patch > > > is based on top of the first one and it closes out PR49973 and > > > PR86904 by > > > adding the new option -fdiagnostics-column-unit=[display|byte]. > > > This > > > allows > > > to specify whether columns are output as simple byte counts (the > > > current > > > behavior), or as display columns including handling multibyte > > > characters and > > > tabs. The patch makes display columns the new default. > > > Additionally, > > > a > > > second new option -fdiagnostics-column-origin is added, which > > > allows > > > to make > > > the column 0-based (or N-based for any N) instead of 1-based. The > > > default > > > remains at 1-based as it is now. > > > > > > A number of testcases were explicitly testing for the old > > > behavior, > > > so I > > > have updated them to test for the new behavior instead, since the > > > column > > > number adjusted for tabs is more natural to test for, and matches > > > what > > > editors typically show (give or take 1 for the origin > > > convention). > > > > > > One other testcase (go.dg/arrayclear.go) was a bit of an oddity. > > > It > > > failed > > > after this patch, although it doesn't test for any column > > > numbers. > > > The > > > answer turned out to be, this test checks for identical error > > > text on > > > two > > > different lines. When the column units are changed to display > > > columns, then > > > the column of the second error happens to match the line of the > > > first > > > one. dejagnu then misinterprets the second error as if it matched > > > the > > > location of the first one (it doesn't distinguish whether it > > > checks > > > for the > > > line number or the column number in the output). I added a > > > comment to > > > the > > > test explaining the situation; since adding the comment has the > > > side > > > effect > > > of making the first line number no longer match the second column > > > number, it > > > also makes the test pass again. > > > > > > It wasn't quite clear to me whether this change was appropriate > > > for > > > GCC 10 > > > or not at this point. We discussed it a couple months ago here: > > > https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02171.html. Either > > > way, > > > I hope > > > it isn't a problem that I submitted the patch for review now, > > > whether > > > it > > > will end up in 10 or 11. Please let me know what's normally > > > expected? > > > Thanks! > > > > Thanks Lewis. > > > > This patch looks very promising, but should wait until gcc 11; > > we're > > trying to stabilize gcc 10 right now (I'm knee-deep in analyzer > > bug- > > fixing, so I don't want to add any more diagnostics changes). > > > > Hi Dave- > > Well GCC 10 was released for a whole day so I thought I would bug you > with this > patch again now :). To summarize, I previously sent this in two > separate parts. > > Part 1: > https://gcc.gnu.org/legacy-ml/gcc-patches/2020-01/msg01626.html > Part 2: > https://gcc.gnu.org/legacy-ml/gcc-patches/2020-01/msg02108.html > > Part 1 added the support for converting tabs to spaces when > outputting > diagnostics. Part 2 added the new options -fdiagnostics-column-unit > and > -fdiagnostics-column-origin to control whether the column number is > printed > in display or byte units. Together they resolve both PR49973 and > PR86904. > > You provided me with feedback on part 2, which is quoted below with > some > notes interspersed. The new version of the patch incorporates all of > your > suggestions. Part 1 has not changed other than some trivial rebasing > conflicts. The two patches touch nearly disjoint sets of files and > are > logically linked together, so I thought it would be simpler if I just > sent > one combined patch now. If you prefer them to be separated as before, > please > let me know and I can send them that way as well. > > Bootstrap and reg tests were done on x86-64 Linux for all > languages. Tests > look good: > > type, before, after > FAIL 96 96 > PASS 474637 475097 > UNSUPPORTED 11607 11607 > UNTESTED 195 195 > XFAIL 1816 1816 > XPASS 36 362
Thanks for the patch; sorry about the delay in reviewing it. Some high-level review points - I like the patch overall - This will deserve an item in the release notes - I don't like adding "global_tabstop" (I don't like global variables). Is there nowhere else we can handle this? I believe there's a cluster of functions in the callgraph that make use of it; can we simply pass around the tabstop value instead? "tabstop" seems to have several meanings. If I'm reading the patch correctly * "tabstop > 0" means to expand tabs so that column numbers are a multiple of tabstop * "tabstop == 0" means "don't expand tabs" * "tabstop < 0" in some places means: use the global_tabstop value Is it possible to eliminate global_tabstop value? Or is there some deep reason I'm missing? I'll do a more thorough review once that's addressed/resolved (since eliminating global_tabstop might touch a few places). Thanks for adding docs; some nits on them: > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi [...snip...] > +@item -fdiagnostics-column-unit=@var{UNIT} > +@opindex fdiagnostics-column-unit > +Select the units for the column number. This affects traditional diagnostics > +(in the absence of @option{-fno-show-column}), as well as JSON format > +diagnostics if requested. > + > +The default @var{UNIT}, @samp{display}, considers the number of display > columns > +occupied by each character. This may be larger than the number of bytes > +occupied, in the case of tab characters, or it may be smaller, in the case of > +multibyte characters. For example, the UTF-8 character ``@U{03C0}'' occupies > +two bytes and one display column, while the character ``@U{1F642}'' occupies > +four bytes and two display columns. This is imprecise. A unicode code point occupies some number of display columns, and its *UTF-8 encoding* occupies some number of bytes. [and my inner pedant is now thinking: what about combining diacritics? But I don't think we can ever issue a diagnostic on a diacritic; I *think* we only ever care about the per-glyph level] > +Setting @var{UNIT} to @samp{byte} changes the column number to the raw byte > +count in all cases, as was traditionally output by GCC prior to version > 11.1.0. > + > +@item -fdiagnostics-column-origin=@var{ORIGIN} > +@opindex fdiagnostics-column-origin > +Select the origin for column numbers, i.e. the column number assigned to the > +first column. The default value of 1 corresponds to traditional GCC > +behavior and to the GNU style guide. Some utilities may perform better with > an > +origin of 0; any non-negative value may be specified. > + > @item -fdiagnostics-format=@var{FORMAT} > @opindex fdiagnostics-format > Select a different format for printing diagnostics. [...snip...] > +A diagnostic can contain zero or more locations. Each location has an > +optional @code{label} string and up to three positions within it: a > +@code{caret} position and optional @code{start} and @code{finish} positions. > +A position is described by a @code{file} name, a @code{line} number, and > +three numbers indicating a column position: @code{display-column} counts > +display columns, accounting for tabs and multibyte characters; > +@code{byte-column} counts raw bytes; and @code{column} is equal to one of > +the previous two, as dictated by the @option{-fdiagnostics-column-unit} > +option. Might be clearer to use an unordered list here for the three kinds of column. > All three columns are relative to the origin specified by > +@option{-fdiagnostics-column-origin}, which is typically equal to 1 but may > +be set, for instance, to 0 for compatibility with other utilities that > +number columns from 0. The column origin is recorded in the JSON output in > +the @code{column-origin} tag. In the remaining examples below, the extra > +column number outputs have been omitted for brevity. [...snip...] Thanks again for the patch; hope this is constructive Dave