> On Wed, Jun 10, 2020 at 12:11:00PM -0400, David Malcolm wrote: > > Thanks for the patch; sorry about the delay in reviewing it. > > > > Some high-level review points > > > > - I like the patch overall > > > > - This will deserve an item in the release notes > > > > - I don't like adding "global_tabstop" (I don't like global > > variables). Is there nowhere else we can handle this? I believe > > there's a cluster of functions in the callgraph that make use of > > it; can we simply pass around the tabstop value instead? "tabstop" > > seems to have several meanings. If I'm reading the patch correctly > > * "tabstop > 0" means to expand tabs so that column numbers are a > > multiple of tabstop > > * "tabstop == 0" means "don't expand tabs" > > * "tabstop < 0" in some places means: use the global_tabstop value > > Is it possible to eliminate global_tabstop value? Or is there some > > deep reason I'm missing? > > > > I'll do a more thorough review once that's addressed/resolved (since > > eliminating global_tabstop might touch a few places). > > > > Thanks for the feedback! The attached updated patch addresses these > concerns. Regarding tabstop, I have removed the new static variable > global_tabstop in charset.c. FWIW, the usage of "tabstop" arguments in the > various new APIs did previously work a bit more consistently than you > described. In all cases "tabstop <= 0" meant to use the default value, > otherwise it specified the tabstop to use (with tabstop=1 naturally > restoring the old behavior of changing tabs to a single space). In order > for libcpp to provide this feature (callers can pass tabstop <= 0 to get a > default, and the default can in turn by configured when processing the > -ftabstop option), it does need to remember the default, and this has to > be a file-level static variable because the routines need to work > independent of any cpp_reader instance. (Some frontends don't use > libcpp to read their input, for instance.) Anyway, I see the point that > this file-level static, being accessible with cpp_set_tabstop() and > cpp_get_tabstop(), is effectively just a global variable, so I have > removed this feature, which just means that all callers need to pass the > tabstop they want to use. I am now rather using the diagnostic_context > object to remember the value passed to -ftabstop. The only place this > involves global variables is now in c-family/c-indentation.c, where if I > understood correctly, the only diagnostic_context available is global_dc, > so I am getting the tabstop value from there. Please let me know if > there's a better way to handle that? Prior to my patch, the tabstop was > obtained from a different global variable (extern cpp_options *cpp_opts), > so at least conservation of total globals is maintained. :)
Thanks. That sounds like a good approach. > Compared to the previous version, this one is a bit longer, since 25 or > so call sites had to be modified to know the value of -ftabstop. Most of > the churn is in diagnostic-show-locus.c, because there are a fair number of > static helper functions and helper classes there, which just needed to > receive the diagnostic_context object from their callers. I could > have made this simpler by letting the tabstop argument default to > something like 8 in all functions that require it... this would remove the > need to pass it in all the selftests that are indifferent to it. I figured > it would be better to force this argument to be passed, though, or else in > the future it may be easy to forget to pass it where it is needed. Thanks. > > Thanks for adding docs; some nits on them: > > > > > --- a/gcc/doc/invoke.texi > > > +++ b/gcc/doc/invoke.texi > > > > [...snip...] > > > > > +@item -fdiagnostics-column-unit=@var{UNIT} > > > +@opindex fdiagnostics-column-unit > > > +Select the units for the column number. This affects traditional diagnostics > > > +(in the absence of @option{-fno-show-column}), as well as JSON format > > > +diagnostics if requested. > > > + > > > +The default @var{UNIT}, @samp{display}, considers the number of display columns > > > +occupied by each character. This may be larger than the number of bytes > > > +occupied, in the case of tab characters, or it may be smaller, in the case of > > > +multibyte characters. For example, the UTF-8 character ``@U{03C 0}'' occupies > > > +two bytes and one display column, while the character ``@U{1F642 }'' occupies > > > +four bytes and two display columns. > > > > This is imprecise. A unicode code point occupies some number of display columns, > > and its *UTF-8 encoding* occupies some number of bytes. > > > > [and my inner pedant is now thinking: what about combining diacritics? > > But I don't think we can ever issue a diagnostic on a diacritic; I > > *think* we only ever care about the per-glyph level] > > > > > +Setting @var{UNIT} to @samp{byte} changes the column number to the > > raw byte > > > +count in all cases, as was traditionally output by GCC prior to version 11.1.0. > > > + > > > +@item -fdiagnostics-column-origin=@var{ORIGIN} > > > +@opindex fdiagnostics-column-origin > > > +Select the origin for column numbers, i.e. the column number assigned to the > > > +first column. The default value of 1 corresponds to traditional GCC > > > +behavior and to the GNU style guide. Some utilities may perform better with an > > > +origin of 0; any non-negative value may be specified. > > > + > > > @item -fdiagnostics-format=@var{FORMAT} > > > @opindex fdiagnostics-format > > > Select a different format for printing diagnostics. > > > > [...snip...] > > > > > +A diagnostic can contain zero or more locations. Each location has an > > > +optional @code{label} string and up to three positions within it: a > > > +@code{caret} position and optional @code{start} and @code{finish} positions. > > > +A position is described by a @code{file} name, a @code{line} number, and > > > +three numbers indicating a column position: @code{display- column} counts > > > +display columns, accounting for tabs and multibyte characters; > > > +@code{byte-column} counts raw bytes; and @code{column} is equal to one of > > > +the previous two, as dictated by the @option{-fdiagnostics- column-unit} > > > +option. > > > > Might be clearer to use an unordered list here for the three kinds of column. > > > > > All three columns are relative to the origin specified by > > > +@option{-fdiagnostics-column-origin}, which is typically equal to 1 but may > > > +be set, for instance, to 0 for compatibility with other utilities that > > > +number columns from 0. The column origin is recorded in the JSON output in > > > +the @code{column-origin} tag. In the remaining examples below, the extra > > > +column number outputs have been omitted for brevity. > > > > [...snip...] > > > > I improved the docs along these lines. > > > Thanks again for the patch; hope this is constructive > > Dave > > > > Thanks for your time! BTW, I did bootstrap + regtest this version as well on > x86-64 Linux, it looks good, new tests pass and others are the same: > > FAIL 97 97 > PASS 476837 477297 > UNRESOLVED 7 7 > UNSUPPORTED 11726 11726 > UNTESTED 195 195 > XFAIL 1807 1807 > XPASS 37 37 > > -Lewis > Thanks for the updated patch. This looks (almost) ready; I have a few nits inline below.... > From 7729ce3334b6768a25967a6dd4a0a5a2ed0923cc Mon Sep 17 00:00:00 2001 > From: Lewis Hyatt <lhy...@gmail.com> > Date: Wed, 10 Jun 2020 22:04:07 -0400 > Subject: [PATCH] diagnostics: Support conversion of tabs to spaces [PR49973] > [PR86904] > > Supports conversion of tabs to spaces when outputting diagnostics. Also > adds -fdiagnostics-column-unit and -fdiagnostics-column-origin options to > control how the column number is output, thereby resolving the two PRs. > > gcc/c-family/ChangeLog: > > PR other/86904 > * c-indentation.c (should_warn_for_misleading_indentation): Get > global tabstop from the new source. > * c-opts.c (c_common_handle_option): Remove handling of -ftabstop, > which > is now a common option. > * c.opt: Likewise. > > gcc/ChangeLog: > > PR preprocessor/49973 > PR other/86904 > * common.opt: Handle -ftabstop here instead of in c-family > options. Add -fdiagnostics-column-unit= and > -fdiagnostics-column-origin= options. > * opts.c (common_handle_option): Handle the new options. > * diagnostic-format-json.cc (json_from_expanded_location): Add > diagnostic_context argument. Use it to convert column numbers as per > the new options. > (json_from_location_range): Likewise. > (json_from_fixit_hint): Likewise. > (json_end_diagnostic): Pass the new context argument to helper > functions above. Add "column-origin" field to the output. > (test_unknown_location): Add the new context argument to calls to > helper functions. > (test_bad_endpoints): Likewise. > * diagnostic-show-locus.c > (exploc_with_display_col::exploc_with_display_col): Support > tabstop parameter. > (layout_point::layout_point): Make use of class > exploc_with_display_col. > (layout_range::layout_range): Likewise. > (struct line_bounds): Clarify that the units are now always > display columns. Rename members accordingly. Add constructor. > (layout::print_source_line): Add support for tab expansion. > (make_range): Adapt to class layout_range changes. > (layout::maybe_add_location_range): Likewise. > (layout::layout): Adapt to class exploc_with_display_col changes. > (layout::calculate_x_offset_display): Support tabstop parameter. > (layout::print_annotation_line): Adapt to struct line_bounds changes. > (layout::print_line): Likewise. > (line_label::line_label): Add diagnostic_context argument. > (get_affected_range): Likewise. > (get_printed_columns): Likewise. > (layout::print_any_labels): Adapt to struct line_label changes. > (class correction): Add m_tabstop member. > (correction::correction): Add tabstop argument. > (correction::compute_display_cols): Use m_tabstop. > (class line_corrections): Add m_context member. > (line_corrections::line_corrections): Add diagnostic_context argument. > (line_corrections::add_hint): Use m_context to handle tabstops. > (layout::print_trailing_fixits): Adapt to class line_corrections > changes. > (test_layout_x_offset_display_utf8): Support tabstop parameter. > (test_layout_x_offset_display_tab): New selftest. > (test_one_liner_colorized_utf8): Likewise. > (test_tab_expansion): Likewise. > (test_diagnostic_show_locus_one_liner_utf8): Call the new tests. > (diagnostic_show_locus_c_tests): Likewise. > (test_overlapped_fixit_printing): Adapt to helper class and > function changes. > (test_overlapped_fixit_printing_utf8): Likewise. > (test_overlapped_fixit_printing_2): Likewise. > * diagnostic.h (enum diagnostics_column_unit): New enum. > (struct diagnostic_context): Add members for the new options. > (diagnostic_converted_column): Declare. > (json_from_expanded_location): Add new context argument. > * diagnostic.c (diagnostic_initialize): Initialize new members. > (diagnostic_converted_column): New function. > (maybe_line_and_column): Be willing to output a column of 0. > (diagnostic_get_location_text): Convert column number as per the new > options. > (diagnostic_report_current_module): Likewise. > (assert_location_text): Add origin and column_unit arguments for > testing the new functionality. > (test_diagnostic_get_location_text): Test the new functionality. > * doc/invoke.texi: Document the new options and behavior. > * input.h (location_compute_display_column): Add tabstop argument. > * input.c (location_compute_display_column): Likewise. > (test_cpp_utf8): Add selftests for tab expansion. > * tree-diagnostic-path.cc (default_tree_make_json_for_path): Pass the > new context argument to json_from_expanded_location(). > > libcpp/ChangeLog: > > PR preprocessor/49973 > PR other/86904 > * include/cpplib.h (struct cpp_options): Removed support for > -ftabstop, > which is now handled by diagnostic_context. > (class cpp_display_width_computation): New class. > (cpp_byte_column_to_display_column): Add optional tabstop argument. > (cpp_display_width): Likewise. > (cpp_display_column_to_byte_column): Likewise. > * charset.c > (cpp_display_width_computation::cpp_display_width_computation): New > function. > (cpp_display_width_computation::advance_display_cols): Likewise. > (compute_next_display_width): Removed and implemented this > functionality in a new function... > (cpp_display_width_computation::process_next_codepoint): ...here. > (cpp_byte_column_to_display_column): Added tabstop argument. > Reimplemented in terms of class cpp_display_width_computation. > (cpp_display_column_to_byte_column): Likewise. > * init.c (cpp_create_reader): Remove handling of -ftabstop, which is > now > handled by diagnostic_context. > > gcc/testsuite/ChangeLog: > > PR preprocessor/49973 > PR other/86904 > * c-c++-common/Wmisleading-indentation-3.c: Adjust expected output > for new defaults. > * c-c++-common/Wmisleading-indentation.c: Likewise. > * c-c++-common/diagnostic-format-json-1.c: Likewise. > * c-c++-common/diagnostic-format-json-2.c: Likewise. > * c-c++-common/diagnostic-format-json-3.c: Likewise. > * c-c++-common/diagnostic-format-json-4.c: Likewise. > * c-c++-common/diagnostic-format-json-5.c: Likewise. > * c-c++-common/missing-close-symbol.c: Likewise. > * g++.dg/diagnostic/bad-binary-ops.C: Likewise. > * g++.dg/parse/error4.C: Likewise. > * g++.old-deja/g++.brendan/crash11.C: Likewise. > * g++.old-deja/g++.pt/overload2.C: Likewise. > * g++.old-deja/g++.robertl/eb109.C: Likewise. > * gcc.dg/analyzer/malloc-paths-9.c: Likewise. > * gcc.dg/bad-binary-ops.c: Likewise. > * gcc.dg/format/branch-1.c: Likewise. > * gcc.dg/format/pr79210.c: Likewise. > * gcc.dg/plugin/diagnostic-test-expressions-1.c: Likewise. > * gcc.dg/plugin/diagnostic-test-string-literals-1.c: Likewise. > * gcc.dg/redecl-4.c: Likewise. > * gfortran.dg/diagnostic-format-json-1.F90: Likewise. > * gfortran.dg/diagnostic-format-json-2.F90: Likewise. > * gfortran.dg/diagnostic-format-json-3.F90: Likewise. > * go.dg/arrayclear.go: Add a comment explaining why adding a > comment was necessary to work around a dejagnu bug. > * c-c++-common/diagnostic-units-1.c: New test. > * c-c++-common/diagnostic-units-2.c: New test. > * c-c++-common/diagnostic-units-3.c: New test. > * c-c++-common/diagnostic-units-4.c: New test. > * c-c++-common/diagnostic-units-5.c: New test. > * c-c++-common/diagnostic-units-6.c: New test. > * c-c++-common/diagnostic-units-7.c: New test. > * c-c++-common/diagnostic-units-8.c: New test. [...snip...] > diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h > index 307dbcfb34a..75706c5f4d8 100644 > --- a/gcc/diagnostic.h > +++ b/gcc/diagnostic.h > @@ -24,6 +24,20 @@ along with GCC; see the file COPYING3. If not see > #include "pretty-print.h" > #include "diagnostic-core.h" > > +/* An enum for controlling what units to use for the column number > + when diagnostics are output, used by the -fdiagnostics-column-unit option. > + Tabs will be expanded or not according to the value of -ftabstop. The > origin > + (default 1) is controlled by -fdiagnostics-column-origin. */ > + "New" and "historical" can get out of date, so how about: > +enum diagnostics_column_unit > +{ > + /* The new default: display columns. */ /* The default from GCC 11 onwards: display column. */ > + DIAGNOSTICS_COLUMN_UNIT_DISPLAY, > + > + /* The historical behavior: simple bytes. */ /* The behavior in GCC 10 and earlier: simple bytes. */ > + DIAGNOSTICS_COLUMN_UNIT_BYTE > +}; ? [...snip...] > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 06a04e3d7dd..f463275bc8b 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi [...snip...] > @@ -4729,6 +4731,31 @@ Do not print column numbers in diagnostics. This may > be necessary if > diagnostics are being scanned by a program that does not understand the > column numbers, such as @command{dejagnu}. > > +@item -fdiagnostics-column-unit=@var{UNIT} > +@opindex fdiagnostics-column-unit > +Select the units for the column number. This affects traditional diagnostics > +(in the absence of @option{-fno-show-column}), as well as JSON format > +diagnostics if requested. > + > +The default @var{UNIT}, @samp{display}, considers the number of display > +columns occupied by each character. This may be larger than the number > +of bytes required to encode the character, in the case of tab > +characters, or it may be smaller, in the case of multibyte characters. > +For example, the character ``@U{03C0}'' occupies one display column, > +and its UTF-8 encoding requires two bytes; the character ``@U{1F642}'' > +occupies two display columns, and its UTF-8 encoding requires four > +bytes. Thanks for reworking this. I'm wary of those @U commands - does the generated HTML from "make html" work? (similarly for the man page). Would it be safer to express them in the following form? For example, the character ``GREEK SMALL LETTER PI (U+03C0)'' occupies one display column, and its UTF-8 encoding requires two bytes; the character ``SLIGHTLY SMILING FACE' (U+1F642)'' occupies two display columns, and its UTF-8 encoding requires four bytes. or somesuch? > +Setting @var{UNIT} to @samp{byte} changes the column number to the raw byte > +count in all cases, as was traditionally output by GCC prior to version > 11.1.0. [...snip...] > @item -fdiagnostics-format=@var{FORMAT} > @opindex fdiagnostics-format > Select a different format for printing diagnostics. > @@ -4764,11 +4791,15 @@ might be printed in JSON form (after formatting) like > this: > "locations": [ > @{ > "caret": @{ > + "display-column": 3, > + "byte-column": 3, > "column": 3, > "file": "misleading-indentation.c", > "line": 15 > @}, > "finish": @{ > + "display-column": 4, > + "byte-column": 4, > "column": 4, > "file": "misleading-indentation.c", > "line": 15 Nit: the new fields don't line up with the old ones. > @@ -4784,6 +4815,8 @@ might be printed in JSON form (after formatting) like > this: > "locations": [ > @{ > "caret": @{ > + "display-column": 5, > + "byte-column": 5, > "column": 5, > "file": "misleading-indentation.c", > "line": 17 Likewise. [...snip...] > diff --git a/gcc/opts.c b/gcc/opts.c > index 340d99434b3..525f44d079f 100644 > --- a/gcc/opts.c > +++ b/gcc/opts.c > @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3. If not see > #include "opt-suggestions.h" > #include "diagnostic-color.h" > #include "selftest.h" > +#include "cpplib.h" Is this new #include still needed? [...snip...] OK for trunk with those nits fixed. Dave