https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49973
--- Comment #20 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
I've committed r279137 on Lewis's behalf, which fixes the issues identified in
patch #13.

As noted in review of the patch, we didn't attempt to change the behavior of
diagnostic_get_location_text with this change. Quoting myself from:
  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02171.html

> This is the column number as reported in the diagnostic i.e. the COL_NUM
> when printing e.g.
>   warning: FILENAME:LINE_NUM:COL_NUM: some message
>
> It seems to me that PR 49973 and this patch cover two separate things:
> (a) bytes vs display columns in diagnostic-show-locus.c
> (b) the "COL_NUM" mentioned above.
>
> I'd prefer to omit (b) from the patch, and have the focus of the patch
> be (a), to tackle (b) in a separate patch.
>
> [There's also the meaning of column numbers in the JSON output, and in
> the output of -fdiagnostics-parseable-fixits (which is intended to mimic
> clang's output format)]
>
> It's unclear to me what the reported COL_NUM should be.
> There are various possibilities:
>
> Units:
> (A) [status quo] report a count of bytes within the line
> (B) report a count of unicode characters
> (C) report a count of unicode graphemes
> (D) report based on the wcwidth of the characters
> etc
>
> Origin/baseline:
> (A) [status quo] use 1 for the leftmost column
> (B) use 0 for the leftmost column
>
> Tab-handling:
> (A) [status quo] don't give any kind of special status to tab characters
> (B) implement tab stops, somehow. For example, get_visual_column in
> c-family/c-indentation implements tab stops based on bytes.
>
> (so at least 4*2*2 = 16 possible meanings, ugh)
>
> See also e.g.:
> https://github.com/oasis-tcs/sarif-spec/issues/178
>
> The GNU Coding Standards say
>
>   Line numbers should start from 1 at the beginning of the file, and
>   column numbers should start from 1 at the beginning of the line.
>   (Both of these conventions are chosen for compatibility.) Calculate
>   column numbers assuming that space and all ASCII printing characters
>   have equal width, and assuming tab stops every 8 columns. For
>   non-ASCII characters, Unicode character widths should be used when in
>   a UTF-8 locale; GNU libc and GNU gnulib provide suitable wcwidth
>   functions.
> (https://www.gnu.org/prep/standards/standards.html#Errors)
>
> I think if we do change the meaning of the "COL_NUM" output, we should
> probably add an option for it, to help with the transition (so that
> people can easily revert to the old behavior).
>
> Perhaps something like:
>
>   -fdiagnostics-column-unit=[bytes|gnu]
>
> bytes: [status-quo]; 1-based count of bytes, not respecting tab stops
> gnu: as per GNU Coding Standards above
>
> and have gcc 10 default to "gnu" (or whatever we call it), so that
> people can override it back to "bytes".
>
> (again, I'm thinking aloud here)
>
> But please can you split that out as a separate patch? (it's arguably
> still in time for GCC 10, as it's from a patch that was posted before the
> stage 1 deadline).
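
To make the difference between the candidate meanings of COL_NUM concrete, here is a rough Python sketch (not GCC code) that computes a 1-based column for the same source position under three of the interpretations listed above: bytes, Unicode code points, and a GNU-Coding-Standards-style display column with tab stops every 8 columns. All function names are hypothetical, and char_width is only a crude stand-in for wcwidth built on Python's unicodedata module; a real implementation would presumably use the wcwidth from glibc or gnulib.

```python
import unicodedata

TAB_STOP = 8  # GNU Coding Standards: assume tab stops every 8 columns


def char_width(ch):
    """Crude approximation of wcwidth (an assumption, not glibc's table):
    0 for combining marks, 2 for East Asian wide/fullwidth, otherwise 1."""
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def column_bytes(line, index):
    """Status quo: 1-based count of UTF-8 bytes, tabs not special."""
    return len(line[:index].encode("utf-8")) + 1


def column_codepoints(line, index):
    """1-based count of Unicode code points, tabs not special."""
    return index + 1


def column_gnu(line, index):
    """1-based display column: character widths plus tab stops."""
    width = 0  # display cells consumed before the target character
    for ch in line[:index]:
        if ch == "\t":
            # Advance to the next multiple of TAB_STOP.
            width = (width // TAB_STOP + 1) * TAB_STOP
        else:
            width += char_width(ch)
    return width + 1


line = '\ts = "日本"; x'
index = line.index("x")
print(column_bytes(line, index))       # 16
print(column_codepoints(line, index))  # 12
print(column_gnu(line, index))         # 21
```

The three results differ for a single line containing one tab and two wide characters, which is the ambiguity a "bytes" versus "gnu" unit setting would have to resolve. Grapheme clusters (option C under "Units") are not modelled here.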