On Thu, Jul 23, 2020 at 05:47:28PM -0400, David Malcolm wrote: > On Thu, 2020-07-23 at 12:28 -0400, Lewis Hyatt via Gcc-patches wrote: > > Hello- > > > > The attached patch is complete including docs, but I tagged as RFC > > because I am not sure if anyone will like it, or if the general > > reaction may > > be closer to recoiling in horror :). Would appreciate your thoughts, > > please... > > Thanks for working on this. I'm interested in other people's thoughts > on this. Various comments inline throughout below. >
Thanks for the feedback! I made a few replies below as well. > Currently, if a UTF-8 locale is detected, GCC changes the quote > > characters > > it outputs in diagnostics to Unicode directional quotes. I feel like > > this is > > a nice touch, so I was wondering whether GCC shouldn't do more along > > these > > lines. This patch adds support for using Unicode line drawing > > characters and > > similar things when outputting diagnostics. There is a new option > > -fdiagnostics-unicode-drawing=[auto|never|always] to control it, > > which > > defaults to auto. "auto" will enable the feature under the same > > circumstances that Unicode quotes get output, namely when the locale > > is > > determined by gcc_init_libintl() to support UTF-8. (The new option > > does not > > affect Unicode quote characters, which currently are not configurable > > and > > are determined solely by the locale.) > > FWIW when I first started experimenting with location ranges back in > 2015 my first patches had box-drawing characters for underlines; you > can see this in some of the early examples here (and similar URLs from > around then): > > https://dmalcolm.fedorapeople.org/gcc/2015-08-18/plugin.html > (this also has a different approach for labeling ranges, which I > called "captions", putting them in a right margin) > > https://dmalcolm.fedorapeople.org/gcc/2015-08-19/diagnostic-test-string-literals-1.html > > https://dmalcolm.fedorapeople.org/gcc/2015-08-26/tree-expression-ranges.html > > etc; the patch kits were: > > https://gcc.gnu.org/legacy-ml/gcc-patches/2015-03/msg00837.html > https://gcc.gnu.org/pipermail/gcc-patches/2015-September/428036.html > https://gcc.gnu.org/legacy-ml/gcc-patches/2015-09/msg01696.html > > In: > https://gcc.gnu.org/legacy-ml/gcc-patches/2015-09/msg01700.html > I wrote: > > * Eliminated UTF-8/box-drawing and captions. Captions were cute but > > weren't "fully baked". Without them, box-drawing isn't really > > needed, and I think I prefer the ASCII look, with the actual > > "caret" character, and '~' makes it easier to count characters > > compared to a box-drawing line, in my terminal's font, at least. > > Doing so greatly simplifies the new locus-printing code. > > So I dropped the UTF-8 box drawing from that original kit for: > (a) simplicity (the original patch kit was huge in scope, covering a > bunch of ideas for diagnostics - ranges, labeling, fix-it hints, > spelling suggestions, so I wanted to reduce the scope to something > manageable) > (b) I found it easier to count characters with "~" > Oh interesting, sorry I didn't realize this had already been considered. Well perhaps it's a good time to revisit it anyway, if people find it appealing. > > The thing I'm most nervous about with this patch is the potential for > introducing mojibake when people copy and paste GCC output. > > For example, looking at: > https://gcc.gnu.org/legacy-ml/gcc-patches/2015-03/msg00837.html > I see mojibake where the unicode line-drawing characters in my email > are being displayed in the HTML mailing list archive via "â" - > something has gone wrong with encoding somewhere between the copy&paste > from my terminal, the email, and the list archive. > > That said, looking at your email in the archive here: > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550551.html > I don't see any mojibake. > > What happens if GCC's stderr is piped into "less"? > What happens if GCC's stderr is saved in a build.log file, uploaded > somewhere, and then viewed? > etc. > pipe to less should presumably be no problem, unless user goes out of their way to use different locale settings there. Transporting the data is certainly a potential area of difficulty, as is also interaction with IDEs and other tools that want to parse the data, but such tools are at least already handling the UTF-8 quote characters OK. By the way, I was wondering separately what you think about adding an option like -fplain-diagnostics or something, which would achieve basically the same thing you get in the testsuite right now (-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers -fdiagnostics-color=never -fdiagnostics-urls=never) but would change as necessary whenever diagnostics evolve. It seems rather involved currently to add a new option like -fdiagnostics-unicode-drawing but keep the testsuite working, in addition to adding to prune.exp and to the libstdc++.exp, you also need to update the compat.exp so that it can figure out to pass the option only to sufficiently new compilers. With -fplain-diagnostics, this could just be part of the code change and the testsuite could stay the same; this may also make it easier on IDE type utilities since they could rely on a more stable format for the diagnostics, assuming they don't already use JSON format. > > > The elements implemented are: > > > > * Vertical lines, e.g. those indicating labels and those > > separating the > > source lines from the line numbers, are changed to line drawing > > characters. > > > > * The diagnostic paths output by the static analyzer make use of > > line > > drawing characters to output smooth corners etc. > > > > * The squiggly underline ~~~~~ used to highlight source locations > > is > > changed to a double underline ═════. The main reason for this > > is that > > it enables a seamless "tee" character to connect the underline > > to a > > label line if one exists. > > > > * Carets (^) are changed to a slightly different character (∧). I > > think > > the new one is a little nicer looking, although probably not > > worth the > > trouble on its own. I wanted to implement the support in this > > patch > > beause carets are harder to change than the rest of the > > elements > > (front ends have an interface to override them, which currently > > Fortran makes use of), so I thought it worthwhile to get this > > logic in > > place, so that it can easily be changed to a more superior > > character > > in the future if one comes up. It would also be easy enough to > > leave > > the Unicode support in place for carets, but keep the default > > set to > > the plain one for now. > > Some other ideas: > > * fix-it hints > > * maybe have a different character for separating the line numbers as > opposed to those for labels and for showing interprocedural paths. > Something like that would be easy to add, sure, perhaps a double vertical line instead: diagnostic-ranges.c:196:28: warning: field width specifier ‘*’ expects argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=] 196 ║ __builtin_sprintf (d, " %*ld ", foo + bar, foo); ║ ═∧══ ════╤════ ║ │ │ ║ int long int > > As an example, this diagnostic from gcc.dg/format/diagnostic- > > ranges.c: > > > > diagnostic-ranges.c:196:28: warning: field width specifier ‘*’ > > expects argument of type ‘int’, but argument 3 has type ‘long int’ [- > > Wformat=] > > 196 | __builtin_sprintf (d, " %*ld ", foo + bar, foo); > > | ~^~~ ~~~~~~~~~ > > | | | > > | int long int > > > > would become instead: > > > > diagnostic-ranges.c:196:28: warning: field width specifier ‘*’ > > expects argument of type ‘int’, but argument 3 has type ‘long int’ [- > > Wformat=] > > 196 │ __builtin_sprintf (d, " %*ld ", foo + bar, foo); > > │ ═∧══ ════╤════ > > │ │ │ > > │ int long int > > > > Hopefully you are viewing this in a terminal that displays it > > properly :), in > > which case, hopefully you may find it to be an improvement? > > I wonder if you can upload colorized examples somewhere? > > e.g. using bin/gcc-color-to-html.py from our website repository: > https://gcc.gnu.org/git/?p=gcc-wwwdocs.git;a=blob;f=bin/gcc-color-to-html.py > or one of the various ansi2html conversion scripts e.g. > http://www.pixelbeat.org/scripts/ansi2html.sh > Here are some .PNG screenshots, which I hope is OK for this purpose? https://drive.google.com/file/d/1MN-_eok_gwk_hl5C8DgvQGHPCpHAn1RH/view?usp=sharing https://drive.google.com/file/d/1siR-vh1osCvT8VB9uMrVuaS-pznswdBs/view?usp=sharing https://drive.google.com/file/d/1xgq0F-zhXpwOi3zJyU6EZEaQL079DJD5/view?usp=sharing ... > > > > Although probably premature, bootstrap and regtest were done on x86- > > 64 > > linux, all tests the same before/after and new tests passing: > > FAIL 96 96 > > PASS 479090 479239 > > UNSUPPORTED 11946 11946 > > UNTESTED 194 194 > > XFAIL 1839 1839 > > XPASS 36 36 > > I see the patch kit touches Fortran; was this with all frontends > enabled? (though I guess I'm likewise being premature here) > Yes, this was all languages except for jit. > > I tried to set this up as a general framework, at least, it is easy > > in one > > place to change the characters that are used for various contexts, so > > that > > if people like the general idea, but not some of the specifics, the > > patch is > > easily modified for that now or in the future. Thanks for any > > feedback! > > Thanks again for the patch; let's see what others think. > Dave > Thanks for the feedback. If there is interest in something like this, I'm happy to implement any suggestions. -Lewis