On Thu, 2020-07-23 at 12:28 -0400, Lewis Hyatt via Gcc-patches wrote: > Hello- > > The attached patch is complete including docs, but I tagged as RFC > because I am not sure if anyone will like it, or if the general > reaction may > be closer to recoiling in horror :). Would appreciate your thoughts, > please...
Thanks for working on this. I'm interested in other people's thoughts on this. Various comments inline throughout below. Currently, if a UTF-8 locale is detected, GCC changes the quote > characters > it outputs in diagnostics to Unicode directional quotes. I feel like > this is > a nice touch, so I was wondering whether GCC shouldn't do more along > these > lines. This patch adds support for using Unicode line drawing > characters and > similar things when outputting diagnostics. There is a new option > -fdiagnostics-unicode-drawing=[auto|never|always] to control it, > which > defaults to auto. "auto" will enable the feature under the same > circumstances that Unicode quotes get output, namely when the locale > is > determined by gcc_init_libintl() to support UTF-8. (The new option > does not > affect Unicode quote characters, which currently are not configurable > and > are determined solely by the locale.) FWIW when I first started experimenting with location ranges back in 2015 my first patches had box-drawing characters for underlines; you can see this in some of the early examples here (and similar URLs from around then): https://dmalcolm.fedorapeople.org/gcc/2015-08-18/plugin.html (this also has a different approach for labeling ranges, which I called "captions", putting them in a right margin) https://dmalcolm.fedorapeople.org/gcc/2015-08-19/diagnostic-test-string-literals-1.html https://dmalcolm.fedorapeople.org/gcc/2015-08-26/tree-expression-ranges.html etc; the patch kits were: https://gcc.gnu.org/legacy-ml/gcc-patches/2015-03/msg00837.html https://gcc.gnu.org/pipermail/gcc-patches/2015-September/428036.html https://gcc.gnu.org/legacy-ml/gcc-patches/2015-09/msg01696.html In: https://gcc.gnu.org/legacy-ml/gcc-patches/2015-09/msg01700.html I wrote: > * Eliminated UTF-8/box-drawing and captions. Captions were cute but > weren't "fully baked". Without them, box-drawing isn't really > needed, and I think I prefer the ASCII look, with the actual > "caret" character, and '~' makes it easier to count characters > compared to a box-drawing line, in my terminal's font, at least. > Doing so greatly simplifies the new locus-printing code. So I dropped the UTF-8 box drawing from that original kit for: (a) simplicity (the original patch kit was huge in scope, covering a bunch of ideas for diagnostics - ranges, labeling, fix-it hints, spelling suggestions, so I wanted to reduce the scope to something manageable) (b) I found it easier to count characters with "~" The thing I'm most nervous about with this patch is the potential for introducing mojibake when people copy and paste GCC output. For example, looking at: https://gcc.gnu.org/legacy-ml/gcc-patches/2015-03/msg00837.html I see mojibake where the unicode line-drawing characters in my email are being displayed in the HTML mailing list archive via "â" - something has gone wrong with encoding somewhere between the copy&paste from my terminal, the email, and the list archive. That said, looking at your email in the archive here: https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550551.html I don't see any mojibake. What happens if GCC's stderr is piped into "less"? What happens if GCC's stderr is saved in a build.log file, uploaded somewhere, and then viewed? etc. > The elements implemented are: > > * Vertical lines, e.g. those indicating labels and those > separating the > source lines from the line numbers, are changed to line drawing > characters. > > * The diagnostic paths output by the static analyzer make use of > line > drawing characters to output smooth corners etc. > > * The squiggly underline ~~~~~ used to highlight source locations > is > changed to a double underline ═════. The main reason for this > is that > it enables a seamless "tee" character to connect the underline > to a > label line if one exists. > > * Carets (^) are changed to a slightly different character (∧). I > think > the new one is a little nicer looking, although probably not > worth the > trouble on its own. I wanted to implement the support in this > patch > beause carets are harder to change than the rest of the > elements > (front ends have an interface to override them, which currently > Fortran makes use of), so I thought it worthwhile to get this > logic in > place, so that it can easily be changed to a more superior > character > in the future if one comes up. It would also be easy enough to > leave > the Unicode support in place for carets, but keep the default > set to > the plain one for now. Some other ideas: * fix-it hints * maybe have a different character for separating the line numbers as opposed to those for labels and for showing interprocedural paths. > As an example, this diagnostic from gcc.dg/format/diagnostic- > ranges.c: > > diagnostic-ranges.c:196:28: warning: field width specifier ‘*’ > expects argument of type ‘int’, but argument 3 has type ‘long int’ [- > Wformat=] > 196 | __builtin_sprintf (d, " %*ld ", foo + bar, foo); > | ~^~~ ~~~~~~~~~ > | | | > | int long int > > would become instead: > > diagnostic-ranges.c:196:28: warning: field width specifier ‘*’ > expects argument of type ‘int’, but argument 3 has type ‘long int’ [- > Wformat=] > 196 │ __builtin_sprintf (d, " %*ld ", foo + bar, foo); > │ ═∧══ ════╤════ > │ │ │ > │ int long int > > Hopefully you are viewing this in a terminal that displays it > properly :), in > which case, hopefully you may find it to be an improvement? I wonder if you can upload colorized examples somewhere? e.g. using bin/gcc-color-to-html.py from our website repository: https://gcc.gnu.org/git/?p=gcc-wwwdocs.git;a=blob;f=bin/gcc-color-to-html.py or one of the various ansi2html conversion scripts e.g. http://www.pixelbeat.org/scripts/ansi2html.sh > Here is a more involved example from the analyzer: > > setjmp-5.c: In function ‘outer’: > setjmp-5.c:21:3: warning: ‘longjmp’ called after enclosing function > of ‘setjmp’ has returned [-Wanalyzer-stale-setjmp-buffer] > 21 | longjmp (env, 42); /* { dg-warning "'longjmp' called after > enclosing function of 'setjmp' has returned" } */ > | ^~~~~~~~~~~~~~~~~ > ‘outer’: events 1-2 > | > | 15 | void outer (void) > | | ^~~~~ > | | | > | | (1) entry to ‘outer’ > |...... > | 19 | inner (); > | | ~~~~~~~~ > | | | > | | (2) calling ‘inner’ from ‘outer’ > | > +--> ‘inner’: event 3 > | > | 10 | static void inner (void) > | | ^~~~~ > | | | > | | (3) entry to ‘inner’ > | > ‘inner’: event 4 > | > | 12 | SETJMP (env); > | | ^~~~~~ > | | | > | | (4) ‘setjmp’ called here > | > <------+ > | > ‘outer’: events 5-6 > | > | 19 | inner (); > | | ^~~~~~~~ > | | | > | | (5) returning to ‘outer’ from ‘inner’ > | 20 | > | 21 | longjmp (env, 42); /* { dg-warning "'longjmp' called > after enclosing function of 'setjmp' has returned" } */ > | | ~~~~~~~~~~~~~~~~~ > | | | > | | (6) here > | > > would become instead: > > setjmp-5.c: In function ‘outer’: > setjmp-5.c:21:3: warning: ‘longjmp’ called after enclosing function > of ‘setjmp’ has returned [-Wanalyzer-stale-setjmp-buffer] > 21 │ longjmp (env, 42); /* { dg-warning "'longjmp' called after > enclosing function of 'setjmp' has returned" } */ > │ ∧════════════════ > ‘outer’: events 1-2 > │ > │ 15 │ void outer (void) > │ │ ∧════ > │ │ │ > │ │ (1) entry to ‘outer’ > │...... > │ 19 │ inner (); I wonder if there's a fancier way to express the gap in the lines if Unicode is available? > │ │ ╤═══════ > │ │ │ > │ │ (2) calling ‘inner’ from ‘outer’ > │ > └──> ‘inner’: event 3 > │ > │ 10 │ static void inner (void) > │ │ ∧════ > │ │ │ > │ │ (3) entry to ‘inner’ > │ > ‘inner’: event 4 > │ > │ 12 │ SETJMP (env); > │ │ ∧═════ > │ │ │ > │ │ (4) ‘setjmp’ called here > │ Unrelated to this patch , but it would be nice if the analyzer inserted an event at the function end showing the frame in "env" becoming invalid, since that's what pertinent to the diagnostic. > ┌<─────┘ > │ > ‘outer’: events 5-6 > │ > │ 19 │ inner (); > │ │ ∧═══════ > │ │ │ > │ │ (5) returning to ‘outer’ from ‘inner’ > │ 20 │ > │ 21 │ longjmp (env, 42); /* { dg-warning "'longjmp' called > after enclosing function of 'setjmp' has returned" } */ > │ │ ╤════════════════ > │ │ │ > │ │ (6) here > │ FWIW I experimented with using unicode circled number characters in place of (1), (2), etc for events in diagnostic_paths but the results looked bad in my terminal, so I stuck to the ASCII form above. In my more adventurous moments I've been tempted to use background colorization to show the stack pushes and pops in a flamegraph-style way, but I suspect it would garish and be too "busy" visually. > > Although probably premature, bootstrap and regtest were done on x86- > 64 > linux, all tests the same before/after and new tests passing: > FAIL 96 96 > PASS 479090 479239 > UNSUPPORTED 11946 11946 > UNTESTED 194 194 > XFAIL 1839 1839 > XPASS 36 36 I see the patch kit touches Fortran; was this with all frontends enabled? (though I guess I'm likewise being premature here) > I tried to set this up as a general framework, at least, it is easy > in one > place to change the characters that are used for various contexts, so > that > if people like the general idea, but not some of the specifics, the > patch is > easily modified for that now or in the future. Thanks for any > feedback! Thanks again for the patch; let's see what others think. Dave