On Tue, 2021-11-02 at 16:58 -0400, David Malcolm wrote:
> Before:
>
> Wbidirectional-1.c: In function ‘main’:
> Wbidirectional-1.c:6:43: warning: unpaired UTF-8 bidirectional character detected [-Wbidirectional=]
> 6 | /* } if (isAdmin) begin admins only */
>   | ^
> Wbidirectional-1.c:9:28: warning: unpaired UTF-8 bidirectional character detected [-Wbidirectional=]
> 9 | /* end admins only { */
>   | ^
>
> Wbidirectional-11.c:6:15: warning: UTF-8 vs UCN mismatch when closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-Wbidirectional=]
> 6 | int LRE__PDF_\u202c;
>   | ^
>
> After setting rich_loc.set_escape_on_output (true):
>
> Wbidirectional-1.c:6:43: warning: unpaired UTF-8 bidirectional character detected [-Wbidirectional=]
> 6 | /*<U+202E> } <U+2066>if (isAdmin)<U+2069> <U+2066> begin admins only */
>   | ^
> Wbidirectional-1.c:9:28: warning: unpaired UTF-8 bidirectional character detected [-Wbidirectional=]
> 9 | /* end admins only <U+202E> { <U+2066>*/
>   | ^
>
> Wbidirectional-11.c:6:15: warning: UTF-8 vs UCN mismatch when closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-Wbidirectional=]
> 6 | int LRE_<U+202A>_PDF_\u202c;
>   | ^
>
> libcpp/ChangeLog:
> 	* lex.c (maybe_warn_bidi_on_close): Use a rich_location
> 	and call set_escape_on_output (true) on it.
> 	(maybe_warn_bidi_on_char): Likewise.
>
> Signed-off-by: David Malcolm <dmalc...@redhat.com>
[...snip...]

To be more explicit: part of the benefit of escaping non-ASCII bytes in the source line is that it further mitigates CVE-2021-42574, since it "defangs" the bidi control characters: everything is turned into ASCII, so the user can see the logical ordering of the characters directly. A similar consideration applies to homoglyph attacks: a lookalike character such as U+0430 (CYRILLIC SMALL LETTER A) would be shown as <U+0430> instead of passing for a Latin "a".

Dave