On Tue, Nov 30, 2021 at 04:00:01PM +0100, Stephan Bergmann wrote: > On 30/11/2021 14:26, Marek Polacek wrote: > > On Tue, Nov 30, 2021 at 09:38:57AM +0100, Stephan Bergmann wrote: > > > On 15/11/2021 18:28, Marek Polacek via Gcc-patches wrote: > > > > On Mon, Nov 08, 2021 at 04:33:43PM -0500, Marek Polacek wrote: > > > > > Ping, can we conclude on the name? IMHO, -Wbidirectional is just > > > > > fine, > > > > > but changing the name is a trivial operation. > > > > > > > > Here's a patch with a better name (suggested by Jonathan W.). > > > > Otherwise no > > > > changes. > > > > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? > > > > > > > > -- >8 -- > > > > From a link below: > > > > "An issue was discovered in the Bidirectional Algorithm in the Unicode > > > > Specification through 14.0. It permits the visual reordering of > > > > characters via control sequences, which can be used to craft source code > > > > that renders different logic than the logical ordering of tokens > > > > ingested by compilers and interpreters. Adversaries can leverage this to > > > > encode source code for compilers accepting Unicode such that targeted > > > > vulnerabilities are introduced invisibly to human reviewers." > > > > > > > > More info: > > > > https://nvd.nist.gov/vuln/detail/CVE-2021-42574 > > > > https://trojansource.codes/ > > > > > > > > This is not a compiler bug. However, to mitigate the problem, this > > > > patch > > > > implements -Wbidi-chars=[none|unpaired|any] to warn about possibly > > > > misleading Unicode bidirectional characters the preprocessor may > > > > encounter. > > > > > > > > The default is =unpaired, which warns about improperly terminated > > > > bidirectional characters; e.g. a LRE without its appertaining PDF. The > > > > level =any warns about any use of bidirectional characters. > > > > > > > > This patch handles both UCNs and UTF-8 characters. UCNs designating > > > > bidi characters in identifiers are accepted since r204886. Then r217144 > > > > enabled -fextended-identifiers by default. Extended characters in C/C++ > > > > identifiers have been accepted since r275979. However, this patch still > > > > warns about mixing UTF-8 and UCN bidi characters; there seems to be no > > > > good reason to allow mixing them. > > > > > > I wonder what the rationale is to warn about UCNs, like in > > > > > > > aText = u"\u202D" + aText; > > > > > > (as found in the LibreOffice source code). > > > > Is this line mixing a UCN and a UTF-8? Or is it just that you're > > prepending a LRO to aText? We warn because the LRO is not "closed" > > in the context of its string literal, which was part of the Trojan > > source attack. So "\u202D ... \u202C" would not warn. > > > > I'm not sure what workaround I could offer. Maybe provide an option not to > > warn about UCNs at all, though even that is potentially dangerous -- while > > you can see UCNs in the source code, if you print strings containing them, > > they won't be visible anymore. > > I'm not sure what you mean with "mixing a UCN and a UTF-8", but what the > code apparently does is programmatically constructing a larger piece of text > by prepending LRO to an existing piece of text. > > My understanding is that Trojan Source is concerned with presentation of > program source code and not with properties of Unicode text constructed > during the execution of such a program, and from the documentation quoted > above I understand that -Wbidi-chars is meant to address Trojan Source, so I > don't understand why you're concerned here with what happens "if you print > strings containing [UCNs in the source code]". > > Short of a source code viewer that interprets UCNs in C/C++ source code and > renders them in the same way as their corresponding Unicode characters, I > don't think that UCNs are relevant for Trojan Source, and don't understand > why -Wbidi-chars would warn about them.
I guess we were concerned with programs that generate other programs. Maybe UCNs should be ignored by default. There's still time to adjust the behavior. > (Also, I noticed that it doesn't work to silence -Werror=bidi-chars= with a > > > #pragma GCC diagnostic ignored "-Wbidi-chars" Yeah, it doesn't work with C++, it's https://gcc.gnu.org/PR53431 :( Marek