On Tue, Nov 30, 2021 at 04:00:01PM +0100, Stephan Bergmann wrote:
> On 30/11/2021 14:26, Marek Polacek wrote:
> > On Tue, Nov 30, 2021 at 09:38:57AM +0100, Stephan Bergmann wrote:
> > > On 15/11/2021 18:28, Marek Polacek via Gcc-patches wrote:
> > > > On Mon, Nov 08, 2021 at 04:33:43PM -0500, Marek Polacek wrote:
> > > > > Ping, can we conclude on the name?   IMHO, -Wbidirectional is just 
> > > > > fine,
> > > > > but changing the name is a trivial operation.
> > > > 
> > > > Here's a patch with a better name (suggested by Jonathan W.).  
> > > > Otherwise no
> > > > changes.
> > > > 
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > 
> > > > -- >8 --
> > > >   From a link below:
> > > > "An issue was discovered in the Bidirectional Algorithm in the Unicode
> > > > Specification through 14.0. It permits the visual reordering of
> > > > characters via control sequences, which can be used to craft source code
> > > > that renders different logic than the logical ordering of tokens
> > > > ingested by compilers and interpreters. Adversaries can leverage this to
> > > > encode source code for compilers accepting Unicode such that targeted
> > > > vulnerabilities are introduced invisibly to human reviewers."
> > > > 
> > > > More info:
> > > > https://nvd.nist.gov/vuln/detail/CVE-2021-42574
> > > > https://trojansource.codes/
> > > > 
> > > > This is not a compiler bug.  However, to mitigate the problem, this 
> > > > patch
> > > > implements -Wbidi-chars=[none|unpaired|any] to warn about possibly
> > > > misleading Unicode bidirectional characters the preprocessor may 
> > > > encounter.
> > > > 
> > > > The default is =unpaired, which warns about improperly terminated
> > > > bidirectional characters; e.g. a LRE without its appertaining PDF.  The
> > > > level =any warns about any use of bidirectional characters.
> > > > 
> > > > This patch handles both UCNs and UTF-8 characters.  UCNs designating
> > > > bidi characters in identifiers are accepted since r204886.  Then r217144
> > > > enabled -fextended-identifiers by default.  Extended characters in C/C++
> > > > identifiers have been accepted since r275979.  However, this patch still
> > > > warns about mixing UTF-8 and UCN bidi characters; there seems to be no
> > > > good reason to allow mixing them.
> > > 
> > > I wonder what the rationale is to warn about UCNs, like in
> > > 
> > > >          aText = u"\u202D" + aText;
> > > 
> > > (as found in the LibreOffice source code).
> > 
> > Is this line mixing a UCN and a UTF-8?  Or is it just that you're
> > prepending a LRO to aText?  We warn because the LRO is not "closed"
> > in the context of its string literal, which was part of the Trojan
> > source attack.  So "\u202D ... \u202C" would not warn.
> > 
> > I'm not sure what workaround I could offer.  Maybe provide an option not to
> > warn about UCNs at all, though even that is potentially dangerous -- while
> > you can see UCNs in the source code, if you print strings containing them,
> > they won't be visible anymore.
> 
> I'm not sure what you mean with "mixing a UCN and a UTF-8", but what the
> code apparently does is programmatically constructing a larger piece of text
> by prepending LRO to an existing piece of text.
> 
> My understanding is that Trojan Source is concerned with presentation of
> program source code and not with properties of Unicode text constructed
> during the execution of such a program, and from the documentation quoted
> above I understand that -Wbidi-chars is meant to address Trojan Source, so I
> don't understand why you're concerned here with what happens "if you print
> strings containing [UCNs in the source code]".
> 
> Short of a source code viewer that interprets UCNs in C/C++ source code and
> renders them in the same way as their corresponding Unicode characters, I
> don't think that UCNs are relevant for Trojan Source, and don't understand
> why -Wbidi-chars would warn about them.

I guess we were concerned with programs that generate other programs.
Maybe UCNs should be ignored by default.  There's still time to adjust
the behavior.
 
> (Also, I noticed that it doesn't work to silence -Werror=bidi-chars= with a
> 
> > #pragma GCC diagnostic ignored "-Wbidi-chars"

Yeah, it doesn't work with C++, it's https://gcc.gnu.org/PR53431 :(

Marek

Reply via email to