Issue 97741
Summary -Winvalid-token-paste fails to catch UCNs which are invalid preprocessor tokens
Labels new issue
Assignees
Reporter jeffgarrett
Consider ([godbolt link](https://godbolt.org/z/c1sGTM6WK)):

```cpp
#define X \\

#define U(x) x ## u0000
#define Y(x) U(x)
#define Z(x) #x
#define W(x) Z(x)

const char str[] = W(Y(X));
```

clang preprocesses this to `const char str[] = "\u0000";`, but the UCN `\u0000` is invalid as a preprocessing token per [\[lex.pptoken/2\]](https://eel.is/c++draft/lex.pptoken#2). Accepting it is still conforming, because a paste that produces an invalid preprocessing token is undefined behavior per [\[cpp.concat/3\]](https://eel.is/c++draft/cpp.concat#3), but `-Winvalid-token-paste` does not diagnose it.
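The expansion order is what lets the backslash reach the `##`. As I read the reproducer, the second backslash in `#define X \\` is spliced with the following blank line, so `X` is defined as a single `\`, and `Y` exists only so that `X` is expanded before `U` pastes it. A minimal sketch of that ordering, with an ordinary identifier in place of the backslash so the result is well defined (the macro `A` and the names `direct`, `indirect`, and `check_expansions` are hypothetical and not part of the report; `U`, `Y`, `Z`, and `W` are copied from it):

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for X: an ordinary identifier instead of a backslash.
#define A foo

#define U(x) x ## u0000   // x is an operand of ##, so it is pasted unexpanded
#define Y(x) U(x)         // x is fully expanded before being handed to U
#define Z(x) #x           // stringize the argument as written
#define W(x) Z(x)         // expand the argument first, then stringize

// Calling U directly pastes the macro name itself: A ## u0000 -> Au0000.
const char direct[] = W(U(A));

// Going through Y expands A first: foo ## u0000 -> foou0000.
const char indirect[] = W(Y(A));

// Returns normally when both expansions match the description above.
inline void check_expansions() {
    assert(std::strcmp(direct, "Au0000") == 0);
    assert(std::strcmp(indirect, "foou0000") == 0);
}
```

With `X` in place of `A`, the same indirection is what yields `\ ## u0000`, and hence the pasted `\u0000` that this report is about.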

(At least that's how I interpret it. The former admits the character in the grammar production, but declares the program ill-formed if the production is matched. Does a character matching this production count as a "valid" preprocessing token in the sense the latter uses? It could also be read as saying that this is a valid preprocessing token, which can therefore be produced fleetingly by pasting, but is ill-formed if written directly.)

Implementations diverge here. The godbolt link shows that gcc rejects the pasted token with the same error it gives when the UCN is written directly in the source.