[PATCH] D105759: [WIP] Implement P2361 Unevaluated string literals

Corentin Jabot via Phabricator via cfe-commits Wed, 25 Aug 2021 15:47:03 -0700

cor3ntin added inline comments.


================
Comment at: clang/lib/Parse/ParseDecl.cpp:422-423
 
-        ExprResult ArgExpr(
-            Actions.CorrectDelayedTyposInExpr(ParseAssignmentExpression()));
+        ExprResult ArgExpr(Actions.CorrectDelayedTyposInExpr(
+            ParseAttributeArgAsUnevaluatedLiteralOrExpression(AttrKind)));
         if (ArgExpr.isInvalid()) {
----------------
aaron.ballman wrote:
> cor3ntin wrote:
> > aaron.ballman wrote:
> > > Hmmm, I'm not certain about these changes.
> > > 
> > > For some attributes, the standard currently requires accepting any kind 
> > > of string literal (like `[[deprecated]]` 
> > > https://eel.is/c++draft/dcl.attr.deprecated#1). P2361 is proposing to 
> > > change that, but it's not yet accepted by WG21 (let alone WG14). So 
> > > giving errors in those cases is a bit of a hard sell -- I think warnings 
> > > would be a bit more reasonable.
> > > 
> > > But for other attributes (like `annotate`), it's a bit less clear whether 
> > > we should *prevent* literal prefixes because the attribute can be used to 
> > > have runtime impacts (for example, I can imagine someone using `annotate` 
> > > to emit the string literal bytes into the resulting binary). In some 
> > > cases, I think it's very reasonable (e.g., `diagnose_if` should behave 
> > > the same as `deprecated` and `nodiscard` because those are purely about 
> > > generating diagnostics at compile time).
> > > 
> > > I kind of wonder whether we're going to want to tablegenerate whether the 
> > > argument needs to be parsed as unevaluated or not on an 
> > > attribute-by-attribute basis.
> > Yep, I would not expect this to get merge before P2361 but I think the 
> > implementation experience is useful and raised a bunch of good questions.
> > I don't think it ever makes sense to have `L` outside of literals - but 
> > people *might* do it currently, in which case there is a concern about 
> > whether it breaks code (I have found no evidence of that though).
> > 
> > If we wanted to inject these strings in the binary - in some form, then we 
> > might have to transcode them at that point.
> > I don't think the user would know if the string would be injected as wide 
> > or narrow (or something else) by the compiler.
> > 
> > `L` is really is want to convert that string _at that point_. in an 
> > attribute, strings might have multiple usages so it's better to delay any 
> > transcoding.
> > Does that make sense?
> > 
> > But I agree that a survey of what each attribute expects is in order.
> > 
> > 
> > 
> > Yep, I would not expect this to get merge before P2361 but I think the 
> > implementation experience is useful and raised a bunch of good questions.
> 
> Absolutely agreed, this is worthwhile effort!
> 
> > If we wanted to inject these strings in the binary - in some form, then we 
> > might have to transcode them at that point.
> > I don't think the user would know if the string would be injected as wide 
> > or narrow (or something else) by the compiler.
> 
> My intuition is that a user who writes `L"foo"` will expect a wide `"foo"` to 
> appear in the binary in the cases where the string ends up making it that far.
> 
> > L is really is want to convert that string _at that point_. in an 
> > attribute, strings might have multiple usages so it's better to delay any 
> > transcoding.
> > Does that make sense?
> 
> Not yet, but I might get there eventually. :-D My concern is that vendor 
> attributes can basically do anything, so there's no reason to assume that any 
> given string literal usage should or should not transcode. I think we have to 
> decide on a case by case basis by letting the attributes specify what they 
> intend in their argument lists. However, my intuition is that *most* 
> attributes will expect unevaluated string literals because the string 
> argument doesn't get passed to LLVM.
The status quo is that everything transcodes.

But not transcoding, we do not destroy any information as to what is in the 
source.

If an attribute then wants to use the string later in such a way that it needs 
to transcode to a literal encoding (or something else, for example, one might 
imagine a fun scenario where literal are ASCII encoded and debug information 
are EBCDIC encoded), then that can be done, because the string still exists.

Whereas for literal specifically, we assume they will be evaluated by the 
abstract machine as per phase 5 so we transcode them immediately. which is 
destructive. we get away with it because the original spelling is in the source 
if we need it, and currently, literals are also assumed to be (potentially 
invalid because of `\x` escape sequences) UTF-8.

There is an alternative design where string literals are not transcoded until 
lazily evaluated but I'm not sure there is a big motivation for that.

So this PR is exactly trying not to force a specific behavior on attributes 
that I assume can be displayed, put into some form in the binary, or converted 
to literal which might represent 3 distinct encodings. The parser leaving them 
as Unicode is the least opinionated thing the parser can possibly do.
And then each attribute can decide for itself if it needs to transcode, and how 
to handle any errors if they occur.
An attribute might decide to keep both a Unicode and non-Unicode spelling 
around if the string has a dual purpose, etc


Question though, Is there a scenario in which `\x`/`\0` would actually be 
useful in the context of attributes? Because if so, then we might need to do 
something to allow that.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105759/new/

https://reviews.llvm.org/D105759

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D105759: [WIP] Implement P2361 Unevaluated string literals

Reply via email to