cor3ntin added a comment.

In D105759#4456864 <https://reviews.llvm.org/D105759#4456864>, @aaron.ballman 
wrote:

> I don't think it's correct to assume that all string arguments to attributes 
> are unevaluated, but it is hard to tell where to draw the line sometimes. 
> Backing up a step, as I understand P2361 <https://reviews.llvm.org/P2361>, an 
> unevaluated string is one which is not converted into the execution character 
> set (effectively). Is that correct? If so, then as an example, 
> `[[clang::annotate()]]` should almost certainly be using an evaluated string 
> because the argument is passed down to LLVM IR and is used in ways we cannot 
> predict. What's more, an unevaluated string cannot have some kinds of escape 
> characters (numeric and conditional escape sequences) and those are currently 
> allowed by `clang::annotate` and could potentially be used by a backend 
> plugin.
>
> I think other attributes may have similar issues. For example, the `alias` 
> attribute is a bit of a question mark for me -- that takes a string literal 
> representing an external identifier that is looked up. I'm not certain 
> whether that should be in the execution character set or not, but we do 
> support escape sequences for it: https://godbolt.org/z/v65Yd7a68
>
> I think we need to track evaluated vs not on the argument level so that the 
> attributes in Attr.td can decide which form to use. I think we should default 
> to "evaluated" for any attribute we're on the fence about because that's the 
> current behavior they get today (so we should avoid regressions).

I really don't think it makes sense to have both "unevaluated" and "evaluated" 
arguments.
We chatted offline and we struggled to find places where escape sequences are 
used, or examples of attributes intended to be in the execution character set.
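
To make the escape sequence restriction concrete, here is a minimal sketch of 
the check as I understand it from P2361: simple escapes and universal character 
names are fine in an unevaluated string, while numeric and conditional escape 
sequences are not. `hasDisallowedEscape` is a hypothetical helper written for 
this comment, not what the patch or the lexer actually does, and it does not 
validate that UCNs are well-formed.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, NOT Clang's implementation.
// `body` is the spelling of a string literal without the surrounding quotes.
// Returns true if it contains an escape that an unevaluated string may not
// use: a numeric escape (\101, \x41, \o{...}) or a conditional escape
// (a backslash followed by any other unrecognised character).
bool hasDisallowedEscape(std::string_view body) {
  constexpr std::string_view simple = "'\"?\\abfnrtv";
  for (std::size_t i = 0; i < body.size(); ++i) {
    if (body[i] != '\\')
      continue;
    if (++i == body.size())
      return true; // dangling backslash, treated as an error here
    const char c = body[i];
    if (simple.find(c) != std::string_view::npos)
      continue; // simple escape sequence such as \n or \\.
    if (c == 'u' || c == 'U' || c == 'N')
      continue; // universal character name (well-formedness not checked)
    return true; // numeric or conditional escape sequence
  }
  return false;
}
```

With this, `hasDisallowedEscape(R"(\x41)")` reports a problem while 
`hasDisallowedEscape(R"(hello\n)")` does not.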

My suggestion would be to land the non-attribute changes now, and the 
attribute bits early in the clang 18 cycle.
If we find clear examples of attributes expecting the execution character set, 
their argument should be able to be described as an expression, which will be 
checked as a string literal anyway, hopefully?

In the case of annotate, if these strings are fed to, for example, a debugger, 
they may need to be converted to whatever encoding the debugger expects, which 
is not necessarily the execution charset.
The same goes for plugins: they certainly do not expect EBCDIC data, for example.
I would expect, for example, static analyzers and code generators to keep 
working after the introduction of -fexec-charset.
So it's important that the string remains unevaluated in the front end so that 
it can be correctly converted to the appropriate encoding for each of the 
various consumers, which doesn't have a single answer.

> Do we know of any attributes in the "needs more thinking" list that should 
> have the string literal encoded in the execution character set? I think most 
> of these are for referring to identifiers in source and I expect those would 
> want source character set and not execution character set strings.

Identifiers and symbol names are in UTF-8, and may get mangled, for 
example by replacing non-ASCII code points with UCNs. The source character set 
is never relevant.
This addresses the WebAssembly attributes.
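
As an illustration of that kind of mangling, here is a rough sketch that takes 
a UTF-8 name and replaces each non-ASCII code point with a UCN spelling. 
`escapeNonAscii` is a made-up name for this comment, it assumes well-formed 
UTF-8, and it is not the scheme any particular target or mangler actually uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <string_view>

// Illustrative only: assumes `utf8` is well-formed UTF-8.
// ASCII bytes are copied through; every other code point is spelled as
// \uXXXX (up to U+FFFF) or \UXXXXXXXX (above U+FFFF).
std::string escapeNonAscii(std::string_view utf8) {
  std::string out;
  for (std::size_t i = 0; i < utf8.size();) {
    const unsigned char lead = static_cast<unsigned char>(utf8[i]);
    if (lead < 0x80) {
      out += utf8[i++];
      continue;
    }
    // Number of continuation bytes implied by the lead byte.
    const int extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;
    // Payload bits of the lead byte: 5, 4 or 3 bits respectively.
    std::uint32_t cp = lead & (0x3F >> extra);
    for (int k = 1; k <= extra && i + k < utf8.size(); ++k)
      cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
    char buf[16];
    std::snprintf(buf, sizeof buf, cp <= 0xFFFF ? "\\u%04X" : "\\U%08X",
                  static_cast<unsigned>(cp));
    out += buf;
    i += 1 + extra;
  }
  return out;
}
```

The point is only that the input is UTF-8 and the output is pure ASCII; neither 
the source nor the execution character set is involved at any step.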

> BTFDeclTag/BTFTypeTag (is emitted to DWARF with -g so probably evaluated?)

Is it correct to assume the debugger's file encoding is always the same as the 
program's? Probably not!
If need be, we can then transcode the strings when doing codegen for these 
things.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105759/new/

https://reviews.llvm.org/D105759

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
