aaron.ballman added a comment.

In D105759#4456864 <https://reviews.llvm.org/D105759#4456864>, @aaron.ballman 
wrote:

> I don't think it's correct to assume that all string arguments to attributes 
> are unevaluated, but it is hard to tell where to draw the line sometimes. 
> Backing up a step, as I understand P2361 <https://reviews.llvm.org/P2361>, an 
> unevaluated string is one which is not converted into the execution character 
> set (effectively). Is that correct? If so, then as an example, 
> `[[clang::annotate()]]` should almost certainly be using an evaluated string 
> because the argument is passed down to LLVM IR and is used in ways we cannot 
> predict. What's more, an unevaluated string cannot have some kinds of escape 
> characters (numeric and conditional escape sequences) and those are currently 
> allowed by `clang::annotate` and could potentially be used by a backend 
> plugin.
>
> I think other attributes may have similar issues. For example, the `alias` 
> attribute is a bit of a question mark for me -- that takes a string literal 
> representing an external identifier that is looked up. I'm not certain 
> whether that should be in the execution character set or not, but we do 
> support escape sequences for it: https://godbolt.org/z/v65Yd7a68

I took a quick pass over our existing attributes, and here's my intuition on 
them regarding encoding of the literal:

Unevaluated Strings are Fine:
AbiTag
TLSModel
Availability
Deprecated
EnableIf/DiagnoseIf
ObjCRuntimeName
PragmaClangBSSSection/PragmaClangDataSection/PragmaClangRodataSection/PragmaClangRelroSection/PragmaClangTextSection
 (only created implicitly)
Suppress
Target/TargetVersion/TargetClones
Unavailable
Uuid
WarnUnusedResult
NoSanitize
Capability
Assumption
NoBuiltin (it names a builtin name, so this is probably fine to leave 
unevaluated?)
AcquireHandle/UseHandle/ReleaseHandle
Error
HLSLResourceBinding

Unevaluated String are Potentially Bad:
Annotate
AnnotateType

Unevaluated String Needs More Thinking (common thread is that they survive to 
LLVM IR):
Alias
AsmLabel
IFunc
BTFDeclTag/BTFTypeTag (is emitted to DWARF with -g so probably evaluated?)
WebAssemblyExportName/WebAssemblyImportModule/WebAssemblyImportModule
ExternalSourceSymbol
SwiftAsyncName/SwiftAttr/SwiftBridge/SwiftName
Section/CodeSeg/InitSeg
WeakRef
EnforceTCB/EnforceTCBLeaf

There's also the escape sequences issue where use of an escape sequence will go 
from accepted to rejected in these contexts. I did some hunting to see if I 
could find uses of numeric escape sequences in asm labels or alias attributes, 
to see if there's some signs this is done in practice:

Testing we can find numeric escape sequences at all:
https://sourcegraph.com/search?q=context:global+%5C%28%5C%22%5B%5B:alpha:%5D%5D*%28%5C%5C%5B%5B:digit:%5D%5D%2B%29%2B%5B%5B:alpha:%5D%5D*%5C%22%5C%29+lang:C+lang:C%2B%2B&patternType=regexp&case=yes&sm=1&groupBy=repo

Testing we can find asm labels at all:
https://sourcegraph.com/search?q=context:global+asm%5C%28%5C%22%5B%5B:alpha:%5D%5D*%5B%5B:alpha:%5D%5D*%5C%22%5C%29+lang:C+lang:C%2B%2B&patternType=regexp&case=yes&sm=1&groupBy=repo

Testing we can find asm labels with numeric escapes:
https://sourcegraph.com/search?q=context:global+asm%5C%28%5C%22%5B%5B:alpha:%5D%5D*%28%5C%5C%5B%5B:digit:%5D%5D%2B%29%2B%5B%5B:alpha:%5D%5D*%5C%22%5C%29+lang:C+lang:C%2B%2B&patternType=regexp&case=yes&sm=1&groupBy=repo

Testing we can find alias attributes at all:
https://sourcegraph.com/search?q=context:global+alias%5C%28%5C%22%5B%5B:alpha:%5D%5D*%5B%5B:alpha:%5D%5D*%5C%22%5C%29+lang:C+lang:C%2B%2B&patternType=regexp&case=yes&sm=1&groupBy=repo

Testing we can find alias attributes with numeric escapes:
https://sourcegraph.com/search?q=context:global+alias%5C%28%5C%22%5B%5B:alpha:%5D%5D*%28%5C%5C%5B%5B:digit:%5D%5D%2B%29%2B%5B%5B:alpha:%5D%5D*%5C%22%5C%29+lang:C+lang:C%2B%2B&patternType=regexp&case=yes&sm=1&groupBy=repo

I think this leaves me with three open questions:

- Do we know of any uses of the `annotate` attribute that rely on the string 
literal being in the execution character set? I do not know of any but I know 
this is used by plugins quite often.
- Do we know of any attributes in the "needs more thinking" list that should 
have the string literal encoded in the execution character set? I think most of 
these are for referring to identifiers in source and I expect those would want 
source character set and not execution character set strings.
- Do we know of any significant body of code using numeric escape sequences in 
these string literals that could not be relatively easily modified to compile 
again? I would be surprised, but I think someone should probably run more of 
the attributes on the "needs more thinking" list through similar searches on 
source graph and we can use that as an approximation.

If all these answers come back "no" as best we can figure, then I think we can 
punt on argument-level handling of this until we add an attribute that really 
does need an execution-encoded (or numeric escape sequence-using) string 
literal. I think we've got enough time before the Clang 17 branch to hear if 
the changes cause problems after we've done this due diligence. WDYT?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105759/new/

https://reviews.llvm.org/D105759

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to