[clang] [llvm] [StrTable] Switch intrinsics to StringTable and work around MSVC (PR #123548)

Chandler Carruth via cfe-commits Sat, 25 Jan 2025 22:54:00 -0800

================
@@ -51,28 +57,71 @@ class StringToOffsetTable {
     return II->second;
   }
 
-  // Emit the string using string literal concatenation, for better readability
-  // and searchability.
-  void EmitStringLiteralDef(raw_ostream &OS, const Twine &Decl,
-                            const Twine &Indent = "  ") const {
+  // Emit a string table definition with the provided name and indent.
+  //
+  // When possible, this uses string-literal concatenation to emit the string
+  // contents in a readable and searchable way. However, for (very) large string
+  // tables MSVC cannot reliably use string literals and so there we use a large
+  // character array. We still use a line oriented emission and add comments to
+  // provide searchability even in this case.
+  //
+  // The string table, and its input string contents, are always emitted as both
+  // `static` and `constexpr`. Both `Name` and (`Name` + "Storage") must be
+  // valid identifiers to declare.
+  void EmitStringTableDef(raw_ostream &OS, const Twine &Name,
+                          const Twine &Indent = "") const {
     OS << formatv(R"(
 #ifdef __GNUC__
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Woverlength-strings"
 #endif
-{0}{1} = )",
-                  Indent, Decl);
+{0}static constexpr char {1}Storage[] = )",
+                  Indent, Name);
+
+    // MSVC silently miscompiles string literals longer than 64k in some
+    // circumstances. When the string table is longer, emit it as an array of
+    // character literals.
+    bool UseChars = AggregateString.size() > (64 * 1024);
----------------
chandlerc wrote:


Yes, but AFAICT this is only used with `SequenceToOffsetTable` and not with 
`StringToOffsetTable`...

I don't know the history of why we have two offset table emission systems. But 
`StringToOffsetTable` seemed to be what was used with `IntrinsicEmitter.cpp`, 
and what I've been using in Clang as well.

I can try to consolidate the divergence that has grown between these two if 
necessary, but it's a pretty big undertaking.

For example, the place where I currently need this logic is in Clang which 
doesn't even use that CMake definition to invoke its tablegen.

To me, it seems a simpler model to just unconditionally emit tables larger than 
64k using the awkward syntax rather than add more tablegen flags that change 
behavior. This lets the vast majority of cases use the simpler form.
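
As a rough illustration of the size-based approach described above, here is a minimal standalone sketch. It is not the code in this PR: `emitTableStorage`, `escapeChar`, and `kMaxLiteralSize` are hypothetical names, it uses `std::ostringstream` rather than `raw_ostream`, and it assumes the table contents are printable ASCII plus embedded NULs. Appending an explicit trailing NUL in the array form is a choice made by this sketch so that both forms have the same size.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical threshold mirroring the 64k limit discussed in the patch.
constexpr std::size_t kMaxLiteralSize = 64 * 1024;

// Hypothetical helper: escape one character so it is valid inside either a
// string literal or a character literal. NUL is written as a three-digit
// octal escape so that a following digit is not absorbed into the escape.
std::string escapeChar(char C) {
  switch (C) {
  case '\0': return "\\000";
  case '\n': return "\\n";
  case '\\': return "\\\\";
  case '\'': return "\\'";
  case '"':  return "\\\"";
  default:   return std::string(1, C);
  }
}

// Hypothetical helper: emit `<Name>Storage` as a string literal when the
// data fits under the threshold, and as an array of character literals
// otherwise. No flag selects the form; only the size of the data does.
std::string emitTableStorage(const std::string &Name, const std::string &Data) {
  std::ostringstream OS;
  OS << "static constexpr char " << Name << "Storage[] = ";
  if (Data.size() > kMaxLiteralSize) {
    OS << "{";
    for (char C : Data)
      OS << "'" << escapeChar(C) << "', ";
    // A string literal carries an implicit trailing NUL; add it explicitly
    // here so the array form has the same length.
    OS << "'\\000'};\n";
  } else {
    OS << '"';
    for (char C : Data)
      OS << escapeChar(C);
    OS << "\";\n";
  }
  return OS.str();
}
```

With this shape, a caller never passes a compiler-specific option: a table of exactly 64 * 1024 bytes or fewer gets the readable literal form, and anything larger falls back to the character array.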

https://github.com/llvm/llvm-project/pull/123548
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
