================
@@ -51,28 +57,71 @@ class StringToOffsetTable {
     return II->second;
   }
 
-  // Emit the string using string literal concatenation, for better readability
-  // and searchability.
-  void EmitStringLiteralDef(raw_ostream &OS, const Twine &Decl,
-                            const Twine &Indent = "  ") const {
+  // Emit a string table definition with the provided name and indent.
+  //
+  // When possible, this uses string-literal concatenation to emit the string
+  // contents in a readable and searchable way. However, for (very) large 
string
+  // tables MSVC cannot reliably use string literals and so there we use a 
large
+  // character array. We still use a line oriented emission and add comments to
+  // provide searchability even in this case.
+  //
+  // The string table, and its input string contents, are always emitted as 
both
+  // `static` and `constexpr`. Both `Name` and (`Name` + "Storage") must be
+  // valid identifiers to declare.
+  void EmitStringTableDef(raw_ostream &OS, const Twine &Name,
+                          const Twine &Indent = "") const {
     OS << formatv(R"(
 #ifdef __GNUC__
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Woverlength-strings"
 #endif
-{0}{1} = )",
-                  Indent, Decl);
+{0}static constexpr char {1}Storage[] = )",
+                  Indent, Name);
+
+    // MSVC silently miscompiles string literals longer than 64k in some
+    // circumstances. When the string table is longer, emit it as an array of
+    // character literals.
+    bool UseChars = AggregateString.size() > (64 * 1024);
----------------
chandlerc wrote:

> I believe the long (64K+) character table literals have previously been 
> identified as a compile time bottleneck, see 
> [reviews.llvm.org/D73044](https://reviews.llvm.org/D73044), which is mostly 
> about readability, but it does mention compile time as a concern. The way 
> this is written now, it seems like Linux developers are going to pay the cost 
> to parse these long tables character by character, which I think is worth 
> avoiding. I think it's just a matter of moving the existing 
> `--long-string-literals` flag from llvm/utils/TableGen/Basic/TableGen.cpp to 
> somewhere in the TableGen library and checking it here.

To be clear, until August of last year, we were *unconditionally* doing it 
here. That was removed without any discussion of compile times: 
https://github.com/llvm/llvm-project/commit/381405fafe9d48d29c777e7680902d0943834859
 and https://github.com/llvm/llvm-project/pull/105445

Is your concern that since August, users will have adapted to the better 
compile times?

Enabling the flag is more than just moving the code, it will also require 
updating various CMake usages of TableGen to set the flag correctly.

https://github.com/llvm/llvm-project/pull/123548
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to