dang added a comment.

I made the decision to make all relative offsets relative to the start of the 
AST block rather than their own sub-block or a neighboring block, in order to 
prevent adding complexity in `ASTReader`. My reasoning that it would be best if 
the reader didn't have to keep track of a bunch of offsets inside the bitcode 
at all times, but that it could just remember the offset of the AST block once 
and always use that for relative offsets. I agree that conceptually making the 
sub-blocks relocatable too would have been nice, but I figured it wasn't worth 
the extra complexity in the reader.



================
Comment at: clang/lib/Serialization/ASTWriter.cpp:1084
       PP.getHeaderSearchInfo().getHeaderSearchOpts().ModulesHashContent) {
-    Signature = createSignature(StringRef(Buffer.begin(), 
StartOfUnhashedControl));
+    Record.append(ASTSignature.begin(), ASTSignature.end());
+    Stream.EmitRecord(AST_SIGNATURE, Record);
----------------
dexonsmith wrote:
> The default abbreviation (7-VBR) isn't great for hashes. Given that we're 
> going to have two of these records, I suggest specifying a custom 
> abbreviation.
> 
> If you were to split out a prep commit as suggested above to change 
> `SIGNATURE` to be an array of 8-bit values (instead of the current array of 
> 32-bit values), that would be a good opportunity to add an abbreviation.
I think encoding the signature as a blob makes the most sense if we make it an 
array of 20 8-bit values, which is what the hasher gives us. However it is not 
quite so simple, as it would need a big change in the way direct imports are 
serialized. The way it is currently done embeds multiple strings and signatures 
within a single unabbreviated record. I can replace the unabbreviated record 
with a sub-block that keeps track of a string table and a record for each 
import containing offsets inside the string table and blobs for the signature.


================
Comment at: clang/lib/Serialization/ASTWriter.cpp:4729-4733
+        StringRef Name = M.Kind == MK_PrebuiltModule ||
+                                 M.Kind == MK_ExplicitModule ||
+                                 M.Kind == MK_ImplicitModule
+                             ? M.ModuleName
+                             : M.FileName;
----------------
dexonsmith wrote:
> Is there ever a reason to use `M.FileName` here, or is that always redundant 
> with what's in the control block? I wonder if we can just remove this 
> complexity.
> 
The control block only tracks this information for direct imports although it 
can maybe be extended to do keep track of this for all dependencies. This bit 
of the table could then become and index inside the `IMPORTS` record in the 
control block.


================
Comment at: clang/lib/Serialization/ASTWriterDecl.cpp:2437
   if (DeclOffsets.size() == Index)
-    DeclOffsets.emplace_back(Loc, Offset);
+    DeclOffsets.emplace_back(Loc, Offset, ASTBlockRange.first);
   else if (DeclOffsets.size() < Index) {
----------------
dexonsmith wrote:
> I suggest making `WriteDecls` relative to the start of the 
> `DECLTYPES_BLOCK_ID`. That will make the block itself (more) self-contained, 
> and also make the offsets smaller (and therefore cheaper to store in bitcode).
> 
> I also don't think you need to actually emit that integer, it's just an 
> implicit contract between the reader and writer.
Not sure what you mean by I don't actually need to emit that integer? If you 
mean `ASTBlockRange.first` then I don't emit it but rather pass it to the 
constructor of DeclOffset so that it performs the subtraction there. Since it 
is a self contained type, it forces the Reader to provide a value for it, 
making it harder to read the raw relative offset.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80383/new/

https://reviews.llvm.org/D80383



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to