dexonsmith requested changes to this revision. dexonsmith added a comment. This revision now requires changes to proceed.
Thanks for working on this! I have a number of suggestions inline. ================ Comment at: clang/include/clang/Serialization/ASTBitCodes.h:44 /// AST files at this time. const unsigned VERSION_MAJOR = 10; ---------------- `VERSION_MAJOR` should be bumped. ================ Comment at: clang/include/clang/Serialization/ASTBitCodes.h:245 /// Offset in the AST file. Keep structure alignment 32-bit and avoid /// padding gap because undefined value in the padding affects AST hash. ---------------- Please update the first sentence to be accurate. ================ Comment at: clang/include/clang/Serialization/ASTBitCodes.h:247 /// padding gap because undefined value in the padding affects AST hash. + /// This value is relative to the offset of the AST block UnderalignedInt64 BitOffset; ---------------- I don't think you'll need this comment anymore, after you fix the above. But if you keep it, please add a period at the end of the sentence. ================ Comment at: clang/include/clang/Serialization/ASTBitCodes.h:396-400 /// Record code for the signature that identifiers this AST file. SIGNATURE = 1, + /// Record code for the signature of the AST block. + AST_SIGNATURE, ---------------- These names and descriptions hard hard to differentiate. Is there another way of naming these that will be more clear? (One idea I had is to create `CONTROL_BLOCK_HASH` and `AST_BLOCK_HASH` and then `SIGNATURE` could just be their hash-combine, but maybe you have another idea.) ================ Comment at: clang/lib/Serialization/ASTReader.cpp:2931 BitstreamCursor &Stream = F.Stream; - if (llvm::Error Err = Stream.EnterSubBlock(AST_BLOCK_ID)) { ---------------- Please submit this NFC change separately. ================ Comment at: clang/lib/Serialization/ASTWriter.cpp:1042-1046 + ASTFileSignature ASTSignature; + for (int I = 0; I != 5; ++I) + ASTSignature[I] = LShift(Hash[I * 4 + 0], 24) | + LShift(Hash[I * 4 + 1], 16) | LShift(Hash[I * 4 + 2], 8) | + LShift(Hash[I * 4 + 3], 0); ---------------- Rather than duplicating this code, would it be simpler to start with a patch that changes `ASTFileSignature` to not require this massaging? I suggest either adding a constructor that takes `Hash` directly, or just changing it to have the same type as `Hash` (and updating the serialization to expect that). ================ Comment at: clang/lib/Serialization/ASTWriter.cpp:1061 - return Signature; + return std::make_pair(ASTSignature, Signature); } ---------------- Alternately, you could change `ASTFileSignature` to have three fields: - ControlBlockHash - ASTBlockHash - Signature (You could construct the signature by taking the SHA1 of the two block hashes.) ================ Comment at: clang/lib/Serialization/ASTWriter.cpp:1084 PP.getHeaderSearchInfo().getHeaderSearchOpts().ModulesHashContent) { - Signature = createSignature(StringRef(Buffer.begin(), StartOfUnhashedControl)); + Record.append(ASTSignature.begin(), ASTSignature.end()); + Stream.EmitRecord(AST_SIGNATURE, Record); ---------------- The default abbreviation (7-VBR) isn't great for hashes. Given that we're going to have two of these records, I suggest specifying a custom abbreviation. If you were to split out a prep commit as suggested above to change `SIGNATURE` to be an array of 8-bit values (instead of the current array of 32-bit values), that would be a good opportunity to add an abbreviation. ================ Comment at: clang/lib/Serialization/ASTWriter.cpp:2078 SourceMgr.getNextLocalOffset() - 1 /* skip dummy */, - SLocEntryOffsetsBase}; + SLocEntryOffsetsBase - ASTBlockRange.first}; Stream.EmitRecordWithBlob(SLocOffsetsAbbrev, Record, ---------------- Can this be relative to the start of `SOURCE_MANAGER_BLOCK_ID` instead? ================ Comment at: clang/lib/Serialization/ASTWriter.cpp:4729-4733 + StringRef Name = M.Kind == MK_PrebuiltModule || + M.Kind == MK_ExplicitModule || + M.Kind == MK_ImplicitModule + ? M.ModuleName + : M.FileName; ---------------- Is there ever a reason to use `M.FileName` here, or is that always redundant with what's in the control block? I wonder if we can just remove this complexity. ================ Comment at: clang/lib/Serialization/ASTWriterDecl.cpp:2437 if (DeclOffsets.size() == Index) - DeclOffsets.emplace_back(Loc, Offset); + DeclOffsets.emplace_back(Loc, Offset, ASTBlockRange.first); else if (DeclOffsets.size() < Index) { ---------------- I suggest making `WriteDecls` relative to the start of the `DECLTYPES_BLOCK_ID`. That will make the block itself (more) self-contained, and also make the offsets smaller (and therefore cheaper to store in bitcode). I also don't think you need to actually emit that integer, it's just an implicit contract between the reader and writer. ================ Comment at: clang/lib/Serialization/ASTWriterDecl.cpp:2442 DeclOffsets[Index].setLocation(Loc); - DeclOffsets[Index].setBitOffset(Offset); + DeclOffsets[Index].setBitOffset(Offset, ASTBlockRange.first); } else { ---------------- Same here, this should be relative to the start of `DECLTYPES_BLOCK_ID`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80383/new/ https://reviews.llvm.org/D80383 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits