https://github.com/jansvoboda11 created 
https://github.com/llvm/llvm-project/pull/112452

Clang uses timestamp files to track the last time an implicitly-built PCM file 
was verified to be up-to-date with regard to its inputs. With 
`-fbuild-session-{file,timestamp}=` and  
`-fmodules-validate-once-per-build-session` this reduces the number of times a 
PCM file is checked per "build session".

The behavior I'm seeing with the current scheme is that when lots of Clang 
instances wait for the same PCM to be built, they race to validate it as soon 
as the file lock gets released, causing lots of concurrent IO.

This patch makes it so that the timestamp is written by the same Clang instance 
responsible for building the PCM while still holding the lock. This makes it so 
that whenever a PCM file gets compiled, it's never re-validated in the same 
build session.

I believe this is as sound as the current scheme. One thing to be aware of is 
that there might be a time interval between accessing input file N and writing 
the timestamp file, where changes to input files 0..<N would not result in a 
rebuild. Since this is the case current scheme too, I'm not too concerned about 
that.

I've seen this speed up `clang-scan-deps` by ~27%.

>From 4aacee4f10f3806a83730182146ffcc25b63eb2b Mon Sep 17 00:00:00 2001
From: Jan Svoboda <jan_svob...@apple.com>
Date: Tue, 15 Oct 2024 16:29:37 -0700
Subject: [PATCH] [clang][modules] Timestamp PCM files when writing

Clang uses timestamp files to track the last time an implicitly-built PCM file 
was verified to be up-to-date with regard to its inputs. With 
`-fbuild-session-{file,timestamp}=` and  
`-fmodules-validate-once-per-build-session` this reduces the number of times a 
PCM file is checked per "build session".

The behavior I'm seeing with the current scheme is that when lots of Clang 
instances wait for the same PCM to be built, they race to validate it as soon 
as the file lock gets released, causing lots of concurrent IO.

This patch makes it so that the timestamp is written by the same Clang instance 
responsible for building the PCM while still holding the lock. This makes it so 
that whenever a PCM file gets compiled, it's never re-validated in the same 
build session.

I believe this is as sound as the current scheme. One thing to be aware of is 
that there might be a time interval between accessing input file N and writing 
the timestamp file, where changes to input files 0..<N would not result in a 
rebuild. Since this is the case current scheme too, I'm not too concerned about 
that.

I've seen this speed up `clang-scan-deps` by ~27%.
---
 clang/include/clang/Serialization/ModuleFile.h |  5 +++--
 clang/lib/Serialization/ASTCommon.cpp          | 15 +++++++++++++++
 clang/lib/Serialization/ASTCommon.h            |  3 +++
 clang/lib/Serialization/ASTReader.cpp          | 15 +--------------
 clang/lib/Serialization/ASTWriter.cpp          |  4 ++++
 clang/lib/Serialization/ModuleManager.cpp      |  3 ++-
 6 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/clang/include/clang/Serialization/ModuleFile.h 
b/clang/include/clang/Serialization/ModuleFile.h
index 30e7f6b3e57bd8..29d3354d07a129 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -15,6 +15,7 @@
 #define LLVM_CLANG_SERIALIZATION_MODULEFILE_H
 
 #include "clang/Basic/FileManager.h"
+#include "clang/Basic/LLVM.h"
 #include "clang/Basic/Module.h"
 #include "clang/Basic/SourceLocation.h"
 #include "clang/Serialization/ASTBitCodes.h"
@@ -144,8 +145,8 @@ class ModuleFile {
   /// The base directory of the module.
   std::string BaseDirectory;
 
-  std::string getTimestampFilename() const {
-    return FileName + ".timestamp";
+  static std::string getTimestampFilename(StringRef FileName) {
+    return (FileName + ".timestamp").str();
   }
 
   /// The original source file name that was used to build the
diff --git a/clang/lib/Serialization/ASTCommon.cpp 
b/clang/lib/Serialization/ASTCommon.cpp
index ab4923de6346f8..ec18e84255ca8e 100644
--- a/clang/lib/Serialization/ASTCommon.cpp
+++ b/clang/lib/Serialization/ASTCommon.cpp
@@ -15,7 +15,10 @@
 #include "clang/AST/DeclObjC.h"
 #include "clang/Basic/IdentifierTable.h"
 #include "clang/Serialization/ASTDeserializationListener.h"
+#include "clang/Serialization/ModuleFile.h"
 #include "llvm/Support/DJB.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/raw_ostream.h"
 
 using namespace clang;
 
@@ -503,3 +506,15 @@ bool serialization::needsAnonymousDeclarationNumber(const 
NamedDecl *D) {
     return false;
   return isa<TagDecl, FieldDecl>(D);
 }
+
+void serialization::updateModuleTimestamp(StringRef ModuleFilename) {
+  // Overwrite the timestamp file contents so that file's mtime changes.
+  std::error_code EC;
+  llvm::raw_fd_ostream OS(ModuleFile::getTimestampFilename(ModuleFilename), EC,
+                          llvm::sys::fs::OF_TextWithCRLF);
+  if (EC)
+    return;
+  OS << "Timestamp file\n";
+  OS.close();
+  OS.clear_error(); // Avoid triggering a fatal error.
+}
diff --git a/clang/lib/Serialization/ASTCommon.h 
b/clang/lib/Serialization/ASTCommon.h
index 0230908d3e0528..2a765eafe08951 100644
--- a/clang/lib/Serialization/ASTCommon.h
+++ b/clang/lib/Serialization/ASTCommon.h
@@ -15,6 +15,7 @@
 
 #include "clang/AST/ASTContext.h"
 #include "clang/AST/DeclFriend.h"
+#include "clang/Basic/LLVM.h"
 #include "clang/Serialization/ASTBitCodes.h"
 
 namespace clang {
@@ -100,6 +101,8 @@ inline bool isPartOfPerModuleInitializer(const Decl *D) {
   return false;
 }
 
+void updateModuleTimestamp(StringRef ModuleFilename);
+
 } // namespace serialization
 
 } // namespace clang
diff --git a/clang/lib/Serialization/ASTReader.cpp 
b/clang/lib/Serialization/ASTReader.cpp
index 1b2473f2457344..f6e584197e2cf7 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -4415,19 +4415,6 @@ bool ASTReader::isGlobalIndexUnavailable() const {
          !hasGlobalIndex() && TriedLoadingGlobalIndex;
 }
 
-static void updateModuleTimestamp(ModuleFile &MF) {
-  // Overwrite the timestamp file contents so that file's mtime changes.
-  std::string TimestampFilename = MF.getTimestampFilename();
-  std::error_code EC;
-  llvm::raw_fd_ostream OS(TimestampFilename, EC,
-                          llvm::sys::fs::OF_TextWithCRLF);
-  if (EC)
-    return;
-  OS << "Timestamp file\n";
-  OS.close();
-  OS.clear_error(); // Avoid triggering a fatal error.
-}
-
 /// Given a cursor at the start of an AST file, scan ahead and drop the
 /// cursor into the start of the given block ID, returning false on success and
 /// true on failure.
@@ -4706,7 +4693,7 @@ ASTReader::ASTReadResult ASTReader::ReadAST(StringRef 
FileName, ModuleKind Type,
       ImportedModule &M = Loaded[I];
       if (M.Mod->Kind == MK_ImplicitModule &&
           M.Mod->InputFilesValidationTimestamp < HSOpts.BuildSessionTimestamp)
-        updateModuleTimestamp(*M.Mod);
+        updateModuleTimestamp(M.Mod->FileName);
     }
   }
 
diff --git a/clang/lib/Serialization/ASTWriter.cpp 
b/clang/lib/Serialization/ASTWriter.cpp
index 938d7b525cb959..c0fb7a0591be02 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -4905,6 +4905,10 @@ ASTFileSignature ASTWriter::WriteAST(Sema &SemaRef, 
StringRef OutputFile,
   this->BaseDirectory.clear();
 
   WritingAST = false;
+
+  if (WritingModule)
+    updateModuleTimestamp(OutputFile);
+
   if (ShouldCacheASTInMemory) {
     // Construct MemoryBuffer and update buffer manager.
     ModuleCache.addBuiltPCM(OutputFile,
diff --git a/clang/lib/Serialization/ModuleManager.cpp 
b/clang/lib/Serialization/ModuleManager.cpp
index e74a16b6368028..ba78c9ef5af67f 100644
--- a/clang/lib/Serialization/ModuleManager.cpp
+++ b/clang/lib/Serialization/ModuleManager.cpp
@@ -170,7 +170,8 @@ ModuleManager::addModule(StringRef FileName, ModuleKind 
Type,
   NewModule->InputFilesValidationTimestamp = 0;
 
   if (NewModule->Kind == MK_ImplicitModule) {
-    std::string TimestampFilename = NewModule->getTimestampFilename();
+    std::string TimestampFilename =
+        ModuleFile::getTimestampFilename(NewModule->FileName);
     llvm::vfs::Status Status;
     // A cached stat value would be fine as well.
     if (!FileMgr.getNoncachedStatValue(TimestampFilename, Status))

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to