jansvoboda11 marked 5 inline comments as done.
jansvoboda11 added a comment.

In D112915#3122340 <https://reviews.llvm.org/D112915#3122340>, @dexonsmith 
wrote:

> Implementation looks a lot cleaner!
>
> I'd still like to drop NumIncludes first if possible because I think it'll be 
> easier to reason about without this extra layer of complexity. Also, that'd 
> mitigate the potential regression in `.pcm` size.

Done in D114096 <https://reviews.llvm.org/D114096>. Thanks for the feedback.



================
Comment at: clang/include/clang/Lex/HeaderSearch.h:133-139
-
-  /// Determine whether this is a non-default header file info, e.g.,
-  /// it corresponds to an actual header we've included or tried to include.
-  bool isNonDefault() const {
-    return isImport || isPragmaOnce || NumIncludes || ControllingMacro ||
-      ControllingMacroID;
-  }
----------------
dexonsmith wrote:
> Looks like this is already dead code? If so, please separate out and commit 
> ahead of time (e.g., now).
Done in D114092.


================
Comment at: clang/include/clang/Serialization/ModuleFile.h:399
+  /// include information.
+  llvm::StringMap<llvm::SmallVector<uint64_t, 64>> SubmoduleIncludedFiles;
+
----------------
dexonsmith wrote:
> Each StringMapEntry is going to have a pretty big allocation here, for a 512B 
> vector. Given that it doesn't need to be after creation, it'd be more 
> efficient to use this pattern:
> ```
> lang=c++
> llvm::StringMap<ArrayRef<uint64_t>> SubmoduleIncludedFiles;
> SpecificBumpPtrAlloc<uint64_t> SubmoduleIncludedFilesAlloc;
> 
> // later
> MutableArrayRef<uint64_t> 
> Files(SubmoduleIncludedFiles.Allocate(Record.size()), Record.size());
> llvm::copy(Record, Files.begin());
> SubmoduleIncludedFiles[Key] = Files;
> ```
> 
> That said, I feel like this should just be:
> ```
> lang=c++
> DenseMap<StringRef, StringRef> SubmoduleIncludedFiles;
> ```
> The key can point at the name of the submodule, which must already exist 
> somewhere without needing a StringMap to create a new copy of it. The value 
> is a lazily deserialized blob.
I switched to `StringRef` value in the latest revision. Unfortunately, had to 
use `std::string` as the key instead of `StringRef`, since `getFullModuleName` 
constructs the string on heap. That forced me to use `std::map`, too. I'll 
explore using something else entirely as the key.


================
Comment at: clang/lib/Lex/Preprocessor.cpp:1323
+        auto &ModNumIncludes = SubmoduleIncludeStates[M].NumIncludes;
+        for (unsigned UID = 0; UID <= ModNumIncludes.maxUID(); ++UID) {
+          CurSubmoduleIncludeState->NumIncludes[UID] += ModNumIncludes[UID];
----------------
dexonsmith wrote:
> jansvoboda11 wrote:
> > dexonsmith wrote:
> > > jansvoboda11 wrote:
> > > > Iterating over all FileEntries is probably not very efficient, as 
> > > > Volodymyr mentioned. Thinking about how to make this more efficient...
> > > My suggestion above to drop FileEntryMap in favour of a simple DenseMap 
> > > would help a bit, just iterating through the files actually included by 
> > > the submodules.
> > > 
> > > Further, I wonder if "num-includes"/file/submodule (`unsigned`) is 
> > > actually useful, vs. "was-included"/file/submodule (`bool`). The only 
> > > observer I see is `HeaderSearch::PrintStats()` but maybe I missed 
> > > something?
> > > 
> > > If I'm right and we can switch to `bool`, then NumIncludes becomes a 
> > > `DenseSet<FileEntry *> IncludedFiles` (or `DenseSet<unsigned>` for UIDs). 
> > > (BTW, I also wondered if you could rotate the map, using File as the 
> > > outer key, and then use bitsets for the sbumodules, but I doubt this is 
> > > better, since most files are only included by a few submodules, right?)
> > > 
> > > Then you can just do a set union here. Also simplifies bitcode 
> > > serialization.
> > > 
> > > (If a `bool`/set is sufficient, then I'd suggest landing that 
> > > first/separately, before adding the per-submodule granularity in this 
> > > patch.)
> > For each file, we need to have three distinct states: not included at all, 
> > included exactly once (`firstTimeLexingFile`), included more than once. 
> > This means we can't use a single `DenseSet`.
> > 
> > But we could use a `DenseMap<Key, bool>`, where "not included at all" can 
> > be expressed as being absent from the map, exactly once as having `true` in 
> > the map and more than once as having `false` in the map. Alternatively, we 
> > could use two `DenseSet` instances to encode the same, but I don't think 
> > having two lookups per file to determine stuff is efficient.
> > 
> > I can look into this in a follow-up patch.
> Seems like a DenseSet could still be used by having HeaderInfo pass back the 
> WasInserted bit from the insertion to the preprocessor, and threading it 
> through to Preprocessor::HandleEndOfFile (the only caller of 
> FirstTimeLexingFile):
> ```
> lang=c++
> bool IsFirst = Set.insert(Key).second;
> ```
> The threading doesn't seem too hard. Looking at main:
> - Preprocessor::HandleHeaderIncludeOrImport calls 
> HeaderInfo::ShouldEnterIncludeFile. This does the `++FI.NumIncludes` (going 
> from 0 to 1). Instead, it could be `IsFirst = !FI.WasIncluded; FI.WasIncluded 
> = true;`, then return `IsFirst` somehow. (Then your patch can pull `IsFirst` 
> from the `insert().second`).
> - Preprocessor::HandleHeaderIncludeOrImport calls 
> Preprocessor::EnterSourceFile. This creates a new Lexer for that file. 
> `IsFirst` can be stored on that Lexer.
> - Preprocessor::HandleEndOfFile calls FirstTimeLexingFile. Instead, it can 
> check the new accessor `CurLexer->isFirstTimeLexing()`.
> 
> > I can look into this in a follow-up patch.
> 
> Follow-up might be okay, but it'd be nice to remove an axis of complexity 
> before adding a new one if it's reasonable. E.g., it'll be easier to debug 
> emergent issues from changing it to a simple set since there's less machinery 
> to worry about.
Extracted into D114093.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112915/new/

https://reviews.llvm.org/D112915

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to