[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

Chuanqi Xu via Phabricator via cfe-commits Fri, 16 Jun 2023 22:50:43 -0700

ChuanqiXu added a comment.

> That said, there are two large issues that I think should be addressed in the 
> design (though not necessarily *implemented* now).


Yeah, totally agreed. Design is pretty important especially in open source 
softwares. I'm pretty open to the suggestions.

---

> support for clang header modules
>
> Google has a large deployment of clangd, serving a codebase that builds 
> significant parts as clang header modules for performance reasons. There is 
> no near-term plan of adopting C++20 modules (multiple reasons, which I 
> probably can't represent well).
> We've been disabling header modules for clangd & other tools for a long time. 
> But it's come to a head, and we expect to get someone working on enabling 
> modules in clangd in ~2 months.
> As I understand it, Meta are in a similar situation (though their solution is 
> to version-lock clangd with the toolchain, keep modules enabled, and accept 
> that some things are broken).
>
> I suspect the answer for header modules is that we can study this patch and 
> understand what the equivalents of graph nodes/deps/names/scanning look like 
> for explicit header modules, and understand that we'll be able to abstractify 
> some names and add some levels of indirection and it'll all work out.

Yeah. On the one hand, I always think that named modules are far different from 
header modules (including clang header modules and header units) for tools. 
Since for tools, it makes sense to fallback to traditional inclusion when they 
saw header units or header modules. For example, 
for clang header modules, IIUC, clangd can work if we rewrite the compile 
commands to strip a lot of clang header modules related options. And we reuse 
the rewrite process in the patch. (I know there are some exceptions due to 
macros). On the other hand, I'm curious about how do you handle/model clang 
explicit header modules with compilation database? I didn't look about this. Is 
the headers an entry in the compilation database? And if is it possible for one 
header to have multiple BMIs? Such problems look not easy to handle so I prefer 
to fallback it to inclusions.

> Our specific build system setup is that all module-build actions and inputs 
> explicit (-fmodule-file=, -fmodule-map-file=). The build system will not 
> produce usable PCMs due to version skew.
> I know that Meta folks *would* like to make use of available PCMs from the 
> build system.

Also it looks like to me the Meta's method are not  a solution we want. This 
looks basically what I want for named modules half a year ago. Basically I did 
nothing and just tried to let clangd fallback to clang to handle everything.

---

> support for large projects
>
> Apart from the concrete scanning of deps, keeping the full project module 
> graph in memory won't always be possible. (It's a perfectly reasonable 
> default implementation though).
>
>> We'll need some abstraction layer (like CompilationDatabase is to 
>> compile_commands.json) that exposes enough data to run the algorithms we 
>> need, without exposing so much that you have to hold the whole graph in 
>> memory. It could be backed by in-memory depscan results, or a build-system 
>> artifact, or a live query of the build system.

Yeah, I agree the performance is a key point. In the current design, the 
ModulesDependencyGraph is only visible to ModulesManager so I feel we left the 
space for further optimization while I don't have concrete ideas. The current 
patch only left 5 interfaces for outer users  (currently only TUScheduler) (the 
interfaces: UpdateNode(PathRef), isReadyToCompile(PathRef), 
HasInvalidDependencies(PathRef), addCallbackAfterReady(PathRef, 
std::function<void()>) and GenerateModuleInterfacesFor(PathRef)), which didn't 
leak any special data to outer users. So I feel while it is OK to adopt ideas 
to enhance the scaling ability, it is OK to forward now since we've left the 
space for further optimization. (premature optimization is the root of all evil 
:) )

> For "every time we update a file, all the affected files (e.g., we changed a 
> header file) will be re-scanned" - we need a way to express this more 
> abstractly than "re-scanned", and more narrowly than "all the affected files" 
> - at least it needs to be limited to all the files that could themselves 
> affect any open file.

Yeah, actually now this patch doesn't make it **explicitly**. Now we achieve 
this **implicitly** by making `TUScheduler::update` (which will be called every 
time the TU changes) to call `ModulesManager::UpdateNode(PathRef)`. What I want 
to do is to let `ModulesManager::UpdateNode(PathRef Node)` to call 
`TUScheduler::update` for the users of Node.

So here "re-scanned" means to call `TUScheduler::update`. And "all the affected 
files" is not reflected in the patch actually. Do you have suggestions to 
improve this or do you feel it is OK to make it implicitly as now?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

Reply via email to