On Tue, Sep 03, 2024 at 10:14:29AM +0100, Iain Sandoe wrote:
> Hi Folks,
>
> When we build a C++ binary module (CMI/BMI), we obviously have access to its
> source to produce diagnostics, all fine.
>
> However, when we consume that module we might also need access to the sources
> used to build it - since diagnostics triggered in the consumer can refer back
> to the sources used.
>
> Currently clang has been experimenting with embedding the sources into the
> BMI - this can make things seem more efficient when, for example,
> distributing BMIs to remote nodes in a large-scale distributed build.
>
> There was a patch proposed to make this the default for clang, which has
> resulted in the discussion here:
>
> https://discourse.llvm.org/t/rfc-modules-should-we-embed-sources-to-the-bmi/81029
>
> Does GCC have a plan to deal with this?
> .. if so can we communicate it..
> .. if not, then what do we think about this strategy?
> .. at the very least trying to avoid tooling divergence seems a worthwhile
> objective.
>
> thanks
> Iain
>

I've thought a little bit about this in the past, but I don't currently
think embedding source files by default would be a great idea.

For GCC, the main benefit of embedding the original source would be to
avoid the following problem. Currently, if you compile a module
interface and then edit the original source file before compiling a TU
that consumes that module, diagnostics referring to the original module
interface may appear "incorrect": they can quote unrelated code or point
to the wrong lines, which can be confusing:

  x.cpp: In function ‘int main()’:
  x.cpp:4:7: error: invalid conversion from ‘const char*’ to ‘int’ [-fpermissive]
      4 |   foo("hello");
        |       ^~~~~~~
        |       |
        |       const char*
  In module M, imported at x.cpp:1:
  m.cpp:2:17: note:   initialising argument 1 of ‘void foo@M(int)’
      2 | hello this file is edited
        |                 ^~~

This is the same issue you might get when listing code using GDB on a
binary that is older than the source files it was built with. GDB
currently provides a nice warning when this happens, saying that the
source file may be out of date, which I imagine could be helpful in this
circumstance as well. But this would not require embedding any source
files, just comparing the mtime of the CMI vs. the source and/or header
file in question.

Either way, I don't think this is much of an issue in practice. I would
expect users to typically be using build systems that ensure the above
situation never occurs to begin with. And even in cases where the source
files are completely unavailable, diagnostics still work fine as it is,
just without quoting of the source code (line:column information is
still available).

On the flip side, embedding source files could grow the CMIs quite a
lot; naively just embedding all source files for something like

  module;
  #include <string>
  export module M;
  export std::string foo();

would presumably also require embedding the entire contents of <string>
and all headers that it depends on, recursively.

And this is quite apart from the potential questions around IP mentioned
in the thread you linked.

Yours,
Nathaniel