> On 3 Sep 2024, at 13:59, Nathaniel Shead <nathanielosh...@gmail.com> wrote:
>
> On Tue, Sep 03, 2024 at 10:14:29AM +0100, Iain Sandoe wrote:
>> Hi Folks,
>>
>> When we build a C++ binary module (CMI/BMI), we obviously have access to its
>> source to produce diagnostics, all fine.
>>
>> However, when we consume that module we might also need access to the
>> sources used to build it - since diagnostics triggered in the consumer can
>> refer back to the sources used.
>>
>> Currently clang has been experimenting with embedding the sources into the
>> BMI - this can make things seem more efficient when, for example,
>> distributing BMIs to remote nodes in a large-scale distributed build.
>>
>> There was a patch proposed to make this the default for clang, which has
>> resulted in the discussion here:
>>
>> https://discourse.llvm.org/t/rfc-modules-should-we-embed-sources-to-the-bmi/81029
>>
>> Does GCC have a plan to deal with this?
>> .. if so can we communicate it..
>> .. if not, then what do we think about this strategy?
>> .. at the very least trying to avoid tooling divergence seems a worthwhile
>> objective.
>>
>> thanks
>> Iain
>>
>
> I've thought a little bit about this in the past but I don't currently
> think embedding source files by default would be a great idea.
>
> For GCC, the main benefit of embedding the original source would be that
> currently, if you compile a module interface, then edit the original
> source file before compiling a TU that consumes that source file,
> diagnostics referring to the original module interface may appear
> "incorrect" and quote unrelated code or point to incorrect lines, which
> can potentially be confusing:
>
> x.cpp: In function ‘int main()’:
> x.cpp:4:7: error: invalid conversion from ‘const char*’ to ‘int’ [-fpermissive]
>     4 |   foo("hello");
>       |       ^~~~~~~
>       |       |
>       |       const char*
> In module M, imported at x.cpp:1:
> m.cpp:2:17: note: initialising argument 1 of ‘void foo@M(int)’
>     2 | hello this file is edited
>       |                 ^~~

I would expect that, in most build systems, any change to the sources of a BMI
would cause a rebuild of the BMI itself - anything else is not actually safe?

> This is the same issue you might get with listing code using GDB on a
> binary that is older than the source files it was built with; GDB
> currently provides a nice warning when this happens that the source file
> may be out of date, that I imagine could be helpful in this circumstance
> as well. But this would not require embedding any source files, just
> comparing the mtime of the CMI vs. the source and/or header file in
> question.
>
> Either way I don't think this is much of an issue in practice: I would
> expect users to typically be using build systems that will ensure this
> above situation never occurs to begin with, and even in cases where the
> source files are completely unavailable diagnostics still work fine as
> it is, just without quoting of the source code (though line:column
> information is still available).
>
> And on the flip side, embedding source files could grow the CMIs quite a
> lot; naively just embedding all source files for something like
>
> module;
> #include <string>
> export module M;
> export std::string foo();
>
> would presumably also require embedding the entire contents of <string>
> and all headers that it depends on, recursively. And this is quite
> apart from the potential questions around IP mentioned in the thread you
> linked.

There is some (anecdotal) evidence that, for the current AST representation in
clang at least, the BMI size does not grow much. See the Discourse discussion
for cross references.

cheers
Iain

>
> Yours,
> Nathaniel