> On 3 Sep 2024, at 13:59, Nathaniel Shead <nathanielosh...@gmail.com> wrote:
> 
> On Tue, Sep 03, 2024 at 10:14:29AM +0100, Iain Sandoe wrote:
>> Hi Folks,
>> 
>> When we build a C++ binary module (CMI/BMI), we obviously have access to its 
>> source to produce diagnostics, all fine.
>> 
>> However, when we consume that module we might also need access to the 
>> sources used to build it - since diagnostics triggered in the consumer can 
>> refer back to the sources used.
>> 
>> Currently clang has been experimenting with embedding the sources into the 
>> BMI - this can make things seem more efficient when, for example, 
>> distributing BMIs to remote nodes in a large-scale distributed build.
>> 
>> There was a patch proposed to make this the default for clang, which has 
>> resulted in the discussion here:
>> 
>> https://discourse.llvm.org/t/rfc-modules-should-we-embed-sources-to-the-bmi/81029
>> 
>> Does GCC have a plan to deal with this?
>> .. if so can we communicate it..
>> .. if not, then what do we think about this strategy?
>> .. at the very least trying to avoid tooling divergence seems a worthwhile 
>> objective.
>> 
>> thanks
>> Iain
>> 
> 
> I've thought a little bit about this in the past but I don't currently
> think embedding source files by default would be a great idea.
> 
> For GCC, the main benefit of embedding the original source would be that
> currently, if you compile a module interface, then edit the original
> source file before compiling a TU that consumes that source file,
> diagnostics referring to the original module interface may appear
> "incorrect" and quote unrelated code or point to incorrect lines, which
> can potentially be confusing:
> 
> x.cpp: In function ‘int main()’:
> x.cpp:4:7: error: invalid conversion from ‘const char*’ to ‘int’ [-fpermissive]
>    4 |   foo("hello");
>      |       ^~~~~~~
>      |       |
>      |       const char*
> In module M, imported at x.cpp:1:
> m.cpp:2:17: note:   initialising argument 1 of ‘void foo@M(int)’
>    2 | hello this file is edited
>      |                 ^~~

I would expect, in most build systems, that any change in the BMI sources
would cause a rebuild of the BMI itself - anything else is not actually safe?

> This is the same issue you might get with listing code using GDB on a
> binary that is older than the source files it was built with; GDB
> currently provides a nice warning when this happens that the source file
> may be out of date, that I imagine could be helpful in this circumstance
> as well.  But this would not require embedding any source files, just
> comparing the mtime of the CMI vs. the source and/or header file in
> question.
> 
> Either way I don't think this is much of an issue in practice: I would
> expect users to typically be using build systems that will ensure this
> above situation never occurs to begin with, and even in cases where the
> source files are completely unavailable diagnostics still work fine as
> it is, just without quoting of the source code (though line:column
> information is still available).
> 
> And on the flip side, embedding source files could grow the CMIs quite a
> lot; naively just embedding all source files for something like
> 
>  module;
>  #include <string>
>  export module M;
>  export std::string foo();
> 
> would presumably also require embedding the entire contents of <string>
> and all headers that it depends on, recursively.  And this is quite
> apart from the potential questions around IP mentioned in the thread you
> linked.

There is some (anecdotal) evidence that, for the current AST representation
in clang at least, the BMI size does not grow much.  See the Discourse
discussion for cross-references.

cheers
Iain

> 
> Yours,
> Nathaniel
