On 9/3/24 11:53 AM, Iain Sandoe wrote:


On 3 Sep 2024, at 16:08, Jason Merrill via Gcc <gcc@gcc.gnu.org> wrote:

On 9/3/24 7:30 AM, Jonathan Wakely wrote:
On Tue, 3 Sept 2024, 10:15 Iain Sandoe, <i...@sandoe.co.uk> wrote:
    Hi Folks,
    When we build a C++ binary module (CMI/BMI), we obviously have
    access to its source to produce diagnostics, all fine.
    However, when we consume that module we might also need access to
    the sources used to build it - since diagnostics triggered in the
    consumer can refer back to the sources used.
I'm fairly convinced by your argument that building the module usually happens 
as part of the same build as consuming the module, and so the sources will be 
available anyway.
For large scale build environments where pre-built BMIs might be deployed by 
one team and consumed by other teams, without (re)building those BMIs, it 
doesn't seem too difficult for the module interface sources to also be 
deployed. That's not so different from deploying headers and libraries (.so, 
.dylib, .dll etc) today.
So I don't actually see a need to embed sources. It seems like it's solving 
something that can easily be solved using existing processes. Just include 
sources with BMIs that you deploy. If the full sources are sensitive IP, 
separate your code into the public parts that are used to compile the BMI and 
the non-public parts. Or proprietary vendors who don't want to do that 
separation can choose to not provide code, and diagnostics suffer for their 
users. That's not a technical problem, and doesn't need to be solved by the 
compiler.

Agreed; it seems natural to provide interface unit sources everywhere you would 
provide headers currently.  Or not in cases where you wouldn't, such as distcc 
compiling preprocessed code.

    Currently clang has been experimenting with embedding the sources
    into the BMI - this can make things seem more efficient when, for
    example, distributing BMIs to remote nodes in a large-scale
    distributed build.

    There was a patch proposed to make this the default for clang, which
    has resulted in the discussion here:
    https://discourse.llvm.org/t/rfc-modules-should-we-embed-sources-to-the-bmi/81029

From the first post:

(1) Fix the underlying issue. Readers may already recognize that the two topics 
(whether or not embedding source files) (security concerns) are not technically 
mutually exclusive. The fundamental technical problem may be that clang require 
to open the actual file during the compilation. It looks like both GCC and MSVC 
doesn’t have the problem.

Sounds like the primary motivation for this clang change doesn't apply to GCC.

I think that might be a misunderstanding on the part of the author; AFAIU both 
GCC and MSVC _do_ require access to the sources at BMI consume-time to give 
decent diagnostics.   I think that there might be confusion because the 
compilation would succeed on those toolchains without the sources - but with 
poorer diagnostic quality?

Exactly, compilation failing seems like the primary motivation for clang, and that isn't an issue for gcc.

It doesn't seem worthwhile to do a bunch of work to embed sources just to make it more possible for the compiler to print the relevant source line in some niche remote execution scenario.

Hopefully, other folks from the “modules implementer’s group” including MSVC 
will add comments to the discourse thread - we just discussed this (with my 
impression being that most folks think it’s the build system’s territory).
