On 9/3/24 11:53 AM, Iain Sandoe wrote:
On 3 Sep 2024, at 16:08, Jason Merrill via Gcc <gcc@gcc.gnu.org> wrote:
On 9/3/24 7:30 AM, Jonathan Wakely wrote:
On Tue, 3 Sept 2024, 10:15 Iain Sandoe, <i...@sandoe.co.uk> wrote:
Hi Folks,
When we build a C++ binary module (CMI/BMI), we obviously have
access to its source to produce diagnostics, all fine.
However, when we consume that module we might also need access to
the sources used to build it - since diagnostics triggered in the
consumer can refer back to the sources used.
I'm fairly convinced by your argument that building the module usually happens
as part of the same build as consuming the module, and so the sources will be
available anyway.
For large scale build environments where pre-built BMIs might be deployed by
one team and consumed by other teams, without (re)building those BMIs, it
doesn't seem too difficult for the module interface sources to also be
deployed. That's not so different from deploying headers and libraries (.so,
.dylib, .dll etc) today.
So I don't actually see a need to embed sources. It seems like it's solving
something that can easily be solved using existing processes. Just include
sources with BMIs that you deploy. If the full sources are sensitive IP,
separate your code into the public parts that are used to compile the BMI and
the non-public parts. Or proprietary vendors who don't want to do that
separation can choose to not provide code, and diagnostics suffer for their
users. That's not a technical problem, and doesn't need to be solved by the
compiler.
Agreed; it seems natural to provide interface unit sources everywhere you would
provide headers currently. Or not in cases where you wouldn't, such as distcc
compiling preprocessed code.
Currently clang has been experimenting with embedding the sources
into the BMI - this can make things seem more efficient when, for
example, distributing BMIs to remote nodes in a large-scale
distributed build.
There was a patch proposed to make this the default for clang, which
has resulted in the discussion here:
https://discourse.llvm.org/t/rfc-modules-should-we-embed-sources-to-the-bmi/81029
From the first post:
(1) Fix the underlying issue. Readers may already recognize that the two topics
(whether or not to embed source files, and security concerns) are not technically
mutually exclusive. The fundamental technical problem may be that clang requires
opening the actual file during compilation. It looks like neither GCC nor MSVC
has this problem.
Sounds like the primary motivation for this clang change doesn't apply to GCC.
I think that might be a misunderstanding on the part of the author; AFAIU both
GCC and MSVC _do_ require access to the sources at BMI consume-time to give
decent diagnostics. I think that there might be confusion because the
compilation would succeed on those toolchains without the sources - but with
poorer diagnostic quality?
Exactly, compilation failing seems like the primary motivation for
clang, and that isn't an issue for gcc.
It doesn't seem worthwhile to do a bunch of work to embed sources just
to make it more possible for the compiler to print the relevant source
line in some niche remote execution scenario.
Hopefully, other folks from the “modules implementer’s group” including MSVC
will add comments to the discourse thread - we just discussed this (with my
impression being that most folks think it’s the build system’s territory).
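
To make the distinction above concrete (a missing source file degrading a
diagnostic rather than failing the compile), here is a minimal sketch in
Python. It is not how GCC, clang or MSVC implement this; the function names
and the output layout are hypothetical and only illustrate the fallback idea.

```python
from typing import Optional


def read_source_line(path: str, line_no: int) -> Optional[str]:
    """Return line `line_no` (1-based) of `path`, or None if it cannot be read."""
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            for current, text in enumerate(f, start=1):
                if current == line_no:
                    return text.rstrip("\n")
    except OSError:
        # Source not deployed alongside the BMI: not a fatal condition here.
        return None
    return None  # File exists but is shorter than line_no.


def format_diagnostic(path: str, line_no: int, column: int, message: str) -> str:
    """Format an error; include the source line and a caret only if available."""
    header = f"{path}:{line_no}:{column}: error: {message}"
    source = read_source_line(path, line_no)
    if source is None:
        # Degraded diagnostic: location and message only.
        return header
    gutter = f"{line_no:>5} | "
    caret = " " * 5 + " | " + " " * (column - 1) + "^"
    return "\n".join([header, gutter + source, caret])
```

With the interface source present, the result has three lines (location,
source line, caret); with it absent, only the location line is produced and
processing continues.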