Hi,

I'm Ben and I implemented modules support in CMake, authored P1689
itself and its support in GCC, and helped wrangle its support into
Clang and MSVC.

I encourage you to read this paper:

    https://mathstuf.fedorapeople.org/fortran-modules/fortran-modules.html

which describes the strategy CMake uses for compiling Fortran modules
(which are isomorphic to C++ modules at the build tool level). I believe
that this strategy (sometimes called "explicit module builds" elsewhere)
is the only long-term viable strategy for projects (one-off `g++`
commands may want the "implicit module build" strategy which is
more-or-less "dump things into a directory and find modules like we find
headers"). However, the corner cases that exist at this level make it
bad news for projects in the real world (i.e., incremental builds;
clean CI builds probably don't actually care that much).

There is also this repository which contains "interesting" corner cases
for scanners:

    https://github.com/mathstuf/cxx-modules-sandbox

You may also be interested in this thread on the autoconf list:

    https://lists.gnu.org/archive/html/autoconf/2025-02/msg00000.html

On Fri, Feb 28, 2025 at 05:38:11 +0000, vspefs via Gcc wrote:
> Current `-Mmodules` output is based on [P1602R0](wg21.link/p1602r0), which
> speaks about a set of Makefile rules that can handle modules, with the help of
> module mappers and a modified GNU Make.
> 
> The proposal came out in 2019, and the output of those rules was implemented
> at GCC in 2020. However, so far we still don't have a new release of GNU Make
> which implements P1602R0.
> 
> What's more, the rules described in P1602R0 are not ideal. It sets up phony
> prerequisites for real-file targets, causing guaranteed rebuilds. It is also
> unable to handle dependencies among module interfaces - that is to say, if
> module A imports and exports B, then the interface of module A depends on that
> of module B, so its CMI should be rebuilt if the interface of B changes.

Note that this is the case even if A only imports B. Only GCC makes
"standalone" CMI files; MSVC and Clang both want B's CMI location to be
known when A's CMI is imported.
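As a sketch of what this implies at the Makefile level (file names
hypothetical, recipes elided because the flags for emitting and
locating CMIs are compiler-specific), B's CMI must exist and be
locatable before A's CMI can be built or consumed, even for a plain
`import B;`:

```Makefile
# Hypothetical dependency edges only; no portable recipes exist since
# GCC, Clang, and MSVC each locate CMIs differently.
B.gcm: B.cc
A.gcm: A.cc B.gcm       # B's CMI needed even without `export import B;`
A.o: A.cc A.gcm B.gcm
```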

> I tried a few approaches to fix the current implementation, but in vain.
> 
> It is possible, however, to have another set of Makefile rules generated,
> which solves all the problems, and doesn't need a new GNU Make. I've posted
> it on Reddit.

There are other problems that a build tool (make, ninja) cannot resolve
on its own. These include (but may not be limited to):

- permission to import: just because B exists does not mean that A is
  allowed to import it:
  - it could be private to B's library and only available to other TUs
    in the same library
  - it could be from a library that links *to* A's library (causing
    circular linker artifact dependencies (though not necessarily build
    graph dependencies))
- incremental builds: loading previous state might allow state to
  "percolate" up the build graph
  - A's import of B fails on the first run because the dependencies are
    not yet correct, but B is then made anyway due to `-k` or a
    concurrently executing job passing while A's failure shuts down the
    graph: how does one preserve that "B is not visible to A" state?
  - B's CMI is still on disk, but its source has been deleted; how does
    one ensure that nothing imports it even though the import can be
    satisfied with on-disk state?
- circular dependencies: actually pretty easy for build tools to
  handle, as it is a very much expected error case; not so easy for
  dynamic mappers

> See [here](https://www.reddit.com/r/cpp/comments/1izg2cc/make_me_a_module_now/).
> 
> To briefly summarize the idea:
> 
> > If an object target is built from a module interface unit, the rules
> > generated are:
> >
> > ```Makefile
> > target.o: source.cc regular_prereqs header_unit_prereqs \
> >     | header_unit_prereqs module_prereqs
> > source_cmi.gcm: source.cc regular_prereqs header_unit_prereqs \
> >     module_prereqs | target.o
> > ```
> > If an object target is not, the rule generated is:
> >
> > ```Makefile
> > target.o: source_files regular_prereqs header_unit_prereqs \
> >     | header_unit_prereqs module_prereqs
> > ```
> >
> > The `header_unit_prereqs` and `module_prereqs` are paths to the
> > corresponding CMI files.

I haven't sat down and drawn out the build graph this makes, but it
passes the smell test at least (though I'm just ignoring header unit
stuff at this point).
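For concreteness, here is my reading of those quoted rules expanded for
a hypothetical interface unit `a.cc` that imports a module `b` provided
by `b.cc` (all names invented for illustration):

```Makefile
# a.cc is a module interface unit importing b; CMIs appear as
# order-only prerequisites of objects, and each CMI is ordered after
# its own object per the quoted scheme.
a.o: a.cc | b_cmi.gcm
a_cmi.gcm: a.cc b_cmi.gcm | a.o
b.o: b.cc
b_cmi.gcm: b.cc | b.o
```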

As for your questions on Reddit:

> The module mapper maps between module interface units, module names,
> and CMIs. It's good. But who should be responsible for using it? The
> build system, or the compiler?

I believe it is the build *system*'s job. I suppose I should clarify my
definitions here:

- build tool: build graph executor (e.g., ninja, make)
- build system: provides a model of libraries, executables, and other
  rules which may be rendered as a build graph to be executed by a build
  tool (e.g., cmake, meson, automake)

There are projects that are both at once (e.g., build2, boost.build,
tup). The key difference is that the build system has a "higher level"
graph which associates groups of compilations into artifacts (usually
visible in the build graph by only looking at the linker bits). This
implies "walls" between "just compiles" that might be topologically
possible on the build graph, but logically inconsistent with the target
graph. Say we have:

- library A
- library B
- executable E which links to A and B

A and B have no relation, so while a module import from an A compile
into a B compile is topologically *possible*, the build *system* does
not consider it valid, as B does not depend on A at all. Build tools
(and certainly compilers) lack this context, so I don't think a generic
mapper at that level is, in general, viable.

> If it's the build system, then should we take our time, implement it
> in a new version of GNU Make, release it, and cast some magic spells
> to let people switch to it overnight?

Make is only really missing `restat = 1` for performance (correctness is
fine as Make only runs things unnecessarily without it). Everything else
is *possible* even in POSIX make. There *might* need to be another
feature for one global graph, but even without that, a static 2-level
recursive Makefile setup is sufficient.
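A minimal sketch of what I mean by a static 2-level recursive setup
(file names hypothetical): the outer Makefile runs a scan pass to
completion, then re-invokes make so that the generated dependency
fragments are in place before the build pass parses them:

```Makefile
# Outer Makefile: phase 1 scans, phase 2 builds with full module deps.
all:
	$(MAKE) -f Makefile.scan    # produce per-TU dependency fragments
	$(MAKE) -f Makefile.build   # includes the generated fragments

# Makefile.build would then contain something like:
#   include $(wildcard deps/*.mk)
```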

> Furthermore, should we implement one for every build system?

Every build system needs a "collator" that transports enough information
about its target-level semantics to the build graph to stitch together
scanning outputs into rules for the build tool.
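With recent GCC, the scanning side of such a collator can be driven by
the `-fdeps-*` flags (which emit P1689 JSON); the collation tool named
below is hypothetical:

```Makefile
# Scan each TU for its module provides/requires without full compilation.
%.ddi: %.cc
	$(CXX) $(CXXFLAGS) -E -x c++ $< \
	  -fdeps-format=p1689r5 -fdeps-file=$@ -fdeps-target=$*.o \
	  -o /dev/null

# Collate per-TU scan results into build-tool rules (tool hypothetical).
modules.mk: $(SOURCES:.cc=.ddi)
	cxx-collator $^ -o $@
```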

I also see this assertion in your post:

> TL;DR - CMIs and object files are managed separately, and it
> ultimately achieves everything we (at least I) want from modules.
> Sometimes a CMI might be redundantly built. Once.

Note that one may need *multiple* CMIs for a given source in a single
build graph. This is because CMI compatibility is *very* narrow. If A is
compiled with C++26 and B with C++23 and both use modules from C, each
needs *unique* CMIs to be able to import them because the standard level
changes the parser enough that CMIs cannot be loaded. There are many
flags that can affect CMI compatibility (in fact, it is probably easier
to list those known to *not* affect it: `-v`, `-ftime-report*`, `-M*`,
`-fdeps-*`, `-pipe`, `-save-*`, `-time` and flags like them).
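As a sketch of what this means for the build graph (names hypothetical,
recipes elided since CMI-emitting flags are compiler-specific), the
same interface source fans out into one CMI per consumer dialect:

```Makefile
# Two CMIs built from the same interface source c.cc, one per dialect.
c-cxx23.gcm: c.cc       # built with -std=c++23
c-cxx26.gcm: c.cc       # built with -std=c++26
a.o: a.cc c-cxx26.gcm   # A is a C++26 consumer
b.o: b.cc c-cxx23.gcm   # B is a C++23 consumer
```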

> Header units

Header units have all of these problems too, but they need to be
figured out right away rather than having the useful "checkpoint"
states of implementation that named modules have. That's why they'll be
the last thing CMake implements for modules (despite being
"transitional", they are all the hardest parts of named modules, all at
once, for build systems).

Thanks,

--Ben
