Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-06-23 Thread Nathan Sidwell via Fortran

On 6/22/23 22:45, Ben Boeckel wrote:

On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:

On 1/25/23 16:06, Ben Boeckel wrote:

They affect the build, so report them via `-MF` mechanisms.


Why isn't this covered by the existing code in preprocessed_module?


It appears as though it is neutered in patch 3 where
`write_make_modules_deps` is used in `make_write` (or will use that name


Why do you want to record the transitive modules? I would expect just noting the
ones with imports directly in the TU would suffice (i.e., check the 'outermost' arg)


nathan


--
Nathan Sidwell



Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-18 Thread Nathan Sidwell via Fortran

On 7/18/23 16:52, Jason Merrill wrote:

On 6/25/23 12:36, Ben Boeckel wrote:

On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote:

On 6/22/23 22:45, Ben Boeckel wrote:

On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:

On 1/25/23 16:06, Ben Boeckel wrote:

They affect the build, so report them via `-MF` mechanisms.


Why isn't this covered by the existing code in preprocessed_module?


It appears as though it is neutered in patch 3 where
`write_make_modules_deps` is used in `make_write` (or will use that name


Why do you want to record the transitive modules? I would expect just noting the
ones with imports directly in the TU would suffice (i.e., check the 'outermost'
arg)


FWIW, only GCC has "fat" modules. MSVC and Clang both require the
transitive closure to be passed. The idea there is to minimize the size
of individual module files.

If GCC only reads the "fat" modules, then only those should be recorded.
If it reads other modules, they should be recorded as well.


Please explain what you mean by fat modules.  There seems to be confusion.



But wouldn't the transitive modules be dependencies of the direct imports, so 
(re)building the direct imports would first require building the transitive 
modules anyway?  Expressing the transitive closure of dependencies for each 
importer seems redundant when it can be easily derived from the direct 
dependencies of each module.
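
To illustrate the derivation mentioned above, here is a minimal sketch of computing the transitive closure from per-module direct imports. The `DirectImports` type and the `transitive_imports` function are hypothetical names made up for this example; they are not part of GCC or of any build system.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical data model: module name -> modules it imports directly.
using DirectImports = std::map<std::string, std::vector<std::string>>;

// Collect every module reachable from `root` by following direct-import
// edges. `root` itself is not included in the result.
std::set<std::string> transitive_imports(const DirectImports& direct,
                                         const std::string& root) {
    std::set<std::string> seen;
    std::vector<std::string> stack{root};
    while (!stack.empty()) {
        std::string cur = stack.back();
        stack.pop_back();
        auto it = direct.find(cur);
        if (it == direct.end()) continue;  // no recorded imports
        for (const auto& dep : it->second) {
            // Visit each module once, so diamond-shaped graphs are cheap.
            if (seen.insert(dep).second) stack.push_back(dep);
        }
    }
    return seen;
}
```

A build tool that already holds the direct-import edges of every module could run something like this once per importer, which is why listing the closure per importer looks redundant.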


Jason



--
Nathan Sidwell



Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-19 Thread Nathan Sidwell via Fortran

On 7/18/23 20:01, Ben Boeckel wrote:

On Tue, Jul 18, 2023 at 16:52:44 -0400, Jason Merrill wrote:

On 6/25/23 12:36, Ben Boeckel wrote:

On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote:

On 6/22/23 22:45, Ben Boeckel wrote:

On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:

On 1/25/23 16:06, Ben Boeckel wrote:

They affect the build, so report them via `-MF` mechanisms.


Why isn't this covered by the existing code in preprocessed_module?


It appears as though it is neutered in patch 3 where
`write_make_modules_deps` is used in `make_write` (or will use that name


Why do you want to record the transitive modules? I would expect just noting the
ones with imports directly in the TU would suffice (i.e., check the 'outermost'
arg)


FWIW, only GCC has "fat" modules. MSVC and Clang both require the
transitive closure to be passed. The idea there is to minimize the size
of individual module files.

If GCC only reads the "fat" modules, then only those should be recorded.
If it reads other modules, they should be recorded as well.


For clarification, given:

* a.cppm
```
export module a;
```

* b.cppm
```
export module b;
import a;
```

* use.cppm
```
import b;
```

in a "fat" module setup, `use.cppm` only needs to be told about
`b.cmi` because it contains everything that an importer needs to know
about the `a` module (reachable types, re-exported bits, whatever). With
the "thin" modules, `a.cmi` must be specified when compiling `use.cppm`
to satisfy anything that may be required transitively (e.g., a return


GCC is neither of these descriptions.  A CMI does not contain the transitive 
closure of its imports.  It contains an import table.  That table lists the 
transitive closure of its imports (it needs that closure to do remapping), and 
that table contains the CMI pathnames of the direct imports.  Those pathnames 
are absolute, if the mapper provided an absolute path, or relative to the CMI repo.


The rationale here is that if you're building a CMI, Foo, which imports a bunch 
of modules, those imported CMIs will have the same (relative) location in this 
compilation and in compilations importing Foo (why would you move them?) Note 
this is NOT inhibiting relocatable builds, because of the CMI repo.
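
As an illustration of the pathname rule described above (an absolute pathname is used as given, anything else is taken relative to the CMI repo), here is a minimal sketch. The function `resolve_cmi_path` and the directory names in the usage note are hypothetical; this is not GCC's implementation, only the rule restated as code.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: turn a pathname recorded in an import table into
// the file to open, given the CMI repo directory of the current build.
fs::path resolve_cmi_path(const fs::path& repo, const fs::path& recorded) {
    // An absolute pathname (as provided by the mapper) is used unchanged.
    if (recorded.is_absolute()) return recorded;
    // Otherwise the pathname is interpreted relative to the CMI repo.
    return (repo / recorded).lexically_normal();
}
```

With a repo of `/build/cmi-repo`, a recorded `a.cmi` would resolve to `/build/cmi-repo/a.cmi`. Moving the whole repo changes only the `repo` argument, which is the sense in which relative entries do not get in the way of relocatable builds.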




Maybe I'm missing how this *actually* works in GCC as I've really only
interacted with it through the command line, but I've not needed to
mention `a.cmi` when compiling `use.cppm`. Is `a.cmi` referenced and
read through some embedded information in `b.cmi` or does `b.cmi`
include enough information to not need to read it at all? If the former,
distributed builds are going to have a problem knowing what files to
send just from the command line (I'll call this "implicit thin"). If the
latter, that is the "fat" CMI that I'm thinking of.


please don't use pejorative terms like 'fat' and 'thin'.




But wouldn't the transitive modules be dependencies of the direct
imports, so (re)building the direct imports would first require building
the transitive modules anyway?  Expressing the transitive closure of
dependencies for each importer seems redundant when it can be easily
derived from the direct dependencies of each module.


I'm not concerned whether it is transitive or not, really. If a file is
read, it should be reported here regardless of the reason. Note that
caching mechanisms may skip actually *doing* the reading, but the
dependencies should still be reported from the cached results as-if the
real work had been performed.
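
A minimal sketch of that rule, assuming the compiler (or a cache standing in for it) can produce the set of CMI files that were read, or would have been read, for a translation unit. The helper name `make_rule` is hypothetical; the output is just an ordinary Make-style `target: prerequisites` line of the kind `-MF` files contain.

```cpp
#include <set>
#include <string>

// Hypothetical helper: format one Make-style rule listing every CMI that
// was read while producing `target`. Whether a CMI was imported directly
// or transitively is irrelevant here. Pathnames containing spaces or '$'
// would need escaping, which this sketch does not do.
std::string make_rule(const std::string& target,
                      const std::set<std::string>& cmis_read) {
    std::string line = target + ":";
    for (const auto& cmi : cmis_read) line += " " + cmi;
    line += "\n";
    return line;
}
```

For the `use.cppm` example, if both `b.cmi` and `a.cmi` end up being opened, both would appear as prerequisites of the object file, even when the result is served from a cache.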

--Ben


--
Nathan Sidwell



Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-20 Thread Nathan Sidwell via Fortran

On 7/19/23 20:47, Ben Boeckel wrote:

On Wed, Jul 19, 2023 at 17:11:08 -0400, Nathan Sidwell wrote:

GCC is neither of these descriptions.  A CMI does not contain the transitive
closure of its imports.  It contains an import table.  That table lists the
transitive closure of its imports (it needs that closure to do remapping), and
that table contains the CMI pathnames of the direct imports.  Those pathnames
are absolute, if the mapper provided an absolute path, or relative to the CMI 
repo.

The rationale here is that if you're building a CMI, Foo, which imports a bunch
of modules, those imported CMIs will have the same (relative) location in this
compilation and in compilations importing Foo (why would you move them?) Note
this is NOT inhibiting relocatable builds, because of the CMI repo.


But it is inhibiting distributed builds because the distributing tool
would need to know:

- what CMIs are actually imported (here, "read the module mapper file"
   (in CMake's case, this is only the modules that are needed; a single
   massive mapper file for an entire project would have extra entries) or
   "act as a proxy for the socket/program specified" for other
   approaches);


This information is in the machine (& human) README section of the CMI.


- read the CMIs as it sends to the remote side to gather any other CMIs
   that may be needed (recursively);

Contrast this with the MSVC and Clang (17+) mechanism where the command
line contains everything that is needed and a single bolus can be sent.


um, the build system needs to create that command line? Where does the build 
system get that information?  IIUC it'll need to read some file(s) to do that.




And relocatable is probably fine. How does it interact with reproducible
builds? Or are GCC CMIs not really something anyone should consider for
installation (even as a "here, maybe this can help consumers"
mechanism)?


Module CMIs should be considered a cacheable artifact.  They are neither object 
files nor source files.





On 7/18/23 20:01, Ben Boeckel wrote:

Maybe I'm missing how this *actually* works in GCC as I've really only
interacted with it through the command line, but I've not needed to
mention `a.cmi` when compiling `use.cppm`. Is `a.cmi` referenced and
read through some embedded information in `b.cmi` or does `b.cmi`
include enough information to not need to read it at all? If the former,
distributed builds are going to have a problem knowing what files to
send just from the command line (I'll call this "implicit thin"). If the
latter, that is the "fat" CMI that I'm thinking of.


please don't use pejorative terms like 'fat' and 'thin'.


Sorry, I was internally analogizing to "thinLTO".

--Ben


--
Nathan Sidwell



Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-21 Thread Nathan Sidwell via Fortran

On 7/21/23 10:57, Ben Boeckel wrote:

On Thu, Jul 20, 2023 at 17:00:32 -0400, Nathan Sidwell wrote:

On 7/19/23 20:47, Ben Boeckel wrote:

But it is inhibiting distributed builds because the distributing tool
would need to know:

- what CMIs are actually imported (here, "read the module mapper file"
(in CMake's case, this is only the modules that are needed; a single
massive mapper file for an entire project would have extra entries) or
"act as a proxy for the socket/program specified" for other
approaches);


This information is in the machine (& human) README section of the CMI.


Ok. That leaves it up to distributing build tools to figure out at
least.


- read the CMIs as it sends to the remote side to gather any other CMIs
that may be needed (recursively);

Contrast this with the MSVC and Clang (17+) mechanism where the command
line contains everything that is needed and a single bolus can be sent.


um, the build system needs to create that command line? Where does the build
system get that information?  IIUC it'll need to read some file(s) to do that.


It's chained through the P1689 information in the collator as needed. No
extra files need to be read (at least with CMake's approach); certainly
not CMI files.


It occurs to me that the model I am envisioning is similar to CMake's object 
libraries.  Object libraries are a convenient name for a bunch of object files. 
IIUC they're linked by naming the individual object files (or I think they could 
be implemented as a static lib linked with --whole-archive path/to/libfoo.a 
--no-whole-archive).  But for this conversation consider them a bunch of separate 
object files with a convenient group name.


Consider also that object libraries could themselves contain object libraries (I 
don't know if they can, but it seems like a useful concept).  Then one could 
create an object library from a collection of object files and object libraries 
(recursively).  CMake would handle the transitive graph.


Now, allow an object library to itself have some kind of tangible, on-disk 
representation.  *BUT* not like a static library -- it doesn't include the 
object files.



Now that immediately maps onto modules.

CMI: Object library
Direct imports: Direct object libraries of an object library

This is why I don't understand the need to explicitly indicate the indirect imports 
of a CMI.  CMake knows them, because it knows the graph.





And relocatable is probably fine. How does it interact with reproducible
builds? Or are GCC CMIs not really something anyone should consider for
installation (even as a "here, maybe this can help consumers"
mechanism)?


Module CMIs should be considered a cacheable artifact.  They are neither object
files nor source files.


Sure, cacheable sounds fine. What about the installation?

--Ben


--
Nathan Sidwell