Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies
On 6/22/23 22:45, Ben Boeckel wrote: On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote: On 1/25/23 16:06, Ben Boeckel wrote: They affect the build, so report them via `-MF` mechanisms. Why isn't this covered by the existing code in preprocessed_module? It appears as though it is neutered in patch 3 where `write_make_modules_deps` is used in `make_write` (or will use that name Why do you want to record the transitive modules? I would expect just noting the ones with imports directly in the TU would suffice (i.e check the 'outermost' arg) nathan -- Nathan Sidwell
Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies
On 7/18/23 16:52, Jason Merrill wrote: On 6/25/23 12:36, Ben Boeckel wrote: On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote: On 6/22/23 22:45, Ben Boeckel wrote: On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote: On 1/25/23 16:06, Ben Boeckel wrote: They affect the build, so report them via `-MF` mechanisms. Why isn't this covered by the existing code in preprocessed_module? It appears as though it is neutered in patch 3 where `write_make_modules_deps` is used in `make_write` (or will use that name Why do you want to record the transitive modules? I would expect just noting the ones with imports directly in the TU would suffice (i.e check the 'outermost' arg) FWIW, only GCC has "fat" modules. MSVC and Clang both require the transitive closure to be passed. The idea there is to minimize the size of individual module files. If GCC only reads the "fat" modules, then only those should be recorded. If it reads other modules, they should be recorded as well. Please explain what you mean by fat modules. There seems to be confusion. But wouldn't the transitive modules be dependencies of the direct imports, so (re)building the direct imports would first require building the transitive modules anyway? Expressing the transitive closure of dependencies for each importer seems redundant when it can be easily derived from the direct dependencies of each module. Jason -- Nathan Sidwell
Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies
On 7/18/23 20:01, Ben Boeckel wrote: On Tue, Jul 18, 2023 at 16:52:44 -0400, Jason Merrill wrote: On 6/25/23 12:36, Ben Boeckel wrote: On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote: On 6/22/23 22:45, Ben Boeckel wrote: On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote: On 1/25/23 16:06, Ben Boeckel wrote: They affect the build, so report them via `-MF` mechanisms. Why isn't this covered by the existing code in preprocessed_module? It appears as though it is neutered in patch 3 where `write_make_modules_deps` is used in `make_write` (or will use that name Why do you want to record the transitive modules? I would expect just noting the ones with imports directly in the TU would suffice (i.e check the 'outermost' arg) FWIW, only GCC has "fat" modules. MSVC and Clang both require the transitive closure to be passed. The idea there is to minimize the size of individual module files. If GCC only reads the "fat" modules, then only those should be recorded. If it reads other modules, they should be recorded as well. For clarification, given: * a.cppm ``` export module a; ``` * b.cppm ``` export module b; import a; ``` * use.cppm ``` import b; ``` in a "fat" module setup, `use.cppm` only needs to be told about `b.cmi` because it contains everything that an importer needs to know about the `a` module (reachable types, re-exported bits, whateve > With the "thin" modules, `a.cmi` must be specified when compiling `use.cppm` to satisfy anything that may be required transitively (e.g., a return GCC is neither of these descriptions. a CMI does not contain the transitive closure of its imports. It contains an import table. That table lists the transitive closure of its imports (it needs that closure to do remapping), and that table contains the CMI pathnames of the direct imports. Those pathnames are absolute, if the mapper provded an absolute pathm or relative to the CMI repo. The rationale here is that if you're building a CMI, Foo, which imports a bunch of modules, those imported CMIs will have the same (relative) location in this compilation and in compilations importing Foo (why would you move them?) Note this is NOT inhibiting relocatable builds, because of the CMI repo. Maybe I'm missing how this *actually* works in GCC as I've really only interacted with it through the command line, but I've not needed to mention `a.cmi` when compiling `use.cppm`. Is `a.cmi` referenced and read through some embedded information in `b.cmi` or does `b.cmi` include enough information to not need to read it at all? If the former, distributed builds are going to have a problem knowing what files to send just from the command line (I'll call this "implicit thin"). If the latter, that is the "fat" CMI that I'm thinking of. please don't use perjorative terms like 'fat' and 'thin'. But wouldn't the transitive modules be dependencies of the direct imports, so (re)building the direct imports would first require building the transitive modules anyway? Expressing the transitive closure of dependencies for each importer seems redundant when it can be easily derived from the direct dependencies of each module. I'm not concerned whether it is transitive or not, really. If a file is read, it should be reported here regardless of the reason. Note that caching mechanisms may skip actually *doing* the reading, but the dependencies should still be reported from the cached results as-if the real work had been performed. --Ben -- Nathan Sidwell
Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies
On 7/19/23 20:47, Ben Boeckel wrote: On Wed, Jul 19, 2023 at 17:11:08 -0400, Nathan Sidwell wrote: GCC is neither of these descriptions. a CMI does not contain the transitive closure of its imports. It contains an import table. That table lists the transitive closure of its imports (it needs that closure to do remapping), and that table contains the CMI pathnames of the direct imports. Those pathnames are absolute, if the mapper provded an absolute pathm or relative to the CMI repo. The rationale here is that if you're building a CMI, Foo, which imports a bunch of modules, those imported CMIs will have the same (relative) location in this compilation and in compilations importing Foo (why would you move them?) Note this is NOT inhibiting relocatable builds, because of the CMI repo. But it is inhibiting distributed builds because the distributing tool would need to know: - what CMIs are actually imported (here, "read the module mapper file" (in CMake's case, this is only the modules that are needed; a single massive mapper file for an entire project would have extra entries) or "act as a proxy for the socket/program specified" for other approaches); This information is in the machine (& human) README section of the CMI. - read the CMIs as it sends to the remote side to gather any other CMIs that may be needed (recursively); Contrast this with the MSVC and Clang (17+) mechanism where the command line contains everything that is needed and a single bolus can be sent. um, the build system needs to create that command line? Where does the build system get that information? IIUC it'll need to read some file(s) to do that. And relocatable is probably fine. How does it interact with reproducible builds? Or are GCC CMIs not really something anyone should consider for installation (even as a "here, maybe this can help consumers" mechanism)? Module CMIs should be considered a cacheable artifact. They are neither object files nor source files. On 7/18/23 20:01, Ben Boeckel wrote: Maybe I'm missing how this *actually* works in GCC as I've really only interacted with it through the command line, but I've not needed to mention `a.cmi` when compiling `use.cppm`. Is `a.cmi` referenced and read through some embedded information in `b.cmi` or does `b.cmi` include enough information to not need to read it at all? If the former, distributed builds are going to have a problem knowing what files to send just from the command line (I'll call this "implicit thin"). If the latter, that is the "fat" CMI that I'm thinking of. please don't use perjorative terms like 'fat' and 'thin'. Sorry, I was internally analogizing to "thinLTO". --Ben -- Nathan Sidwell
Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies
On 7/21/23 10:57, Ben Boeckel wrote: On Thu, Jul 20, 2023 at 17:00:32 -0400, Nathan Sidwell wrote: On 7/19/23 20:47, Ben Boeckel wrote: But it is inhibiting distributed builds because the distributing tool would need to know: - what CMIs are actually imported (here, "read the module mapper file" (in CMake's case, this is only the modules that are needed; a single massive mapper file for an entire project would have extra entries) or "act as a proxy for the socket/program specified" for other approaches); This information is in the machine (& human) README section of the CMI. Ok. That leaves it up to distributing build tools to figure out at least. - read the CMIs as it sends to the remote side to gather any other CMIs that may be needed (recursively); Contrast this with the MSVC and Clang (17+) mechanism where the command line contains everything that is needed and a single bolus can be sent. um, the build system needs to create that command line? Where does the build system get that information? IIUC it'll need to read some file(s) to do that. It's chained through the P1689 information in the collator as needed. No extra files need to be read (at least with CMake's approach); certainly not CMI files. It occurs to me that the model I am envisioning is similar to CMake's object libraries. Object libraries are a convenient name for a bunch of object files. IIUC they're linked by naming the individual object files (or I think the could be implemented as a static lib linked with --whole-archive path/to/libfoo.a -no-whole-archive. But for this conversation consider them a bunch of separate object files with a convenient group name. Consider also that object libraries could themselves contain object libraries (I don't know of they can, but it seems like a useful concept). Then one could create an object library from a collection of object files and object libraries (recursively). CMake would handle the transitive gtaph. Now, allow an object library to itself have some kind of tangible, on-disk representation. *BUT* not like a static library -- it doesn't include the object files. Now that immediately maps onto modules. CMI: Object library Direct imports: Direct object libraries of an object library This is why I don't understand the need explicitly indicate the indirect imports of a CMI. CMake knows them, because it knows the graph. And relocatable is probably fine. How does it interact with reproducible builds? Or are GCC CMIs not really something anyone should consider for installation (even as a "here, maybe this can help consumers" mechanism)? Module CMIs should be considered a cacheable artifact. They are neither object files nor source files. Sure, cachable sounds fine. What about the installation? --Ben -- Nathan Sidwell