On Thu, Oct 5, 2023 at 1:59 PM Sergei Trofimovich <sly...@gmail.com> wrote:
>
> On Thu, Oct 05, 2023 at 09:19:15AM +0200, Richard Biener wrote:
> > On Wed, Oct 4, 2023 at 11:20 PM Sergei Trofimovich via Gcc
> > <gcc@gcc.gnu.org> wrote:
> > >
> > > Hi gcc developers!
> > >
> > > TL;DR:
> > >
> > > I would like to implement a scalable way to pass `-fmacro-prefix-map=`
> > > for `NixOS` distribution to avoid leaking build-time paths generated by
> > > `__FILE__` macros used by various libraries.
> > >
> > > I need some guidance what path to take to be acceptable for `gcc`
> > > upstream.
> > >
> > > I have a few possible solutions and wonder what I should try to upstream
> > > to GCC. The options I see:
> > >
> > > 1. Hardcode NixOS-specific way to mangle paths.
> > >
> > >    Pros: simplest to implement, can be easily configured away if needed
> > >    Cons: inflexible, `clang` might or might not accept the same hack
> > >
> > > 2. Extend `-fmacro-prefix-map=` (or add a new `-fmacro-prefix-map-file=`)
> > >    to allow passing a file
> > >
> > >    Pros: still not too hard to implement, generic enough to be used in
> > >    other contexts.
> > >    Cons: Will require client to construct the map file.
> > >
> > > 3. Have more flexible `-fmacro-prefix-map-regex=` option that allows
> > >    patterns. Something like:
> > >
> > >      -fmacro-prefix-map-regex=/nix/store/[a-z0-9]{32}-=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-
> > >
> > >    Pros: at least for NixOS one option will be enough to cover all
> > >    packages as they all share above template.
> > >    Cons: pulls in some form of regex with its can of worms including
> > >    escape delimiters, might not be flexible enough for other use cases.
> > >
> > > 4. Something else?
> > >
> > > Which one(s) should I take to implement?
> > >
> > > More words:
> > >
> > > `NixOS` (and `nixpkgs` repository) install every software package into
> > > an individual directory with unique prefix.
> > > Some examples:
> > >
> > >     /nix/store/y8wfrgk7br5rfz4221lfb9v8w3n0cnyd-glibc-2.37-8-dev
> > >     /nix/store/rb3q4kcyfg77cmkiwywx2aqdd3x5ch93-libmpc-1.3.1
> > >     /nix/store/8n240jfdmsb3lnc2qa2vb9dwk638j1lp-gmp-with-cxx-6.3.0-dev
> > >     /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2
> > >     ...
> > >
> > > It's a fundamental design decision to allow parallel package installs.
> > >
> > > From dependency tracking standpoint it's highly undesirable to have
> > > these absolute paths to be hardcoded into final executable binaries if
> > > they are not used at runtime.
> > >
> > > Example redundant path we would like not to have in final binaries:
> > >
> > >     $ strings result/bin/nix | grep phjcmy025rd1ankw5y1b21xsdii83cyk
> > >     /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/json.hpp
> > >     /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/output/serializer.hpp
> > >     /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/conversions/to_chars.hpp
> > >     /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/input/lexer.hpp
> > >     /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/iterators/iter_impl.hpp
> > >     /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/input/json_sax.hpp
> > >     /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/iterators/iteration_proxy.hpp
> > >     /nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2/include/nlohmann/detail/input/parser.hpp
> > >
> > > Those paths are inserted via glibc's assert() uses of `__FILE__`
> > > directive and thus hardcode header file paths from various packages
> > > (like lttng-ust or nlohmann/json) into compiled binaries. Sometimes
> > > `__FILE__` usage is more creative than assert().
> > >
> > > I would like to get rid of references to header files. I think
> > > `-fmacro-prefix-map=` is ideal for this particular use case.
> > >
> > > The prototype that creates equivalent of the following commands does
> > > work for smaller packages:
> > >
> > >     -fmacro-prefix-map=/nix/store/y8wfrgk7br5rfz4221lfb9v8w3n0cnyd-glibc-2.37-8-dev=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.37-8-dev
> > >     -fmacro-prefix-map=/nix/store/8n240jfdmsb3lnc2qa2vb9dwk638j1lp-gmp-with-cxx-6.3.0-dev=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gmp-with-cxx-6.3.0-dev
> > >     -fmacro-prefix-map=/nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-nlohmann_json-3.11.2
> > >     ...
> > >
> > > The above works for small amount of options (like, 100). But around 1000
> > > options we start hitting linux limits on the single environment variable
> > > on real-world packages like `qemu` with a ton of input depends.
> > >
> > > The command-line limitations are in various places:
> > > - `gcc` limitation of lifting all command line options into a single
> > >   environment variable: https://gcc.gnu.org/PR111527
> > > - `linux` limitation of constraining single environ variable to a value
> > >   way below the full available environment space:
> > >   https://lkml.org/lkml/2023/9/24/381
> > >
> > > `linux` fix would buy us 50x more budget (A Lot) but it will not help
> > > much other operating systems like `Darwin` where absolute environment
> > > limit is a lot lower than `linux`.
> > >
> > > I already implemented [1.] in https://github.com/NixOS/nixpkgs/pull/255192
> > > (also attached `mangle-NIX_STORE-in-__FILE__.patch` 3.5K patch against
> > > `master` as a proof of concept).
> > >
> > > What would be the best way to scale `-fmacro-prefix-map=` up to NixOS
> > > needs for `gcc`? I would like to implement something sensible I could
> > > upstream.
> > >
> > > What do you think?
> >
> > Go for (2) which I think is the only way to truly solve the command-line
> > limitation issue (with less regular paths even regex wouldn't cut it).
>
> Sounds good. Do you have any preference over specific syntax? My
> suggestions would be:
>
> 1. `-fmacro-prefix-map=file-name`: if `file-name` there is not in `key=val`
>    format then treat it as file
> 2. `-fmacro-prefix-map=@file-name`: use @ as a signal to use file
> 3. `-fmacro-prefix-map-file=file-name`: use a new option

I'd prefer (2)

> > Btw, I thought we have response files to deal with command-line limits,
> > why doesn't that work here? I see the driver expands response files
> > but IIRC it also builds those when the command-line gets too large
> > and uses it for the environment and the cc1 invocation? If it doesn't
> > do the latter why not fix it that way?
>
> Yeah, in theory response files would extend the limit. In practice `gcc`
> always expands response files internally into a single
> `COLLECT_GCC_OPTIONS` variable and hits the environment variable limit
> very early:
>
>     https://gcc.gnu.org/PR111527
>
> Example reproducer:
>
>     $ for i in `seq 1 1000`; do printf -- "-fmacro-prefix-map=%0*d=%0*d\n" 200 1 200 2; done > a.rsp
>     $ touch a.c; gcc @a.rsp -c a.c
>     gcc: fatal error: cannot execute 'cc1': execv: Argument list too long
>     compilation terminated.
>
> And if you want to look at the gory details:
>
>     $ strace -f -etrace=execve -s 1000000 -v -v -v gcc @a.rsp -c a.c
>     ...
>     [pid 78] execve("cc1", ["cc1", "-quiet", "a.c", "-quiet", "-dumpbase",
>     "a.c", "-dumpbase-ext", ".c", "-mtune=generic", "-march=x86-64",
>     "-fmacro-prefix-map=...=...",
>     "-fmacro-prefix-map=...=...",
>     ...],
>     [...,
>     "COLLECT_GCC=gcc",
>     "COLLECT_GCC_OPTIONS='-fmacro-prefix-map=...=...'
>     '-fmacro-prefix-map=...=...' ... '-c' '-mtune=generic' '-march=x86-64'"]) =
>     -1 E2BIG (Argument list too long)
>
> Note how `gcc` not only expands response file into an argument list
> (that is not too bad) but also duplicates the whole list as a single
> `COLLECT_GCC_OPTIONS=...` environment variable with added quoting on
> top.
>
> Would be nice if `gcc` just passed response files around as is :)

That's not possible in general since specs processing can alter
the command-line. What it could do is create an alternate response
file with (all?) arguments when a certain limit is exceeded (or the
original command-line included response files).
That could be referenced from COLLECT_GCC_OPTIONS as well but of course
that would require patching all COLLECT_GCC_OPTIONS consumers
(for example lto-wrapper doesn't handle response files there). So
it's not even a half-way solution (unless the env limit is way higher).

Richard.

> --
>
>   Sergei
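
For a rough sense of the numbers behind the reproducer in this thread, here is a back-of-the-envelope sketch in Python. It rests on two assumptions that are not stated in the thread: that the Linux limit on a single argv/envp string is 32 pages of 4 KiB (128 KiB), and that `COLLECT_GCC_OPTIONS` holds each option wrapped in single quotes and separated by spaces, as the strace output suggests. The function names are hypothetical and exist only for this illustration; the real quoting done by the driver may add more bytes.

```python
# Rough estimate only: models COLLECT_GCC_OPTIONS as every option wrapped in
# single quotes and joined by spaces, mirroring the strace output in the thread.

# Assumption: Linux caps one argv/envp string at 32 pages of 4 KiB each.
MAX_ARG_STRLEN = 32 * 4096

ENV_PREFIX = "COLLECT_GCC_OPTIONS="


def reproducer_options(count=1000, width=200):
    """Build the options the shell reproducer writes: zero-padded 1 and 2."""
    old = str(1).zfill(width)
    new = str(2).zfill(width)
    return [f"-fmacro-prefix-map={old}={new}" for _ in range(count)]


def collect_gcc_options_size(options):
    """Approximate length of the COLLECT_GCC_OPTIONS=... environment entry."""
    quoted = " ".join(f"'{opt}'" for opt in options)
    return len(ENV_PREFIX) + len(quoted)


def max_options_that_fit(option_len, limit=MAX_ARG_STRLEN):
    """Estimate how many quoted options of one length stay under the limit."""
    per_option = option_len + 3  # two quotes plus one separating space
    return (limit - len(ENV_PREFIX)) // per_option


if __name__ == "__main__":
    opts = reproducer_options()
    size = collect_gcc_options_size(opts)
    print(f"one option: {len(opts[0])} bytes")
    print(f"estimated variable size: {size} bytes (assumed limit {MAX_ARG_STRLEN})")
    print(f"options of this length that fit: {max_options_that_fit(len(opts[0]))}")
```

Under these assumptions the reproducer's single variable is several times larger than the per-string limit, which is consistent with the E2BIG failure even though the total argument space is far from exhausted. A map file sidesteps this because only the file name has to travel through the command line and the environment.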