Reproducible builds - supporting relative paths in *-prefix-map
Hi,

I'm wondering if we'd be able to improve path handling in the -f*-prefix-map compiler options to cover relative paths?

Currently it works well for absolute paths, but if a file uses a relative path or a path with a symlink in it, it will miss those cases. Relative paths in particular are problematic, as you can't easily construct a compiler command line that would cover all the possible relative path spellings.

At first glance this is relatively straightforward, for example:

Index: gcc-12.1.0/gcc/file-prefix-map.cc
===
--- gcc-12.1.0.orig/gcc/file-prefix-map.cc
+++ gcc-12.1.0/gcc/file-prefix-map.cc
@@ -70,19 +70,25 @@ remap_filename (file_prefix_map *maps, c
   file_prefix_map *map;
   char *s;
   const char *name;
+  char *realname;
   size_t name_len;

+  realname = lrealpath (filename);
+
   for (map = maps; map; map = map->next)
-    if (filename_ncmp (filename, map->old_prefix, map->old_len) == 0)
+    if (filename_ncmp (realname, map->old_prefix, map->old_len) == 0)
       break;
-  if (!map)
+  if (!map) {
+    free (realname);
     return filename;
-  name = filename + map->old_len;
+  }
+  name = realname + map->old_len;
   name_len = strlen (name) + 1;

   s = (char *) ggc_alloc_atomic (name_len + map->new_len);
   memcpy (s, map->new_prefix, map->new_len);
   memcpy (s + map->new_len, name, name_len);
+  free (realname);
   return s;
 }

which adds a realpath() call (via libiberty's lrealpath) to the prefix mapping code. I did experiment with this and found it breaks compiling ruby and xen-tools, which both have code that does:

#include __FILE__

It may be possible to make the remapping conditional on not being directly in a #include statement, but I haven't found the gcc code responsible for that yet. I also noticed some valgrind tests fail after it; I've not looked into why that would be yet.

I wanted to ask if there would be any interest in adding support for something like this? I suspect the include/__FILE__ issue is probably a latent bug anyway.
If anyone has any pointers to the code I could improve my patch with, I'm also happy to have them!

In case it helps, my background on what I'm working on and why we'd find this useful follows:

I work on Yocto Project, which cross compiles complete software stacks; we use gcc heavily and the resulting builds are all around us. We care deeply about build reproducibility.

With autotools, we long ago realised that we were best off separating the source and the build output where we could, and to do it we used relative paths, which removed a lot of problematic hardcoded paths in our output.

More recently we've been using -fdebug-prefix-map and -fmacro-prefix-map (we need both since some of binutils doesn't seem to support the combined -ffile-prefix-map yet). We use the debug information (split separately) to support on-target debugging, but where relative paths were used, we miss information, as the remapping either doesn't happen correctly or the result is mangled. We'd like to fix our debug output, but we're finding that hard without relative path support for *-prefix-map.

Cheers,
Richard
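To illustrate why relative spellings are hard to cover, here is a minimal C sketch of the literal prefix matching that remap_filename() performs before the patch. It is a hypothetical model, not GCC's code: struct prefix_map and remap_literal are made-up names, plain strncmp stands in for libiberty's filename_ncmp (which can be more lenient on DOS-style hosts), and malloc stands in for ggc_alloc_atomic. A map only fires when the name is spelled with exactly the OLD prefix, so two relative spellings of the same file would each need their own option.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one -f*-prefix-map=OLD=NEW option.  */
struct prefix_map
{
  const char *old_prefix;
  const char *new_prefix;
};

/* Hypothetical model of literal prefix remapping (not GCC's code): FILENAME
   is compared byte-for-byte with OLD_PREFIX, with no normalisation of ".",
   "..", or symlinks.  Returns a newly malloc'd remapped name, or NULL when
   the prefix does not match (or allocation fails), in which case the caller
   would keep FILENAME unchanged.  */
static char *
remap_literal (const struct prefix_map *map, const char *filename)
{
  size_t old_len, new_len, rest_len;
  char *s;

  assert (map != NULL && filename != NULL);
  old_len = strlen (map->old_prefix);
  if (strncmp (filename, map->old_prefix, old_len) != 0)
    return NULL;

  new_len = strlen (map->new_prefix);
  rest_len = strlen (filename + old_len) + 1;  /* Include the NUL.  */
  s = malloc (new_len + rest_len);
  if (s == NULL)
    return NULL;
  memcpy (s, map->new_prefix, new_len);
  memcpy (s + new_len, filename + old_len, rest_len);
  return s;
}
```

In this model an absolute map never matches "../src/foo.c", and a relative map such as OLD="../src" misses "./../src/foo.c" even though both name the same file.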
Re: Reproducible builds - supporting relative paths in *-prefix-map
On Mon, 2022-08-15 at 12:13 +0100, Richard Purdie via Gcc wrote:
> Hi,
>
> I'm wondering if we'd be able to improve path handling in the -f*-
> prefix-map compiler options to cover relative paths?
>
> [...]
>
> I wanted to ask if there would be any interest in adding support for
> something like this? I suspect the include/__FILE__ issue is probably a
> latent bug anyway. If anyone has any pointers to the code I could
> improve my patch with I'm also happy to have them!

To answer my own question, something like:

+Index: gcc-12.1.0/libcpp/macro.cc
+===
+--- gcc-12.1.0.orig/libcpp/macro.cc
++++ gcc-12.1.0/libcpp/macro.cc
+@@ -563,7 +563,7 @@ _cpp_builtin_macro_text (cpp_reader *pfi
+             if (!name)
+               abort ();
+           }
+-        if (pfile->cb.remap_filename)
++        if (pfile->cb.remap_filename && !pfile->state.in_directive)
+           name = pfile->cb.remap_filename (name);
+         len = strlen (name);
+         buf = _cpp_unaligned_alloc (pfile, len * 2 + 3);

seems to do roughly what I was wondering about.

I'd be interested to understand whether some patch along the lines I've mentioned here would stand a chance of being accepted or not.

Cheers,
Richard
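To make the intent of the libcpp change easier to see, here is a small self-contained model of the decision it adds to the __FILE__ expansion. It is not libcpp code: file_macro_text, demo_remap and the /work/pkg/src and /usr/src/debug/pkg prefixes are hypothetical, and the bool in_directive parameter stands in for pfile->state.in_directive. With the condition in place, __FILE__ in ordinary code is remapped, while __FILE__ used inside a directive such as #include __FILE__ keeps the real path so the file can still be opened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Type of a hypothetical stand-in for pfile->cb.remap_filename.  */
typedef const char *(*remap_fn) (const char *);

/* Hypothetical remap callback: "/work/pkg/src/..." becomes
   "/usr/src/debug/pkg/...".  The result lives in a static buffer and is
   only valid until the next call.  */
static const char *
demo_remap (const char *name)
{
  static char buf[256];
  static const char old_prefix[] = "/work/pkg/src";
  size_t old_len = sizeof old_prefix - 1;

  if (strncmp (name, old_prefix, old_len) != 0)
    return name;
  snprintf (buf, sizeof buf, "/usr/src/debug/pkg%s", name + old_len);
  return buf;
}

/* Model of the condition added by the libcpp patch: remap the name that
   __FILE__ expands to only when no directive is being processed, so that
   "#include __FILE__" still sees a path that exists.  IN_DIRECTIVE stands
   in for pfile->state.in_directive.  */
static const char *
file_macro_text (const char *name, remap_fn remap, bool in_directive)
{
  assert (name != NULL);
  if (remap != NULL && !in_directive)
    name = remap (name);
  return name;
}
```

Assuming libcpp sets in_directive for every directive rather than only #include, __FILE__ expanded inside other directives (for example #line) would also stay unmapped under this condition, which may or may not be wanted.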
Re: Reproducible builds - supporting relative paths in *-prefix-map
On Mon, Aug 15, 2022 at 10:28 AM Richard Purdie via Gcc wrote:
>
> On Mon, 2022-08-15 at 12:13 +0100, Richard Purdie via Gcc wrote:
> > [...]
>
> [...]
>
> I'd be interested to understand whether some patch along the lines I've
> mentioned here would stand a chance of being accepted or not.

Thanks for recognizing this issue and proposing a solution.

It's probably more effective to submit this as an actual patch to gcc-patches and cc David Malcolm, libcpp maintainer, than to ask hypotheticals on the GCC mailing list.

Thanks,
David
Re: Reproducible builds - supporting relative paths in *-prefix-map
On Mon, 15 Aug 2022, Richard Purdie via Gcc wrote:

> Currently it works well for absolute paths but if a file uses a
> relative path or a path with a symlink in, or a non-absolute path, it
> will miss those cases. For relative paths in particular it is
> problematic as you can't easily construct a compiler commandline that
> would cover all relative path options.

I'd expect a relative path to be naturally relocatable without needing to be remapped. (For example, DW_AT_comp_dir would be relocated in debug info, but there would be no need to relocate the paths to individual files that are relative to DW_AT_comp_dir.)

Is the issue that you're using relative paths between two directories that don't have a fixed relative path between them, such as between the build and source directories, as opposed to relative paths within the source directory or within the build directory?

> which address a realpath() call into the prefix mapping code. I did

That would run the risk of breaking relocation for anyone who has deliberately used the paths they pass to the compiler (possibly involving symlinks, for example) in their remapping options - not expecting a further level of processing to be applied to those paths before remapping.

--
Joseph S. Myers
jos...@codesourcery.com
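Joseph's distinction between relative paths inside one tree and relative paths that cross between trees can be checked lexically. The helper below, escapes_root, is hypothetical and purely textual (it ignores symlinks): it reports whether a relative path ever climbs above the directory it is resolved from. Paths that never do, such as src/foo.c, move along with DW_AT_comp_dir; paths that do, such as ../src/foo.c seen from a build directory, only work while the two trees keep the same relative layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if the relative path REL, read left to
   right, ever uses more ".." components than it has descended, i.e. it
   leaves the directory it is interpreted from.  Purely lexical: "." and
   empty components are skipped and symlinks are not considered.  */
static bool
escapes_root (const char *rel)
{
  long depth = 0;
  const char *p = rel;

  assert (rel != NULL && rel[0] != '/');
  while (*p != '\0')
    {
      const char *end = strchr (p, '/');
      size_t len = end != NULL ? (size_t) (end - p) : strlen (p);

      if (len == 2 && p[0] == '.' && p[1] == '.')
        {
          if (--depth < 0)
            return true;
        }
      else if (len > 0 && !(len == 1 && p[0] == '.'))
        depth++;

      p += len;
      if (*p == '/')
        p++;
    }
  return false;
}
```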
Re: Reproducible builds - supporting relative paths in *-prefix-map
On Mon, 2022-08-15 at 17:15 +, Joseph Myers wrote:
> On Mon, 15 Aug 2022, Richard Purdie via Gcc wrote:
>
> > Currently it works well for absolute paths but if a file uses a
> > relative path or a path with a symlink in, or a non-absolute path, it
> > will miss those cases. For relative paths in particular it is
> > problematic as you can't easily construct a compiler commandline that
> > would cover all relative path options.
>
> I'd expect a relative path to be naturally relocatable without needing to
> be remapped. (For example, DW_AT_comp_dir would be relocated in debug
> info, but there would be no need to relocate the paths to individual files
> that are relative to DW_AT_comp_dir.)
>
> Is the issue that you're using relative paths between two directories that
> don't have a fixed relative path between them, such as between the build
> and source directories, as opposed to relative paths within the source
> directory or within the build directory?

Yes, that is the issue. We build with separate build and source directories with a relative path between them, and that relation doesn't hold on target, particularly for installed binaries. It is a general problem, but it is worst where we build multiple components from one shared source tree, as those paths become multiple levels deep, which makes creating a matching source layout on target even harder.

> > which address a realpath() call into the prefix mapping code. I did
>
> That would run the risk of breaking relocation for anyone who has
> deliberately used the paths they pass to the compiler (possibly involving
> symlinks, for example) in their remapping options - not expecting a
> further level of processing to be applied to those paths before remapping.

It would be a change in behaviour, but I'm not sure how many people would be relying on that, or whether they even should be. A relative path would only be relocated if it matched something in the prefix-map variables, so this would be controllable.

Cheers,
Richard
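One possible middle ground between the patch's lrealpath() call and Joseph's concern about symlinks is to make a name absolute and collapse "." and ".." purely lexically before matching, leaving symlinks alone. This is a swapped-in technique, not what the patch does and not something GCC or libiberty provides; lexical_abspath and the example directories are hypothetical, and the working directory is passed in explicitly rather than read with getcwd(). Lexical collapsing can disagree with the file system when a component before ".." is itself a symlink.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of GCC or libiberty): make FILENAME absolute
   against CWD and collapse "." and ".." components purely lexically, without
   touching the file system, so symlinks keep the spelling they were given.
   Unlike realpath(), "a/link/.." becomes "a" even if "link" is a symlink.
   Returns a malloc'd string, or NULL if allocation fails.  */
static char *
lexical_abspath (const char *cwd, const char *filename)
{
  size_t cwd_len, cap, out_len = 0;
  char *joined, *out;
  const char *p;

  assert (cwd != NULL && cwd[0] == '/' && filename != NULL);
  cwd_len = strlen (cwd);
  cap = cwd_len + strlen (filename) + 2;  /* CWD + '/' + FILENAME + NUL.  */
  joined = malloc (cap);
  out = malloc (cap);
  if (joined == NULL || out == NULL)
    {
      free (joined);
      free (out);
      return NULL;
    }

  /* Absolute names ignore CWD; relative ones are appended to it.  */
  if (filename[0] == '/')
    strcpy (joined, filename);
  else
    {
      memcpy (joined, cwd, cwd_len);
      joined[cwd_len] = '/';
      strcpy (joined + cwd_len + 1, filename);
    }

  for (p = joined; *p != '\0';)
    {
      const char *end;
      size_t len;

      while (*p == '/')
        p++;
      if (*p == '\0')
        break;
      end = strchr (p, '/');
      len = end != NULL ? (size_t) (end - p) : strlen (p);

      if (len == 2 && p[0] == '.' && p[1] == '.')
        {
          /* Drop the last emitted component; ".." at the root stays "/".  */
          while (out_len > 0 && out[out_len - 1] != '/')
            out_len--;
          if (out_len > 0)
            out_len--;
        }
      else if (!(len == 1 && p[0] == '.'))
        {
          out[out_len++] = '/';
          memcpy (out + out_len, p, len);
          out_len += len;
        }
      p += len;
    }

  if (out_len == 0)
    out[out_len++] = '/';
  out[out_len] = '\0';
  free (joined);
  return out;
}
```

With this, ../src/./foo.c compiled in /work/pkg/build and ../../src/foo.c compiled in /work/pkg/build/sub both become /work/pkg/src/foo.c, so a single absolute map could cover both spellings.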
Re: Reproducible builds - supporting relative paths in *-prefix-map
Hi Richard,

I added Sergio to the CC since he was looking at debuginfo/DWARF relative paths created by -fdebug-prefix-map in Debian and having trouble making them work correctly. Maybe he has some feedback on how to make this work.

On Mon, Aug 15, 2022 at 12:13:28PM +0100, Richard Purdie via Gcc wrote:
> I'm wondering if we'd be able to improve path handling in the -f*-
> prefix-map compiler options to cover relative paths?
> [...]
> In case it helps, my background on what I'm working on and why we'd
> find this useful follows:
>
> I work on Yocto Project which cross compiles complete software stacks,
> we use gcc heavily and the resulting builds are all around us. We care
> deeply about build reproducibility.
>
> With autotools, we long ago realised that we were best off separating
> the source and the build output where we could and to do it we used
> relative paths, which removed a lot of problematic hardcoded paths in
> our output.
>
> More recently we've been using -fdebug-prefix-map and -fmacro-prefix-
> map (we need both since some of binutils doesn't seem to support the
> combined version yet). We use the debug information (split separately)
> to support on target debugging but where relative paths were used, we
> miss information as the remapping either doesn't happen correctly or is
> mangled. We'd like to fix our debug output but we're finding that hard
> without relative path support for *prefix-map.

I might be misinterpreting the issue you are seeing.

But one problem with debuginfo/DWARF is that relative source paths aren't clearly defined. If you move or install the executable or (split) debug file out of the build directory, a DWARF reader has no way to know what the paths are relative to.
So for DWARF the paths always have to be absolute. They can still be relative to the compilation dir (DW_AT_comp_dir), but at least that has to be absolute, and the compiler should turn any relative path into an absolute one or make sure they are relative to an absolute compilation directory path.

The problem with that is that a) it makes the binary (debuginfo) output dependent on the build srcdir and builddir, and b) to support debugging/tracing/profiling tools the user has to install the (debug) sources under those exact same directories on their local machine.

One way around this is to use -fdebug-prefix-map with an absolute path under which you will also install any source files. Or use debugedit [*] after the build to rewrite the build dir path to something like /usr/debug/src/ which is what some distros do.

e.g. Fedora has one or more binary packages, a debuginfo package for the .debug files (optionally installed under /usr/lib/debug) and a debugsource package for the debug source files (optionally installed under /usr/debug/src). These packages can then also be used to dynamically find any debug or source file when indexed by debuginfod [**].

Using known absolute paths generated with debugedit or -fdebug-prefix-map makes sure the paths used in the debuginfo/DWARF are always the same, independent of the current srcdir or builddir, to make them reproducible. And the user/tools don't have to guess what the relative paths are relative to. And it makes it so you can install the (debug) source files of all packages/versions/arches in parallel without file conflicts.

Cheers,
Mark

[*] https://sourceware.org/debugedit/
[**] http://debuginfod.elfutils.org/
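A simplified model of the lookup Mark describes: an absolute file name is used as-is, and a relative one only has meaning when joined onto an absolute DW_AT_comp_dir. This sketch ignores the line table's include directories and any consumer-specific fallbacks such as debugger path-substitution settings; resolve_dwarf_name is a hypothetical helper, and returning NULL stands for "no reliable anchor".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model of how a DWARF consumer might locate a
   source file (it ignores the line table's include directories): an
   absolute NAME is used as-is, and a relative NAME is joined onto an
   absolute COMP_DIR (DW_AT_comp_dir).  If COMP_DIR is missing or itself
   relative there is nothing reliable to anchor to, so NULL is returned.
   The result is malloc'd (NULL is also returned if allocation fails).  */
static char *
resolve_dwarf_name (const char *comp_dir, const char *name)
{
  size_t len;
  char *out;

  assert (name != NULL);
  if (name[0] == '/')
    {
      out = malloc (strlen (name) + 1);
      if (out != NULL)
        strcpy (out, name);
      return out;
    }
  if (comp_dir == NULL || comp_dir[0] != '/')
    return NULL;

  len = strlen (comp_dir) + 1 + strlen (name) + 1;
  out = malloc (len);
  if (out != NULL)
    snprintf (out, len, "%s/%s", comp_dir, name);
  return out;
}
```

A name like ../src/foo.c under a remapped comp_dir only finds the file if the sources are installed at exactly that offset from it, which is the build-to-source relationship discussed in this thread.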
Re: Reproducible builds - supporting relative paths in *-prefix-map
Hi Mark,

Thanks for the reply.

On Mon, 2022-08-15 at 21:55 +0200, Mark Wielaard wrote:
> I added Sergio to the CC since he was looking at debuginfo/DWARF
> relative paths created by -fdebug-prefix-map in Debian and having trouble
> making them work correctly. Maybe he has some feedback how to make
> this work.
>
> On Mon, Aug 15, 2022 at 12:13:28PM +0100, Richard Purdie via Gcc wrote:
> > [...]
>
> I might be misinterpreting the issue you are seeing.
>
> But one problem with debuginfo/DWARF is that relative source paths
> aren't clearly defined. If you move or install the executable or
> (split) debug file out of the build directory a DWARF reader has no
> way to know what the paths are relative to.
>
> So for DWARF the paths always have to be absolute (they can still be
> relative to the compilation dir (DW_AT_comp_dir), but at least that
> has to be absolute (and the compiler should turn any relative path
> into an absolute one or make sure they are relative to an absolute
> compilation directory path).

It gets slightly more complicated, as where we can, we build in a directory separate from the source. Some source files are generated and placed in the build directory, whilst many are in the source directory. DW_AT_comp_dir can be set to one or the other, but it is the relative path between build and source which is problematic.

> The problem with that is a) it makes the binary (debuginfo) output
> dependent on the build srcdir and builddir and b) to support
> debugging/tracing/profiling tools the user has to install the (debug)
> sources under those exact same directories on their local machine.
>
> One way around this is to use -fdebug-prefix-map with an absolute
> paths under which you will also install any source files. Or to use
> debugedit [*] after the build to rewrite the build dir path to
> something like /usr/debug/src/ which is
> what some distros do.
>
> e.g. Fedora has (one or more) binary package, a debuginfo package for
> the .debug files (optionally installed under /usr/lib/debug) and
> debugsource package for the debug source files (optionally installed
> under /usr/debug/src). These packages can then also be used to
> dynamically find any debug or source file when indexed by debuginfod
> [**].

We split the debuginfo into a separate package. We also look at the sources it references, and those go into another separate package. We support populating a remote debuginfod server with these or installing them onto the target.
>
> Using known absolute paths generated with debugedit or
> -fdebug-prefix-map makes sure the paths used in the debuginfo/DWARF
> are always the same independent from the current srcdir or builddir to
> make them reproducible. And the user/tools don't have to guess what
> the relative paths are relative to.

We have that working and set -fdebug-prefix-map today. What is problematic is trying to recreate on target the relative paths between our source and build directories. Currently, most generated files in the build directory just don't get handled correctly on target, and we'd like to fix that. As far as I could determine, though, there is currently no way to remap a relative path.

Cheers,
Richard
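To illustrate Richard's point about the build-to-source relationship, the hypothetical helper relative_dir below computes the lexical relative path from one absolute, already-normalised directory to another. The directory names are made up: a relative link computed at build time (../../src from /work/pkg/build/sub) no longer matches once the build and source trees are placed on target at different depths, which is why such paths only resolve if the on-target layout mirrors the build-time one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Count the non-empty '/'-separated components of S.  */
static size_t
count_components (const char *s)
{
  size_t n = 0;

  while (*s != '\0')
    {
      while (*s == '/')
        s++;
      if (*s == '\0')
        break;
      n++;
      while (*s != '\0' && *s != '/')
        s++;
    }
  return n;
}

/* Hypothetical helper: lexical relative path from directory FROM to
   directory TO.  Both must be absolute and already normalised (no ".",
   "..", repeated or trailing '/', except "/" itself).  Returns a malloc'd
   string ("." when they are the same directory), or NULL if allocation
   fails.  */
static char *
relative_dir (const char *from, const char *to)
{
  size_t i, common = 0, ups, k;
  const char *rest;
  char *out, *p;

  assert (from != NULL && to != NULL && from[0] == '/' && to[0] == '/');

  /* Longest common prefix that ends on a component boundary.  */
  for (i = 0; from[i] != '\0' && from[i] == to[i]; i++)
    if (from[i] == '/')
      common = i;
  if ((from[i] == '\0' || from[i] == '/') && (to[i] == '\0' || to[i] == '/'))
    common = i;

  ups = count_components (from + common);
  rest = to + common;
  while (*rest == '/')
    rest++;

  out = malloc (ups * 3 + strlen (rest) + 2);
  if (out == NULL)
    return NULL;
  p = out;
  for (k = 0; k < ups; k++)
    {
      memcpy (p, "../", 3);
      p += 3;
    }
  strcpy (p, rest);
  if (*rest == '\0')
    {
      if (ups > 0)
        p[-1] = '\0';          /* Drop the trailing '/' of the last "../".  */
      else
        strcpy (out, ".");     /* FROM and TO are the same directory.  */
    }
  return out;
}
```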