Hi! Looking at <git://anonscm.debian.org/reproducible/dpkg.git>.
Have you seen any actual problem to warrant the «Ensure stable order of Checksums-* fields» commit? In principle the output order is preserved from the input one. And here's a quickish review of the dpkg-genbuildinfo changes, taking into account that I'm looking at this as a general tool for recording the build environment, not specific for just reproducible builds. Some general comments first: * It would have been nice to get the .buildinfo stuff up for review in the debian-dpkg list too (besides the ftp-masters bug 763822). * I'm still somewhat unconvinced that having byte-for-byte identical container binary .deb packages is the ideal minimal reproducible unit. This will completely disallow digital signatures embedded in the binary packages or per-executable signatures in their xattr metadata for example, and that seems to me was completely ignored last time around. I'd be very unhappy if at some point reproducible builds were enforced and that'd mean we've painted ourselves into a corner with other potential additions to the .deb that might not be reproducible by design. * Somewhat related to the above point and this new file. Timestamps in some places (mainly on the actual file metadata, for example in ar and tar containers) are actually very useful, because as long as we don't have a stored recording of the build environment, it's the next "best" approximation to that. Getting rid of those prematurely makes us miss very valuable information for debugging and similar. And yes, even replacing the current time with the changelog time is missing information. This has also been dismissed previously (with, I'd say, even a bit of contempt!). * I'm not entirely sure if this really makes sense as a different file, but at least given that it's controlled by dpkg-buildpackage we can always fold it into dpkg-genchanges if we deem that's a better course of action. So it seems fine this way for now. * Some of the information in this file trigger my privacy alarm bells, for example the Build-Path field. > diff --git a/debian/usertags b/debian/usertags > index 0fc26f2..0419488 100644 > --- a/debian/usertags > +++ b/debian/usertags > @@ -65,6 +65,7 @@ dpkg-checkbuilddeps [DPKG-CHECKBUILDDEPS] > dpkg-deb [DPKG-DEB] > dpkg-distaddfile [DPKG-DISTADDFILE] > dpkg-divert [DPKG-DIVERT] > +dpkg-genbuildinfo [DPKG-GENBUILDINFO] The pseudo-tag is only needed for old tools for legacy reasons. I should just do a round of «bts retitle» at some point. > dpkg-genchanges [DPKG-GENCHANGES] > dpkg-gencontrol [DPKG-GENCONTROL] > dpkg-gensymbols [DPKG-GENCSYMBOLS] > diff --git a/man/dpkg-genbuildinfo.1 b/man/dpkg-genbuildinfo.1 > new file mode 100644 > index 0000000..626bf66 > --- /dev/null > +++ b/man/dpkg-genbuildinfo.1 > @@ -0,0 +1,89 @@ > +.\" dpkg manual page - dpkg-genbuildinfo(1) > +.\" > +.\" Copyright © 2015 Jérémy Bobbio <lu...@debian.org> This probably needs to be completed with the © from the sources it's borrowing from. The changes to man/po/po4a.conf are missing. > diff --git a/scripts/Dpkg/Control/FieldsCore.pm > b/scripts/Dpkg/Control/FieldsCore.pm > index 8a5695b..75de185 100644 > --- a/scripts/Dpkg/Control/FieldsCore.pm > +++ b/scripts/Dpkg/Control/FieldsCore.pm > @@ -361,6 +373,11 @@ our %FIELD_ORDER = ( > Vcs-Svn Testsuite), &field_list_src_dep(), qw(Package-List), > @checksum_fields, qw(Files) > ], > + CTRL_FILE_BUILDINFO() => [ > + qw(Format Build-Architecture Source Binary Architecture Version > + Build-Environment Build-Path), > + @checksum_fields, > + ], Probably pack all «Build-» fields together at the end after the checksum ones, and leave Build-Environment as the last of those, as it's the one taking the most vertical space. > CTRL_FILE_CHANGES() => [ > qw(Format Date Source Binary Binary-Only Built-For-Profiles > Architecture > Version Distribution Urgency Maintainer Changed-By Description > diff --git a/scripts/dpkg-buildpackage.pl b/scripts/dpkg-buildpackage.pl > index a3cca87..018f37d 100755 > --- a/scripts/dpkg-buildpackage.pl > +++ b/scripts/dpkg-buildpackage.pl > @@ -158,6 +158,7 @@ my $since; > my $maint; > my $changedby; > my $desc; > +my @buildinfo_opts; > my @changes_opts; > my @hook_names = qw( > init preclean source build binary changes postclean check sign done The buildinfo hook name is missing in the @hook_names array. > @@ -567,6 +570,25 @@ if ($include & BUILD_BINARY) { > withecho(@debian_rules, $buildtarget); > run_hook('binary', 1); > withecho(@rootcommand, @debian_rules, $binarytarget); > + > + if (-e "../$pv.dsc") { Why is the buildinfo file tied to whether there's or not a source package? I'd say it might make sense to not generate a buildinfo file only if we are not building binary packages. To check for that use the BUILD_ constants. > + run_hook('buildinfo', 1); The hook should not be conditional to the code being executed. The second argument should represent that. > + push @buildinfo_opts, "--admindir=$admindir" if $admindir; > + > + my $buildinfo_path = "../$pva.buildinfo"; > + my $buildinfo = Dpkg::Control->new(type => CTRL_FILE_BUILDINFO); > + > + print { *STDERR } " dpkg-genbuildinfo @buildinfo_opts > >$buildinfo_path\n"; > + > + open my $buildinfo_fh, '-|', 'dpkg-genbuildinfo', @buildinfo_opts > + or subprocerr('dpkg-genbuildinfo'); > + $buildinfo->parse($buildinfo_fh, _g('parse buildinfo file')); > + $buildinfo->save($buildinfo_path); > + close $buildinfo_fh or subprocerr(_g('dpkg-genbuildinfo')); There's no need for all this, just call dpkg-genbuildinfo redirecting its output to the destination file. The $changes code where this is originating from is accessed later on, so that's why it's being parsed. > + } else { > + warning(_g('no .dsc file, skipping .buildinfo generation')); > + } > } > diff --git a/scripts/dpkg-genbuildinfo.pl b/scripts/dpkg-genbuildinfo.pl > new file mode 100755 > index 0000000..d7784ee > --- /dev/null > +++ b/scripts/dpkg-genbuildinfo.pl > @@ -0,0 +1,263 @@ > +#!/usr/bin/perl > +# > +# dpkg-genbuildinfo > +# > +# Copyright © 2003-2013 Yann Dirson <dir...@debian.org> > +# Copyright © 2014 Niko Tyni <nt...@debian.org> > +# Copyright © 2014-2015 Jérémy Bobbio <lu...@debian.org> This also probably needs to be completed with the © from the other sources it's borrowing from. > +my $quiet = 0; There's nothing to be quiet about here. > +my $buildinfo_format = '1.9'; Why is this format 1.9? > +my $checksums = Dpkg::Checksums->new(); > +my %archadded; > +my @archvalues; > + > +# There's almost the same function in dpkg-checkbuilddeps, > +# they probably should be factored out. > +sub parse_status { I've a local commit switching the parse_status() in dpkg-checkbuilddeps to use Dpkg::Index, unfortunately due to the current implementation it incurs a significant speed regression, but I'm reworking the Dpkg::Control parser to speed it up, so this code will be able to switch to something similar too. > + my $status = shift; > + > + my $facts = Dpkg::Deps::KnownFacts->new(); > + my %depends; > + local $/ = ''; > + open(my $status_fh, '<', $status) > + or syserr(_g('cannot open %s'), $status); > + while (<$status_fh>) { > + next unless /^Status: .*ok installed$/m; > + > + my ($package) = /^Package: (.*)$/m; > + my ($version) = /^Version: (.*)$/m; > + my ($arch) = /^Architecture: (.*)$/m; > + my ($multiarch) = /^Multi-Arch: (.*)$/m; > + $facts->add_installed_package($package, $version, $arch, > + $multiarch); > + > + if (/^Provides: (.*)$/m) { > + my $provides = deps_parse($1, reduce_arch => 1, union => 1); > + next if not defined $provides; > + foreach (grep { $_->isa('Dpkg::Deps::Simple') } > + $provides->get_deps()) > + { > + $facts->add_provided_package($_->{package}, > + $_->{relation}, $_->{version}, > + $package); > + } > + } > + > + if (/^(?:Pre-)?Depends: (.*)$/m) { > + foreach (split(/,\s*/, $1)) { > + push @{$depends{$package}}, $_; This will merge all dependencies from all Multi-Arch:same instances. But in any case why record the dependencies of the interesting packages? > + } > + } > + } > + close $status_fh; > + > + return ($facts, \%depends); > +} > +# Retrieve info from the current changelog entry > +my %options = (file => $changelogfile); > +$options{changelogformat} = $changelogformat if $changelogformat; > +my $changelog = changelog_parse(%options); > +# Other initializations > +my $control = Dpkg::Control::Info->new($controlfile); > +my $fields = Dpkg::Control->new(type => CTRL_FILE_BUILDINFO); > + > +my $dist = Dpkg::Dist::Files->new(); > + > +# include .dsc > +my $spackage = $changelog->{'Source'}; > +(my $sversion = $changelog->{'Version'}) =~ s/^\d+://; This will grab the binary version, not the source version, this will break on binNMUs. We probably need to record both source and binary versions in the buildinfo file. And in case they differ the changelog entry, which tends to differ per arch. > +my $dsc = "${spackage}_${sversion}.dsc"; > +my $dsc_pathname = "$uploadfilesdir/$dsc"; > +my $dsc_fields = Dpkg::Control->new(type => CTRL_PKG_SRC); > +$checksums->add_from_file($dsc_pathname, key => $dsc); > + > +my $dist_count = 0; > + > +$dist_count = $dist->load($fileslistfile) if -e $fileslistfile; > + > +error(_g('binary build with no binary artifacts found; cannot distribute')) > + if $dist_count == 0; > + > +foreach my $file (sort $dist->get_files()) { Why the sort, the function is supposed to preserve the same insertion order. > +} > +my @status = parse_status("$admindir/status"); > +my $facts = shift @status; > +my %depends=%{shift @status}; > +my $deps; > +my %env_pkgs; > + > +# parse essential list > +open (RAWFILE, '/usr/share/build-essential/essential-packages-list') > + or error("cannot read > /usr/share/build-essential/essential-packages-list: $!\n"); > +# skip header > +while (my $line = <RAWFILE>) { > + chomp $line; > + last if $line eq ''; > +} > +while (my $line = <RAWFILE>) { > + chomp $line; > + $env_pkgs{$line} = 1; > +} > +close RAWFILE; Hmm, please just record installed packages that have the Essential:yes field. I don't want the code to rely on such externally defined files. (As an aside, in the dpkg codebase please use scalar variables for filehandles instead of barewords, use syserr when $! is concerned, mark strings for translation, and pass filenames as format string arguments.) > +append_deps(\%env_pkgs, 'build-essential', > + $control->{source}->{'Build-Depends-Indep'}, > + $control->{source}->{'Build-Depends'}); Missing Build-Depends-Arch field. But those fields should not be recorded if the build does not concern them. And this uses an undocumented access to the $control object. > +while (my ($pkg, $seen) = each(%env_pkgs)) { > + next if $seen; > + $env_pkgs{$pkg} = 1; # mark as seen > + next unless defined $facts->{pkg}->{$pkg}->[0]; The Dpkg::Deps::KnownFacts object is not supposed to be accessed directly, it's only for internal use by other Dpkg::Deps objects. > + append_deps(\%env_pkgs, @{$depends{$pkg}}); > + keys %env_pkgs; # reset iterator > +} > + > +my $environment = Dpkg::Deps::AND->new(); > +foreach my $pkg (sort keys %env_pkgs) { > + foreach my $installed_pkg (@{$facts->{pkg}->{$pkg}}) { > + my $version = $installed_pkg->{version}; > + my $architecture = $installed_pkg->{architecture}; > + my $pkg_name = $pkg; Useless variable? > + if (defined $architecture && > + $architecture ne 'all' && $architecture ne $build_arch) { This should probably be host_arch. > + $pkg_name = "$pkg_name:$architecture"; > + } > + my $dep = Dpkg::Deps::Simple->new("$pkg_name (= $version)"); > + $environment->add($dep); > + } > +} > +$environment = "\n" . $environment->output(); > +$environment =~ s/, /,\n/g; This is changing the type of $environment on assignment which seems rather confusing. And it seem to be doing lots of parsing and dumping for something that's simply string concatenations. Just do the equivalent of: $var .= "$pkgname (= $pkgver),\n"; or probably even better, just use an array, and join them with ",\n". > +$fields->{'Build-Environment'} = $environment; > + > +$fields->output(\*STDOUT); I get the impression the code tracking the used installed packages is made much more difficult than it needs to be. > diff --git a/scripts/dpkg-genchanges.pl b/scripts/dpkg-genchanges.pl > index 3303d5c..4daf530 100755 > --- a/scripts/dpkg-genchanges.pl > +++ b/scripts/dpkg-genchanges.pl > @@ -342,6 +341,10 @@ if ($include & BUILD_BINARY) { > if $file->{arch} and not $archadded{$file->{arch}}++; > } > } > + > + my $arch = $include == BUILD_ARCH_INDEP ? 'all' : $host_arch; > + my $buildinfo = "${spackage}_${sversion}_${arch}.buildinfo"; The problem here is that sversion is the actual source version, which means on binNMUs the file will not exist. > + $dist->add_file($buildinfo, $sec, $pri); And I don't feel very comfortable assuming the .buildinfo name here when it is not being generated in this same script, and when the script generating the output is not even defining the name itself. Thanks, Guillem -- To UNSUBSCRIBE, email to debian-dpkg-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org Archive: https://lists.debian.org/20150201024852.ga26...@gaara.hadrons.org