(Trimming the cc list a bit to keep from deluging people who aren't actively participating in this part of the conversation.)
Ansgar 🙀 <ans...@43-1.org> writes:
> On Wed, 2024-06-19 at 16:06 -0700, Russ Allbery wrote:
>> I appreciate that you're trying really hard to find a way to represent
>> the Git tree directly in a source package so that no build step is
>> required and building the tree of files locally is trivial. I
>> understand that you truly think that this accomplishes the same goal
>> with some acceptable lossage around the edges. But it doesn't; it's
>> missing the point. The point is that there are a wide variety of
>> potential transformations between the Git tree and the source package
>> required to accommodate the range of Git workflows used in Debian today
>> *and tomorrow*.

> Let us talk about *today*. How many packages would not be possible to
> upload via tag2upload if one required a signature covering content of
> packages? Is it 0.1%? Is it 90%?

If we're talking about *today*, the answer is 100%, because what you're
describing requires new code in both dak and tag2upload that no one has
written. But I understand that you're not really talking about today;
you're asking about a hypothetical future world in which someone does
that work.

I personally do not have those numbers. I know there are a huge variety
of workflows, mostly from previous debian-devel discussions, which gives
me an appreciation for the scope of the problem that tag2upload solves
but doesn't give me numbers. I checked on trends.debian.net to see if by
chance it was trying to collect workflow data, but the closest thing to
relevant is graphs showing the overwhelming popularity of 3.0 (quilt) as
a source package format.

I can say that for the packages I maintain personally, 100% of them
would, at some point over time, not have been possible to upload this
way. As mentioned previously, I frequently have reasons to carry a
Debian-specific patch for some period of time (which is a file that's
generated at source package build time), and for the reasons that Philip
Hands already explained, I don't want to have to think, with each
package upload, about whether that upload will be blocked if I use
tag2upload.

If one only looks at the most recent version of each of my packages at a
specific point in time, my guess is that about 50% of those packages
could not be uploaded and the other 50% would currently work, because
they currently don't have any Debian-specific changes to upstream and
therefore building the *.debian.tar.xz is fairly trivial and involves
only files already present in the Git head. I haven't counted, though,
so consider this number rather rough; I could be off by 10-20% in either
direction.

> For *tomorrow* we might change things in the future. Some things like
> arbitrary code execution at .dsc construction time are fairly useful
> after all (required for some workflows, even when it might not change
> the files ending in the source package).

If you also agree that having a source package build step is useful, why
are you opposed to it? What do you think would be different about
tomorrow that we don't know today? I don't understand what you're trying
to accomplish.

Also, I think you're talking about yet a different style of build system
than what tag2upload does, but just to be explicit: I believe tag2upload
source package builds do not involve arbitrary code execution, if by
that you mean running code shipped in the Git repository being built.
(The tag2upload and dpkg-source code is not "arbitrary" in the normal
security sense of that term.)

Normally, arbitrary code execution can happen with source package builds
due to running the clean target, which can be arbitrary code specified
in debian/rules. That shouldn't be necessary in tag2upload since it's
starting from a clean Git checkout.
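
To make that last point concrete, here is a rough sketch, in Python, of
a build along those lines. This is emphatically not the actual
tag2upload or dgit code; the function name is mine, and the handling of
the upstream .orig tarball for 3.0 (quilt) is omitted. It is only meant
to show the shape of a source package build that starts from a pristine
export of a Git tag and never executes anything from debian/rules:

    import io
    import os
    import subprocess
    import tarfile

    def build_source_from_tag(repo, tag, workdir):
        """Build a source package from a pristine export of a Git tag."""
        exportdir = os.path.join(workdir, "pkg")
        os.makedirs(exportdir)

        # "git archive" writes exactly the committed tree, so there is no
        # build detritus for a clean target to remove.
        tree = subprocess.run(
            ["git", "-C", repo, "archive", "--format=tar", tag],
            check=True, capture_output=True,
        ).stdout
        tarfile.open(fileobj=io.BytesIO(tree)).extractall(exportdir)

        # dpkg-source itself runs no code shipped in the package; contrast
        # with "dpkg-buildpackage -S", which runs "debian/rules clean"
        # before calling dpkg-source.
        subprocess.run(["dpkg-source", "-b", "pkg"], cwd=workdir, check=True)

The point is only that nothing in that sequence gives the repository
being packaged an opportunity to run code on the machine doing the
build.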
I'm not sure I'm a fan of truly arbitrary code execution during the
source package build, since avoiding that is a very nice bit of security
hardening for the source package builder that makes it way, way more
difficult for a malicious source package to somehow compromise the
builder. But that's hardening, not an essential element of the design,
so I could be convinced.

> Pretty much all changes in Debian (say systemd, usrmerge, ...) happened
> incrementally. Why should that not be appropriate here? (Or was a slow
> move wrong in retrospect and we should have decided to support only
> systemd and drop sysvinit in 2014, and to move to only usrmerge also
> several releases earlier?)

I don't think you thought through this analogy before you mailed it. :)
Surely the differences between a system that would only be used by
Debian package uploaders and is strictly optional, and major
architectural changes to Debian *user* systems that are either huge
changes in how they administer their systems (in the first case) or
irreversible changes to their systems (in the second case), are too
obvious to get into.

Also, I do feel like I have to point out that you have not, in this
thread, been asking for an incremental deployment. You have been asking
for someone to completely rewrite tag2upload along different principles,
using brand new features of dak that no one has yet written, in order to
get a system that does a fraction of what it currently can do, has
severe flexibility restrictions, and would be, to be quite frank, a
tottering pile of workarounds and hacks. You're asking for the sort of
muddled and incoherent design that, were someone advocating it anywhere
else in Debian, you would be grumbling about as poor architecture. And
you would be correct.

The straightforward, flexible, and adaptable way to glue together the
wild variety of preferred Git packaging workflows with a stable and
useful source package format is to put a build step in between those two
things. This is not some sort of novel or radical idea. It is literally
what we have been doing in Debian for as long as I have been involved in
the project.

tag2upload doesn't change that property. It just allows me, and others
who want to use it, to move that step off of potentially untrustworthy
or unsuitable local hardware onto a Debian project system that, like
binary buildds, we can secure and incrementally improve. And, as a
bonus, it regularizes the construction and gives us hooks for doing
useful additional verifications in the future.

> Why should there be a standardized Debian source package in the end
> (where tag2upload might try to build quilt patches) when nobody but a
> machine is supposed to use them?

This is such a mystifying question to me that I can't reverse-engineer
what you are trying to ask. There are innumerable consumers of source
packages in Debian: analysis tools, linting tools, tools that gather
together all of the Debian-specific patches so that upstreams can see
how we're changing their software, reporting, auditing, etc.
tag2upload's generation of normal, semantically correct 3.0 (quilt)
packages lets that entire ecosystem of downstream tools continue to
work. In some cases, using it (or dgit) makes them work better, by
providing more or higher-quality metadata. This was the case for my own
personal packages.

If you want to change the source package format in the archives, by (for
example) deprecating 3.0 (quilt) in favor of something else, I think
that's at least arguably within your remit as an FTP team delegate.
Figuring out all of the things that would break and all the assumptions
that would change, and fixing the entire ecosystem of tools so that you
can make such a change, is a massive amount of work, and I don't envy
you the task. Nor would I make any assumptions about how long it would
take you to complete it.

In the meantime, tag2upload works correctly with the ecosystem of tools
that we have now, with the minor caveat that tools that care
specifically about the identity of the package uploader will need to
parse some fields out of *.dsc or *.changes instead of taking that
information only from the OpenPGP signature.
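
For example, a tool that today looks at who made the OpenPGP signature
on an upload would instead read the relevant fields out of the *.changes
file itself. A sketch, using the python-debian library (the file name
here is made up):

    from debian.deb822 import Changes

    # Read uploader-related fields from the .changes file rather than
    # inferring the uploader from the signature on it.
    with open("foo_1.2-1_source.changes") as f:
        changes = Changes(f)

    print(changes.get("Changed-By"))   # person named in the changelog entry
    print(changes.get("Maintainer"))   # maintainer recorded for the package

That's the sort of minor adjustment I mean by the caveat above.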
> Also, if they are standardized why are there options for the maintainer
> to control how the source package gets constructed? (Like an option to
> control how many patches end up in d/patches.)

Because that's a *semantic* decision, and dgit or tag2upload cannot
always guess the correct semantics. Some maintainers choose not to
maintain changes to upstream as separable commits that can be
meaningfully turned into a patch series, and in that case they need to
tell dgit or tag2upload not to attempt to make sense of the changes and
instead just build a monolithic patch. The tools can do a lot, but they
can't recover information that the package maintainer has intentionally
discarded.

-- 
Russ Allbery (r...@debian.org)              <https://www.eyrie.org/~eagle/>