Hi Colin, Thank you for the review!
On Wed, Mar 17, 2021 at 05:16:29PM +0000, Colin Watson wrote: > I don't see why Vcs-Git-Refs would ever need to be plural: if a commit > isn't reachable from any single ref, then adding another ref will never > be helpful. This should just be Vcs-Git-Ref, singular, which also > simplifies code that uses it (since it will in practice always be a > single ref, any code to handle more than one ref there would be > poorly-tested). I made it plural for the widest future compatibility. For example, dgit uses an effective Vcs-Git-Refs field of refs/dgit/*, where I'm treating the glob as plural since it encompasses multiple refs. Similarly, statically specifying refs/heads/* could work too (albeit inefficiently). However neither of these would be necessary for my git-ubuntu case. I'm happy to use Vcs-Git-Ref if the feeling is that Vcs-Git-Refs is "YAGNI". > The spec should probably also say that the commit in question isn't > guaranteed to be reachable from that ref in the long term, but only by > the git-ubuntu importer in the short term (however exactly you define > that). Since the repository may be owned by the uploader, it doesn't > really seem practical to impose stronger lifetime constraints. Agreed - this was my intention. I will make it explicit. I'm not sure I can define "short term" well. I think I'd prefer to leave that undefined, treat it as best-effort, with a view to eventually replacing the entire mechanism with a push method as discussed below. If the ref isn't there for long enough, we fall back to the "no rich history found" position as below. > What happens if the repository is temporarily unreachable? > What happens if the repository is renamed before git-ubuntu gets to it? > What happens if the repository is private, as might be the case if the > upload is a security upload whose contents are embargoed before it hits > the archive? [elaboration clipped] For now, these cases are handled as "no rich history found" and a commit is synthesized instead. This is a graceful degradation that must exist anyway since there is currently no requirement that an uploader supply rich history at all. For "temporaily unreachable" I could add back-off and retry indeed, as an implementation detail. I don't intend to do this in my initial implementation, but it could be added later if it becomes a problem. In the meantime, for the initial implementation I see rich history adoption as a "bonus" rather than there being any kind of guarantee. If we wanted to guarantee it later by handling this failure case, but that wouldn't require a change to the spec, which is my main goal right now. However there still would be the "how long is long enough?" issue you identified above. "Renamed repository" I see as the equivalent to "deleted ref before the importer got to it" which, according to my spec, is the same as "didn't supply valid rich history". So it reduces to a requirement that the uploader "do it right", which I admit is not perfect, but something I hope that won't be a problem in practice and something that tooling can help with. The imperfection leads me to your pull vs. push discussion that I'll go into below. "Private repository" I also see as equivalent to "didn't supply valid rich history". It's a client-side problem to ensure that the repository is public at the time the importer sees it (which shouldn't be a problem in the case of an embargo because if the importer can see it, it's published publicly anyway). Importantly, these "didn't supply valid rich history" cases aren't an error, since the fallback is always to synthesize a suitable commit. My thoughts of getting rid of this are for far further down the line, discussed below. > dgit avoids all these problems by having a push model rather than a pull > model: "dgit push-source" pushes the appropriate commit to a specialized > git server, which can then make sure not to lose it. Now, at present in > Debian this has the side-effect of restricting the set of people who can > push, and it's certainly not entirely obvious how we would go about such > a thing in Ubuntu with Launchpad (I think we should avoid any design > that requires setting up another git server that needs to know about > developer identities and permissions etc.), but I think it's at least > worth thinking about before committing ourselves to a pull model. I see the pull model I'm proposing here as an intermediary destination. I'd like to implement this first, and look to switching to a push model later. > What would we need in Launchpad if we were going to try to do this on a > push basis? Brainstorming a bit, these are some approaches that came to > mind, bearing in mind that some of these ideas may be terrible: > > * Might it be practical to tell Launchpad to reserve some kind of token > corresponding to the commit in question guaranteeing that that commit > would be reachable until the token is consumed, which git-ubuntu > could then pass in the .changes file and the importer could consume? > > * Perhaps the upload could include a git bundle relative to some other > version already in the archive? (This could be large, though.) > > * Could we work out a way to allow any contributor to push to some kind > of holding area associated with the importer-owned repository, and > then the importer would only point a ref at that once the upload has > been processed? (I'm not sure how we would prevent an attacker from > being able to force such a repository to grow without bound, though.) > > * Could we use merge proposals for this somehow? An upload is in some > sense a proposal to merge some changes into the primary archive, and > I know "git ubuntu submit" already integrates with merge proposals on > an experimental basis. That might allow Launchpad to know that a > given commit is interesting and should be made available - indeed we > already have plans to expose virtual refs that correspond to merge > proposals, although I don't think those are quite done yet. If we > were to take this approach, then the ref that we point to could be > made to appear in the target repository instead, which avoids the > collection of issues around the source repository disappearing or > moving. Thank you for the brainstorming! This is the kind of thing I would like to achieve in the end, yes. Here's another alternative. I'm curious to hear what you think about this: * I'd like eventually for Launchpad to support "uploading" directly from git somehow, rather than round-tripping through dput. git-ubuntu provides a lossless bi-directional mapping between a git trees and Debian source packages[1]. Launchpad could use this to provide an API method that given a repository, commit hash and a ref to reach it from, could synthesize a dput upload as if it is signed by the person making the API call. This would not preclude uploaders from continuing to use dput, but would permit a more direct push method without having to send the "here's the rich history" metadata through a different route. > Of these, albeit with only half an hour's thought, I think my favourite > is the last one: using merge proposals feels quite elegant, and is > perhaps only a change in how your spec would be used rather than a > format-level change. I may have missed something, though. What do you > think? Relying on merge proposals would work perfectly well for my team, who have a peer review policy such that nearly all uploads go through git-ubuntu and merge proposals anyway. However, what about teams who do not do that? They would be forced to use merge proposals just to get git-ubuntu rich history preservation. Community flavor teams have limited developer resources, for example, and so are unlikely to get any other benefit from round tripping through merge proposals. We could arrange tooling so that it is no additional effort from an uploader's perspective, but it seems like an unnecessary hack to me to (ab)use merge proposals in this way. > > Notably there's a Dgit field defined by Debian Policy against dsc files, > > which is used for a very similar purpose[2]. > > I'm only a dgit user rather than an expert in its implementation, but I > believe that the Dgit field is used by dgit to retrospectively work out > the commit that represents a given source package version in the archive > as part of preparing a newer version that ought to be a descendant of > that commit, rather than as part of an upload instruction. That's why > it lives in the .dsc file rather than the .changes: after an upload has > been processed, the .changes is not stored in any authenticatable way. > But it's true that the current specification of Dgit explicitly relies > on the repository being at a well-known and persistent location. This is my understanding also. Thanks, Robie
signature.asc
Description: PGP signature
-- ubuntu-devel mailing list ubuntu-devel@lists.ubuntu.com Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel