Hi,

On 6/13/24 20:29, Marco d'Itri wrote:

>> Do we actually want or need to hoard all the collaboration history?

> Of course: this makes auditing much easier.
That is a *massive* amount of data though, especially if we're expected to import the entire upstream git history as well and base the packaging branch on top of an upstream commit.
We will also need to be prepared for removal requests, so there needs to be a procedure in place for them, people authorized to carry it out, and an audit framework around it.
I don't think this will lead to any additional auditing of upstream sources either; they will just be pulled in and used as-is. Perhaps we would gain some additional insight after a breach, if GitHub decided to take a compromised repository offline while our copy remained accessible.
We could add some mechanisms, like enforcing that merge commits pulling in a new upstream version will only modify files outside of debian/ in one subtree, and files inside debian/ in the other, but that conflicts with workflows that maintain Debian-specific patches as commits instead of patch files.
Without such a mechanism, these merge commits would immediately become the most obvious place to hide malicious code in a large changeset.
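The merge-commit rule described above could be checked mechanically; a minimal sketch follows. It is a hypothetical illustration, not an existing Debian tool. It assumes that an upstream-import merge has the previous packaging commit as its first parent and the upstream release commit as its second, reads "one subtree ... the other" as "relative to each parent", and takes the lists of changed paths as input (these could come from `git diff --name-only <merge>^1 <merge>` and `git diff --name-only <merge>^2 <merge>`). The names `in_debian_dir` and `check_upstream_merge` are made up for this sketch.

```python
from typing import Iterable, List


def in_debian_dir(path: str) -> bool:
    """True if a repository-relative path lies inside the top-level debian/ directory."""
    return path.startswith("debian/")


def check_upstream_merge(
    changed_vs_packaging: Iterable[str],
    changed_vs_upstream: Iterable[str],
) -> List[str]:
    """Return policy violations for one (hypothetical) upstream-import merge commit.

    Assumed rule:
    - relative to the packaging parent, the merge may only change files
      outside debian/ (it brings in the new upstream code);
    - relative to the upstream parent, the merge may only change files
      inside debian/ (the packaging adds nothing but debian/).
    """
    violations = []
    for path in changed_vs_packaging:
        if in_debian_dir(path):
            violations.append(f"debian/ changed relative to packaging parent: {path}")
    for path in changed_vs_upstream:
        if not in_debian_dir(path):
            violations.append(f"outside debian/, differs from upstream parent: {path}")
    return violations


if __name__ == "__main__":
    # Clean import: only upstream files change against the packaging side,
    # and only debian/ differs from the upstream release.
    print(check_upstream_merge(["src/main.c", "README"], ["debian/control"]))
    # Debian-specific patch kept as a commit: src/main.c differs from upstream,
    # so the check flags an otherwise ordinary merge.
    print(check_upstream_merge(["src/main.c"], ["debian/control", "src/main.c"]))
```

The second call shows the conflict mentioned above: when Debian-specific patches are maintained as commits rather than patch files, patched upstream files legitimately differ from the upstream parent, so a strict check like this rejects them.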
We have several 90% solutions for mapping Debian packaging onto git, but all of them are incomplete and annoying to use because we disagree with git about what constitutes data and what constitutes metadata. As a result, the data model matches neither reality nor our requirements, and from a security standpoint that concerns me more than improved forensics.
Simon