On Tue, Feb 23, 2016 at 7:50 PM, Kristian Fiskerstrand <k...@gentoo.org> wrote:
>
> On 02/24/2016 01:33 AM, Duncan wrote:
>>
>> IMO, what's actually happening here is the slow deprecation of
>> rsync mirrors in favor of git.  I doubt they'd be created at all
>> if gentoo were
>
> I don't agree to this at all. For one thing git is very resource
> intensive compared to rsync mirroring,

Is this actually true?  For the typical use case of daily or close to
daily updates I'd think that git would be much more efficient.

rsync has to traverse an entire directory tree (on both client and
server, though of course either could have it cached), exchange
metadata for every file across the network to determine what has
changed, and then work out what changed within each file and transfer
it.  With a large git repository that has only a few hundred new
commits, the client just tells the server what its last commit is, the
server walks back in history to find it, and then the server can
quickly identify all the new commits/trees/blobs and send just those.
Because git's trees are content-addressed (the COW-like part of its
design), this is very efficient: it never has to traverse a
subdirectory in which no files have changed.

In the degenerate case where nothing has changed, an rsync still needs
to walk the full tree and send a file list, while git just sends a
commit ID and terminates.
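
To make the comparison concrete, here is a toy Python model.  It is
only a sketch under my own assumptions: it is neither git's wire
protocol nor rsync's algorithm, all of the names (build, merkle_diff,
count_files, make_tree) are made up for illustration, and it ignores
deletions.  It hashes a directory tree bottom-up the way git trees
are hashed, then counts how many nodes a hash-guided diff has to look
at compared with the number of files a flat metadata walk would list.

```python
import hashlib

def build(node):
    """Turn nested dicts (directories) / bytes (files) into (hash, children)."""
    if isinstance(node, bytes):
        return (hashlib.sha1(node).hexdigest(), None)
    children = {name: build(node[name]) for name in sorted(node)}
    digest = hashlib.sha1()
    for name, (child_hash, _) in children.items():
        # A directory's hash depends on the names and hashes of its entries.
        digest.update(f"{name}\0{child_hash}\n".encode())
    return (digest.hexdigest(), children)

def merkle_diff(old, new, path=""):
    """Return (changed_paths, nodes_visited), descending only where hashes differ."""
    if old is not None and old[0] == new[0]:
        return [], 1              # identical subtree: never look inside it
    if new[1] is None:
        return [path], 1          # new or modified file
    old_children = old[1] if old is not None and old[1] is not None else {}
    changed, visited = [], 1
    for name, child in new[1].items():
        sub_changed, sub_visited = merkle_diff(
            old_children.get(name), child, f"{path}/{name}")
        changed += sub_changed
        visited += sub_visited
    return changed, visited

def count_files(node):
    """Number of files a flat walk (file-list style) would have to enumerate."""
    if node[1] is None:
        return 1
    return sum(count_files(child) for child in node[1].values())

def make_tree():
    # Hypothetical layout: 20 categories x 50 packages x 1 file each.
    return {f"cat{c}": {f"pkg{p}": {"ebuild": b"v1"} for p in range(50)}
            for c in range(20)}

old = build(make_tree())
updated = make_tree()
updated["cat3"]["pkg7"]["ebuild"] = b"v2"   # one file changes
new = build(updated)

changed, visited = merkle_diff(old, new)
print("changed paths:", changed)
print("nodes visited by hash-guided diff:", visited)
print("files listed by flat walk:", count_files(new))
print("nothing changed:", merkle_diff(old, old))
```

In the nothing-changed case the hash-guided diff stops at the root
comparison, which is the "sends a commit ID and terminates" behaviour,
while the flat walk still enumerates every file.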

Now, for an infrequent sync (think months) where most of the tree has
changed I could certainly buy that a webrsync would be far more
efficient for everybody.

And just like rsync, git is easy to mirror.  GitHub is an example of a
service that will mirror anybody's repo for free, and they seem to
have no end to their bandwidth (though I've found that pushing a full
historical gentoo git tree to them does make them choke on it for
about 30min before it shows up).

So, while I'll agree with the validity of your other points, I'd be
interested in actual data to back up the resource claim.  I could see
that going either way, and that is likely to be based on how
well-optimized everything is.  Linus did a pretty good job with git.

> For one thing we can't expect users to keep an up
> to date copy of all gentoo developer's OpenPGP keys to verify each git
> commit, additionally this will cause issues with retirement and
> similar situations (certificate revocation, subkey rotations, expiries).

Well, we could do something (eventually) to make tracking keys easier,
but I'll still buy that the thick manifests are more secure.  Git
commit signatures are bound to their contents only by SHA-1.  I get
that nobody has demonstrated a practical attack on that, but I think
most crypto experts wouldn't heartily endorse the design.
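
For reference, this is how git names a file's contents in its classic
SHA-1 object format: the ID is the SHA-1 of a short header plus the
raw bytes.  A signed commit covers the commit object, which names its
tree by hash, which in turn names each blob by hash, so the link from
signature to file contents is only as strong as SHA-1.  The function
name below is my own, and this is a sketch rather than git's code;
its result should match what `git hash-object` reports for the same
bytes.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the object ID git's SHA-1 format assigns to a blob."""
    # Header is the object type, a space, the decimal length, and a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# Any change to the bytes changes the ID, and with it every tree and
# commit hash above it; nothing other than the hash ties them together.
print(git_blob_id(b"EAPI=6\n"))
print(git_blob_id(b"EAPI=5\n"))
```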

Keep in mind that we do have git mirrors that include metadata/etc
hosted on GitHub.  I know people have concerns with their software
being proprietary, but as far as syncing goes it is just a mirror.  I
doubt most of us audit all the distfiles mirrors we use to make sure
they're only using FOSS ftp/http servers and so on.  There really
isn't any reason that it couldn't be hosted on infra either, assuming
they wanted the extra load.  I don't see the point in it, though: it
is just a mirror, git itself is completely FOSS, and if the mirror
ever goes away it is trivial to point the scripts that generate it at
some other mirror instead.

Again, I have nothing against devs maintaining rsync and changelogs,
and users making use of them.  I just don't see it as the end of the
world if devs decide to stop taking care of them.

-- 
Rich
