On Fri, Jul 6, 2018 at 4:34 AM Davyd McColl <dav...@gmail.com> wrote:
>
> I understand that git history will build over time -- I'm less concerned
> with (eventual) disk usage than I am with the speed of `emerge --sync`,
> which (and perhaps I'm sorely mistaken) appeared to be faster using git
> than rsync -- hence my choice of git over rsync (the discussion at
> https://forums.gentoo.org/viewtopic-t-1009562.html shows me to not be
> alone in this experience).
>

From what I've generally seen/heard git is much more efficient as long
as you sync frequently.

rsync has the advantage that it only transfers the minimum necessary
to get you from the tree you have now to the tree that is current.  To
do this it has to stat every file (with default settings - options
such as --checksum, which reads every file in full, make it even
slower), which is a lot of file I/O.
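
To make the per-file cost concrete, here is a rough local sketch of a
size-and-mtime comparison in the spirit of rsync's default quick check.
It is not rsync's actual implementation: the function name quick_check
and the stat counter are made up for illustration, and the real tool
does this on both ends of a network connection.

```python
import os


def quick_check(src_root, dst_root):
    """Compare two trees by size and mtime only.

    Returns (changed_paths, stat_calls). Hypothetical helper: it models
    the idea of a size+mtime check, not rsync's real code.
    """
    changed = []
    stat_calls = 0
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)

            # One stat on the source side for every file in the tree.
            stat_calls += 1
            s = os.stat(src)

            # One stat attempt on the destination side as well.
            stat_calls += 1
            try:
                d = os.stat(dst)
            except FileNotFoundError:
                changed.append(rel)  # missing on the destination
                continue

            if s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime):
                changed.append(rel)
    return sorted(changed), stat_calls
```

The number of stat calls grows with the number of files in the tree,
not with the number of files that changed, which is the I/O cost
described above.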

git has the advantage that it can just read the current HEAD and from
that know exactly what commits are missing, so there is way less
effort spent figuring out what changed.  It has the disadvantage that
it sends everything that happened since your last sync, which could
include files that were created and subsequently removed.  If you sync
often there won't be much of that, but if you're syncing monthly or
even less frequently then you probably will spend a lot of time
transmitting churn.
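
The churn effect can be illustrated with a toy model. Everything here
is an assumption made for the example: commits are plain dicts mapping
a path to its new content (None meaning "deleted"), and the file names
are invented. Real git sends compressed, delta-encoded packs, so this
shows only which content has to travel, not how many bytes.

```python
def apply_commits(tree, commits):
    """Return the tree that results from applying commits in order."""
    tree = dict(tree)
    for commit in commits:
        for path, content in commit.items():
            if content is None:
                tree.pop(path, None)  # deletion
            else:
                tree[path] = content
    return tree


def rsync_style_transfer(old_tree, new_tree):
    """Paths whose content must be sent to turn old_tree into new_tree."""
    return {p for p, c in new_tree.items() if old_tree.get(p) != c}


def git_style_transfer(commits):
    """Every (path, content) pair introduced by any commit since the
    last sync, including ones that were deleted again later."""
    sent = set()
    for commit in commits:
        for path, content in commit.items():
            if content is not None:
                sent.add((path, content))
    return sent


# Hypothetical history since the last sync: one real update, plus a
# file that was added and then removed again.
old = {"a.ebuild": "v1"}
history = [{"tmp.ebuild": "t1"}, {"a.ebuild": "v2"}, {"tmp.ebuild": None}]
new = apply_commits(old, history)

print(rsync_style_transfer(old, new))  # only a.ebuild
print(len(git_style_transfer(history)))  # 2: a.ebuild and the dead tmp.ebuild
```

The longer the gap between syncs, the more short-lived content like
tmp.ebuild ends up in the history that has to be fetched.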

It is possible to trim the history out of a local repository, and as
long as nobody is doing force pushes on the main repo you should still
be able to sync.  However, that is not something that just involves a
git one-liner.  Personally I don't mind the space tradeoff, especially
in exchange for the I/O savings.  With git, a sync is always a VERY
fast operation.

I'll also note that the stable branch (which is always free of obvious
issues caused by devs not running repoman) is only available via git.
There is no reason that couldn't be replicated via rsync, but right
now we only have one set of mirrors.

I'm still syncing from github after enabling signature checking.
There is a patch that will make that more secure but in the meantime
my scripts keep an eye on exit status when I sync.  IMO signature
checking is more important than where you sync from - as long as gpg
says I'm good it really doesn't matter who has the ability to play
with the data en route.  But, it certainly doesn't hurt to sync from
infra (I do have concerns about whether infra could handle everybody
doing it though - github is MS's problem to worry about).
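
For anyone wanting to watch the exit status the same way, a minimal
sketch of a wrapper might look like the following. It assumes the sync
command exits non-zero when something (including verification) goes
wrong; the function names are invented here and this is not the script
mentioned above.

```python
import subprocess
import sys


def sync_ok(command):
    """Run a sync command (a list of arguments) and report whether it
    exited with status 0."""
    result = subprocess.run(command)
    return result.returncode == 0


def require_sync(command):
    """Abort the calling script if the sync did not exit cleanly."""
    if not sync_ok(command):
        sys.exit("sync exited non-zero; not continuing")


# Hypothetical usage, substituting whatever sync command you run:
#   require_sync(["emerge", "--sync"])
```

Anything that runs after require_sync() returns can then assume the
sync command reported success.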

-- 
Rich
