Hello,

My previous mail didn't focus on the most important thing, so I'd like
to start another thread with a simple question: do we need to provide
a user-friendly ${DISTDIR}/egit-src/?

Currently the repository store consists of either bare or non-bare
clones of the remote repository. We do not support committing to those
local clones but people can easily clone them in order to obtain
a local development repository that can be used to work with the code
and push patches upstream.

However, supporting that increases the complexity of the eclass
and decreases space efficiency. For example, if we started doing
shallow clones, people would no longer be able to clone the repo
directly. We also need to worry about clone location collisions
and reusing the same location when multiple packages use the same repo.
As you can guess, git hosting services don't make this easy for us.

The question would be: do you feel like we should really provide
a verbatim clone of upstream's repository? Or should we focus on
the eclass's main goal, that is, fetching the remote sources in the most
bandwidth- and space-efficient manner?


If we decide to go for 'sane' clones, we need the eclass to be able to
provide sane paths for local copies. Those paths need to suit
the following points:

1. multiple remote repos (e.g. forks) may need to reuse the same local
   clone,

2. multiple packages may reuse the same repo and then they should
   create just one local clone,

3. a package may use multiple repos :),

4. submodules may reuse the same repo as another package, and then they
   should use the same local clone.

Honestly, I have no idea how to achieve that. The best idea that comes
to my mind is to use the whole 'path' part of the URI. That is, like:

  git://git.overlays.gentoo.org/proj/foo.git

would map to a path like:

  proj <something> foo.git

where <something> may be '/', '-', '_', '%2F', whatever.

This solves 2.-4. but won't help with 1. On top of that, expect
a bikeshed about which character should be used, another one because
people will really want to override this, and probably one more. Oh,
and some git hosting services add a prefix like '/git', '/p' or
'/pub/scm/whatever' that would become part of the checkout directory
as well.
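
As a rough sketch of the mapping described above (the helper name
local_clone_path and the example.org URLs are made up for illustration;
nothing like this exists in the eclass today):

```python
from urllib.parse import urlsplit


def local_clone_path(uri, sep="_"):
    """Map a repository URI to a local directory name (hypothetical helper)."""
    # Only the 'path' part of the URI is used; scheme and host are dropped.
    path = urlsplit(uri).path
    # Drop empty components produced by leading, trailing or doubled slashes.
    components = [part for part in path.split("/") if part]
    return sep.join(components)


# The example from this mail, with two candidate separators.
print(local_clone_path("git://git.overlays.gentoo.org/proj/foo.git"))
print(local_clone_path("git://git.overlays.gentoo.org/proj/foo.git", sep="%2F"))
# A fork lives under a different path, so it gets its own clone (point 1).
print(local_clone_path("git://example.org/user/foo.git"))
# A hosting prefix ends up in the directory name as well.
print(local_clone_path("https://example.org/pub/scm/proj/foo.git"))
```

The last two calls show the weak spots: a fork does not land in the same
directory as its origin, and a hosting prefix leaks into the name.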

We could also use some unique identifier, such as the root commit
identifier, but I doubt users will like having hashes in egit-src.


An alternative is to create a semi-obfuscated yet space-efficient store
for all the repositories. That is, fetch *all* git repositories into
a single location.

Since git uses hashes to identify everything, this will work better
than you'd first think. Most importantly, we can avoid fetching
duplicates with no real effort since git simply reuses local objects
with the same ids.

This covers duplicates from repos used by multiple ebuilds, forked
repos, and identical files that are used by different projects.
I doubt you could make git more space efficient than that.
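
To illustrate why duplicates cost nothing, here is a toy model of
a shared object store. It assumes git's SHA-1 object format, where
a file's id is the hash of a "blob <size>\0" header followed by the
contents. The SharedStore class is purely hypothetical; a real fetch
negotiates commits and transfers packs rather than single files.

```python
import hashlib


def git_blob_id(content):
    """Compute the id git assigns to a file's contents (SHA-1 format)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class SharedStore:
    """Toy stand-in for a single object store shared by all repositories."""

    def __init__(self):
        self.objects = {}  # maps object id -> content

    def add(self, content):
        """Store content; return True only if it was not already present."""
        oid = git_blob_id(content)
        if oid in self.objects:
            return False
        self.objects[oid] = content
        return True


store = SharedStore()
license_text = b"Example license text shared by two projects\n"
print(store.add(license_text))  # first project: the object is new
print(store.add(license_text))  # second project: same id, nothing stored
print(len(store.objects))
```

The second add() is a no-op because the id depends only on the contents,
not on which repository the file came from.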

We no longer have to worry about EGIT_PROJECT, about submodules, about
bikesheds. However, the local store structure would no longer be
familiar to our users. We are basically switching from using git as
a VCS to using git as an efficient file fetching tool.

There's also some increased risk wrt hash collisions but I doubt that
should be considered a problem at the moment.


What are your thoughts?

-- 
Best regards,
Michał Górny
