Hello,

My previous mail didn't focus on the most important thing, so I'd like to start another thread with a simple question: do we need to provide a user-friendly ${DISTDIR}/egit-src/?

Currently the repository store consists of either bare or non-bare clones of the remote repository. We do not support committing to those local clones, but people can easily clone them in order to obtain a local development repository that can be used to work with the code and push patches upstream.

However, supporting that increases the complexity of the eclass and decreases space efficiency. For example, if we started to do shallow clones, people would no longer be able to clone the repo directly. We also need to worry about clone location collisions and about reusing the same location when multiple packages use the same repo. As you can guess, git hosting services don't make this easy for us.

The question would be: do you feel like we should really provide a verbatim clone of upstream's repository? Or should we focus on the eclass's main goal, that is, fetching the remote sources in the most bandwidth- and space-efficient manner?

If we decide to go for 'sane' clones, we need the eclass to be able to provide sane paths for local copies. Those paths need to suit the following points:

1. multiple remote repos (e.g. forks) may need to reuse the same local clone,
2. multiple packages may reuse the same repo and then they should create just one local clone,
3. a package may use multiple repos :),
4. submodules may reuse the same repo as another package, and then they should use the same local clone.

Honestly, I have no idea how to achieve that. The best idea that comes to my mind is to use the whole 'path' part of the URI. That is, like:

  git://git.overlays.gentoo.org/proj/foo.git

would map to a path like:

  proj <something> foo.git

where <something> may be '/', '-', '_', '%2F', whatever. This solves points 2-4 but won't help with 1. Plus there's the incoming bikeshed about which character should be used, a bikeshed about people really wanting to override this, and probably one more bikeshed.

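To make the mapping above a bit more concrete, here is a rough sketch in Python. It is not eclass code, just an illustration of the idea. The function name local_clone_name and the default separator are made up for this example, and it assumes URL-style URIs only (scp-like 'user@host:path' addresses are not handled).

```python
from urllib.parse import urlsplit


def local_clone_name(uri, sep="_"):
    """Map a repository URI to a local clone name built from its path part.

    Hypothetical helper for illustration only. The host and the scheme are
    ignored on purpose, so the same path fetched over git:// or https://
    ends up in the same local clone.
    """
    path = urlsplit(uri).path
    # Drop empty components produced by leading, trailing or doubled slashes.
    parts = [p for p in path.split("/") if p]
    return sep.join(parts)


if __name__ == "__main__":
    print(local_clone_name("git://git.overlays.gentoo.org/proj/foo.git"))
    print(local_clone_name("git://git.overlays.gentoo.org/proj/foo.git", sep="-"))
```

Two packages (or a package and a submodule) pointing at the same URI get the same name, which covers points 2-4. A fork living under a different path gets a different name, so point 1 stays unsolved, exactly as noted above.
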
Oh, and some git hosting services put some prefix like '/git', '/p' or '/pub/scm/whatever' in the path, and that would become part of the checkout directory as well. We could also supposedly use some unique identifier, such as the root commit identifier, but I doubt users will like having hashes in egit-src.

An alternative is to create a semi-obfuscated yet space-efficient store for all the repositories. That is, fetch *all* git repositories into a single location. Since git uses hashes to identify everything, this will work better than you'd think at first.

Most importantly, we can avoid fetching duplicates with no real effort, since git simply reuses local objects with the same ids. This covers duplicates from repos used by multiple ebuilds, forked repos, and identical files that are used by different projects. I doubt you could make git more space-efficient than that.

We would no longer have to worry about EGIT_PROJECT, about submodules, about bikesheds. However, the local store structure would no longer be familiar to our users. We are basically switching from using git as a VCS to using git as an efficient file fetching tool.

There's also some increased risk wrt hash collisions, but I doubt that should be considered a problem at the moment.

What are your thoughts?

-- 
Best regards,
Michał Górny