On Mon, Jul 13, 2015 at 7:24 PM, Jochen Wiedmann <jochen.wiedm...@gmail.com> wrote:

> Hi,
>
> I am writing as one of the Mentors of the AsterixDB podling.
>
> It recently came to my attention, that there are, in fact, multiple
> Git repositories, which are used by the project, one of them being
> located externally of the ASF. I understand the structure to be like
> this:
>

This is a severe problem and needs to be rectified promptly. How are
commits migrating from the external repository to the ASF repository?
With blind mirroring of content, we typically end up missing important
provenance information.

> +--------------+   Commits   +------------------+   Mirrors   +----------------+
> |    Gerrit    | ----------> |  Git (External)  | ----------> |   Git (ASF)    |
> +--------------+             +------------------+             +----------------+
>
> The structure is made like this, because the project members desire
> that no commits can enter without a review, which is done in Gerrit. [2]
> (In the past, this was ensured by a commit hook in the external
> repository. That commit hook possibly still exists, but it doesn't
> prevent code to enter the ASF repository directly without a review.
> This lack of security is currently discussed by the podlings project
> members.)
>
> I understand the desire, and, to me, it makes sense. OTOH, I suspect
> that this issue might affect a successful incubation. Hence this mail.

Agreed. This needs to be rectified rapidly.

> As Git is slowly gaining ground within the ASF, I'd suggest that a
> possible resolution might be to have a Gerrit instance within the ASF.
> Given how Github pull requests are already discussed by many projects,
> I can imagine that many projects would like to adopt a similar policy.

Git is very widely used: slightly over half of the active projects at
the ASF now use it as their primary VCS. That said, we've explored
Gerrit a number of times, most recently in December.

For reference, I was very much in favor of Gerrit. I thought a number
of projects also wanted it, but many of them changed their minds over
time. In the end we discovered a number of challenges:

First, Gerrit wants what is best described as exclusive access to Git
repositories. It tends to want them on a local filesystem, with Gerrit
essentially acting as gatekeeper for commits.
This isn't inherently a problem if you have all repos treated this way.
But since we don't have all projects wanting

Second: Gerrit wants every patch author authenticated against a common
authn backend. This would mean folks would need accounts in LDAP. When
we last explored this, our LDAP infrastructure could not have handled
what would have been explosive growth in the number of accounts and
authentication requests. We've since made the infrastructure much more
robust and resilient, but deploying Gerrit would essentially require us
to have a self-service account creation service, and that's a lot of
work.

At the moment, Infrastructure doesn't see enough demand from projects
requesting Gerrit to make the tremendous investment required.

We've also noticed a number of trends in projects that were interested
in this:

The oldest strategy is from Hadoop, which has every patch submitted to
Jira. Every patch is automatically detected and has a pre-commit test
job run, with Jenkins reporting the test results to Jira.

We have Reviewboard [0], which is Gerrit-like, without the problems
listed above.

More recently, we have folks making heavy use of GitHub pull requests,
and two primary technologies are being seen there:

1. GitHub pull request builder: Jenkins watches for pull requests
   against the GH mirror of the repo, automatically picks up the job,
   and then reports the success or failure of that job in the pull
   request.

2. TravisCI: the ASF has a paid account with TravisCI with 30
   concurrent builders. Like the GitHub pull request builder, it
   watches for pull requests against a repository, runs tests, and
   then reports against the pull request. [2]

Obviously, there's no automatic merge, or even technical enforcement.
However, most projects are able to use social enforcement (and reverts
if necessary) to ensure that folks aren't committing directly, and
automatic merges would be disallowed anyway since a committer needs to
make an explicit decision to commit.

[0] http://reviews.apache.org
[1] https://blogs.apache.org/infra/entry/github_pull_request_builds_now
[2] https://blogs.apache.org/infra/entry/apache_gains_additional_travis_ci

--David