I'm moving this all to common-dev@, as general@ is more for announcements than anything else.

I think too many patches are falling by the wayside.

It takes a lot of time and effort to get patches in, and small patches that aren't viewed as critical tend to atrophy: lost in the "patch-available" state, nobody tracking their status, just someone left feeling neglected. Yet many of those patches help the codebase: they fix minor issues, improve the readability of the code, and keep that code in sync with changing dependencies (e.g. an evolving guava). Just chasing down @Deprecated warnings would be a start.

To a committer, those small patches are work: there's still the overhead of looking at them, and the manual effort of the merge, which is constant for all patches (update/reset your branch, apply, test, CHANGES.TXT, commit, cherry-pick, push). Unless you view them as critical, it's just too much effort for the minor stuff.

And as a developer of the small things, you get the other side of the work: having to rebase and track trunk, resubmit, wait, maybe get some feedback, otherwise: silence.

Even I have lots of minor patches being neglected. Looking at those I see one, HADOOP-6221 (*) Make it possible to interrupt RPC Client operations, that dates from 2009. That's five years old, impacts production, and still nobody looks at it. And if a committer can't get patches in after 5 years, what chance do others have?

Here are some thoughts of mine

*Technology*

-could something like Gerrit help?
-could we do more with Git pull requests? One issue here is that none of us want the complete history of a patch development merged in, only an aggregate (squashed) patch.
-How can testing be improved? That's precommit and postcommit regression testing

*Codebase*

Culling hadoop-contrib was a good thing: it was where code went to die. But we now have things that are considered to matter, yet are under-reviewed. Object stores are a key example: nobody works full time on those, especially s3 and openstack, and everyone forgets that jenkins doesn't run their tests (which add time and cost money).
Which is how a release of hadoop got out (2.4) whose S3n client would NPE on a seek(0) of a 0-byte file.

What bits like this are considered to matter, yet don't get much attention? Off-hand, I'd say: launcher scripts & metrics. There's the perennial 'update the dependency' work (HADOOP-9991) which is painful, and somewhat risky. Which is why we tend to stick with dependencies we know, even if we know they are outdated and have problems (jetty is many generations out of date, for example).

*Process*

-How can we clear the backlog?
-How do we keep it lean in future?
-what's the best way of developing "housekeeping" patches, ones that apply the same fixes everywhere (e.g. moving off some deprecated Guava API)? These patches age fast, yet nobody is going to want to review & process the changes one file at a time.

So: discuss away.

In the meantime, I am planning to review and where possible commit patches from people who are not committers but have submissions which matter to them, with the goal being to +1 in stuff that has been out there a while. (**)

-Steve

(*) https://issues.apache.org/jira/browse/HADOOP-6221 did get in +1'd earlier today, but there are many more in that state, even just from me.

(**) If you have had >1 thing committed recently please don't ask for more, I am trying to get more people's work in.

On 26 January 2015 at 19:55, Andrew Wang <andrew.w...@cloudera.com> wrote:

> Let's move this over to common-dev@, general@ is normally used for project
> announcements rather than discussion topics.
>
> I'd like to summarize a few things mentioned on the private@ thread,
> related to streamlining the code submission process.
>
> - Gerrit was brought up again, as it has in the past, as something that
> could make the actual process of reviewing and committing a lot easier.
> This would be especially helpful for small patches, where the mechanics of
> committing can take longer than reviewing the patch.
> - There were also concerns about forking discussions between JIRA and
> Gerrit. This has been an issue in Spark, and we'd like to keep discussions
> and issue tracking centralized.
>
> - Some talk about how to improve precommit. Right now it takes hours to run
> the unit tests, which slows down patch iterations. One solution is running
> tests in parallel (and even distributed). Previous distributed experiments
> have done a full unit test run in a couple minutes, but it'd be a fair
> amount of work to actually make this production ready.
> - Also mention of putting in place more linting and static analysis.
> Automating this will save reviewer time.
>
> Best,
> Andrew
>
> On Mon, Jan 26, 2015 at 9:16 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > In some cases, contributor responded to review comments and attached
> > patches addressing the comments.
> >
> > Later on, there was simply no response to the latest patch - even with
> > follow-on ping.
> >
> > I wish this aspect can be improved.
> >
> > Cheers
> >
> > On Sun, Jan 25, 2015 at 6:03 PM, Tsz Wo (Nicholas), Sze <
> > s29752-hadoopgene...@yahoo.com.invalid> wrote:
> >
> > > Hi contributors,
> > > I would like to (re)start a discussion regarding our patch review
> > > process. A similar discussion has happened on the hadoop private
> > > mailing list, which is inappropriate.
> > > Here is the problem: The patch available queues become longer and longer.
> > > It seems that we never can catch up. There are patches sitting in the
> > > queues for years. How could we speed up?
> > > Regards, Tsz-Wo