I'm moving this all to common-dev@, as general@ is more for announcements
than anything else.

I think too many patches are falling by the wayside. It takes a lot of time
and effort to get patches in, so small patches that aren't viewed as critical
tend to atrophy: lost in the "patch-available" state, with nobody tracking
their status and just someone left feeling neglected. Yet many of those
patches help the codebase: they fix minor issues, improve the readability of
the code, and keep that code in sync with changing dependencies (e.g. an
evolving Guava). Just chasing down @Deprecated warnings would be a start.

To a committer, those small patches are work: there's still the overhead of
looking at them, plus the manual effort of the merge, which is the same for
every patch (update/reset your branch, apply, test, CHANGES.TXT, commit,
cherry-pick, push). Unless you view a patch as critical, it's just too much
effort for the minor stuff.

And as the developer of the small things, you get the other side of the
work: rebasing to track trunk, resubmitting, waiting, maybe getting some
feedback, and otherwise: silence.

Even I have lots of minor patches being neglected. Looking at those, I see
that one, HADOOP-6221 (*) "Make it possible to interrupt RPC Client
operations", dates from 2009. That's five years old, it impacts production,
and still nobody looks at it. If a committer can't get patches in after five
years, what chance do others have?

Here are some thoughts of mine:

*Technology*
-could something like Gerrit help?
-could we do more with Git pull requests? One issue here is that none of us
want the complete history of a patch's development merged in, only an
aggregate (squashed) patch.
-How can testing be improved? That covers both precommit and postcommit
regression testing.

*Codebase*

Culling hadoop-contrib was a good thing: it was where code went to die. But
we now have things that are considered to matter, yet are under-reviewed.
Object stores are a key example: nobody works on those full time,
especially S3 and OpenStack, and everyone forgets that Jenkins doesn't run
their tests (which add time and cost money). That is how a Hadoop release
(2.4) got out whose S3n client would throw an NPE on a seek(0) of a 0-byte
file. What other bits like this are considered to matter, yet don't get much
attention?

Off-hand, I'd say: launcher scripts and metrics. There's also the perennial
'update the dependency' work (HADOOP-9991), which is painful and somewhat
risky. That is why we tend to stick with dependencies we know, even if we
know they are outdated and have problems (Jetty is many generations out of
date, for example).


*Process*
 -How can we clear the backlog?
 -How do we keep it lean in future?
 -What's the best way of developing "housekeeping" patches, ones that apply
the same fix everywhere (e.g. moving off some deprecated Guava API)? These
patches age fast, yet nobody is going to want to review and process the
changes one file at a time.

So: discuss away.

In the meantime, I am planning to review, and where possible commit, patches
from people who are not committers but have submissions which matter to
them, with the goal being to +1 stuff that has been out there a
while. (**)


-Steve


(*) https://issues.apache.org/jira/browse/HADOOP-6221 did get +1'd and in
earlier today, but there are many more in that state, even just from me.

(**) If you have had >1 thing committed recently please don't ask for more,
I am trying to get more people's work in.


On 26 January 2015 at 19:55, Andrew Wang <andrew.w...@cloudera.com> wrote:

> Let's move this over to common-dev@, general@ is normally used for project
> announcements rather than discussion topics.
>
> I'd like to summarize a few things mentioned on the private@ thread,
> related to streamlining the code submission process.
>
> - Gerrit was brought up again, as it has in the past, as something that
> could make the actual process of reviewing and committing a lot easier.
> This would be especially helpful for small patches, where the mechanics of
> committing can take longer than reviewing the patch.
> - There were also concerns about forking discussions between JIRA and
> Gerrit. This has been an issue in Spark, and we'd like to keep discussions
> and issue tracking centralized.
>
> - Some talk about how to improve precommit. Right now it takes hours to run
> the unit tests, which slows down patch iterations. One solution is running
> tests in parallel (and even distributed). Previous distributed experiments
> have done a full unit test run in a couple of minutes, but it'd be a fair
> amount of work to actually make this production ready.
> - Also mention of putting in place more linting and static analysis.
> Automating this will save reviewer time.
>
> Best,
> Andrew
>
> On Mon, Jan 26, 2015 at 9:16 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > In some cases, contributor responded to review comments and attached
> > patches addressing the comments.
> >
> > Later on, there was simply no response to the latest patch - even with
> > follow-on ping.
> >
> > I wish this aspect could be improved.
> >
> > Cheers
> >
> > On Sun, Jan 25, 2015 at 6:03 PM, Tsz Wo (Nicholas), Sze <
> > s29752-hadoopgene...@yahoo.com.invalid> wrote:
> >
> > > Hi contributors,
> > > I would like to (re)start a discussion regarding our patch review
> > > process. A similar discussion has happened on the Hadoop private
> > > mailing list, which is inappropriate.
> > > Here is the problem: the patch-available queues become longer and
> > > longer. It seems that we can never catch up. There are patches sitting
> > > in the queues for years. How could we speed up?
> > > Regards,
> > > Tsz-Wo
> > >
> >
>
