Hi Sachin,

I can understand your dissatisfaction with the review process of your ML
PRs. They are open for a long time without much activity from a committer
even though you've spent a lot of effort writing them. I'm sorry for this,
since it's probably mainly because of me lacking the time resources to
review them.

I agree with Stephan that reviewing ML algorithms is particularly time
consuming because you not only have to verify the parallel implementation
but first of all you have to understand the algorithm. This usually
requires reading the respective literature. Furthermore, a close
examination is often necessary because a single wrong sign renders the math
and also the algorithm directly wrong. Figuring these things out when it's
still fresh is usually easier.

I personally prefer not to rush things and rather merge only things which
are working correctly, ideally supported by exhaustive tests, and
efficiently implemented.

At the moment I think the biggest problem is that there are too few
committers working on ML. Currently, it's hard for me to allocate time for
it and I guess so for others too. It would be great if the review process
wouldn't solely depend on the committers, though. However, I don't see an
immediate solution for this problem. Therefore, I would ask you to not lose
patience with the community, even though this might not be a satisfying
answer for you.

Cheers,
Till

On Sun, Dec 6, 2015 at 8:59 PM, Stephan Ewen <se...@apache.org> wrote:

> Hi Sachin!
>
> Thank you for honestly speaking your mind on this issue.
> I agree that handling ML pull requests is particularly challenging right
> now for various reasons.
>
> There are currently very few active committers that have the background to
> actually review these pull requests. The whole work basically hinges on
> Till and Chiwan. As far as I know, both of them work on many things, and
> have only a certain time they can spend on ML. With regard to ML, Flink is
> simply a bit understaffed.
>
> The ML parts hard hard to develop. They require people who understand math,
> know how to write good code, and can think deeply into the parallel systems
> aspects of Flink. A lot of ML pull requests had to be review multiple times
> because they did not bring these aspects together.
>
> Finally, the ML pull requests are very time consuming to review, because
> the reviewer usually has to learn the implemented algorithm (or the
> parallel adaption) before being able to properly review it. As a result,
> while most other pull requests can be reviewed with a bit of time next to
> another weekly agenda, ML pull requests need a lot of dedicated time.
>
>
> The things I see that can really improve the situation are these:
>
> 1) Try to get more people in the community to help bring these pull
> requests into shape. While the group of ML-savvy committers is still small,
> it would be good if contributors could help each other out there are well,
> by reviewing the pull requests, and helping to get them into shape. Pull
> requests that are high quality by the time one of the committers looks over
> them can be merged relatively quickly.
>
> 2) Honestly reject some pull requests. Focusing the time on the promising
> ML pull requests and earlier rejecting pull requests that are far away from
> good shape. Giving the rejected pull requests some coarser feedback about
> what categories to improve, rather than detailed comments on each part.
> While it is nice to try and shepherd as many pull requests in as possible
> (and the community really tries to do that), it may not be possible in that
> area, as long as there is still not enough ML people.
>
> Maybe Till and Chiwan can share their thoughts on this.
>
>
> Greetings,
> Stephan
>
>
> On Sun, Dec 6, 2015 at 5:44 PM, Chiwan Park <chiwanp...@apache.org> wrote:
>
> > Hi Sachin,
> >
> > I’m sorry for your unsatisfied experience about lack of review.
> >
> > As you know, because there are few committers (including me) whose
> > interest is ML, the review of PRs could be completed slowly. But I admit
> my
> > fault about leaving the PRs unreviewed for 3-5 months.
> >
> > I’ll try to review your PRs as soon as possible.
> >
> > > On Dec 6, 2015, at 1:00 AM, Sachin Goel <sachingoel0...@gmail.com>
> > wrote:
> > >
> > > Hi all
> > > Sorry about a weekend email.
> > >
> > > This email is to express my displeasure over the lack of any review on
> my
> > > PRs on extending the ML library. Five of my PRs have been without any
> > > review for times varying from 3-5 months now.
> > > When I took up the task of extending the ML library by implementing
> core
> > > algorithms such as Decision Tree and k-means clustering [with several
> > > initialization schemes], I had hoped that the community will be
> actively
> > > involved in it since ML is a very important component of any big data
> > > system these days. However, it appears I have been wrong.
> > > Surely, the initial reviews required a lot of changes from my side over
> > > coding style mistakes [first time programmer in Scala], and
> > > less-than-optimal implementations. I like to think that I have learned
> a
> > > lot about maintaining better coding style compatible with Flink code
> > base,
> > > and spending time to optimize my work due to this.
> > > However, if a PR requires work, that doesn't automatically disqualify
> it
> > > from being reviewed actively, since the author has spent a lot of time
> on
> > > it and has voluntarily taken up the task of contributing.
> > >
> > > Machine learning is my core area of interest and I am able to
> contribute
> > > much more to the library; however, a lack of review after repeated
> > > reminders automatically discourages me from picking up more issues.
> > >
> > > However minor some of my commits maybe, I have been actively involved
> in
> > > the development work [with a total of 29 commits.]. I have also spent a
> > lot
> > > of time in release testing and diagnosing-slash-fixing lots of issues
> > with
> > > Web Dashboard. However, as with any contributor, my main goal is to
> > > contribute to my area of interest, while also diversifying my work by
> > > fixing other issues.
> > >
> > > The PRs are 710, 757, 861, 918 and 1032. I propose the following order
> > for
> > > anyone who wants to review my work:
> > > 1032 [very simple feature.]
> > > 918 [very short PR]
> > > 861 [followed by 710 after a complete rebase] [major work for
> Histograms
> > > and Decision Trees]
> > > 757 [major work for K-Means clustering and initialization schemes]
> > >
> > > If I have come across as rude, I apologize.
> > >
> > > Happy reviewing and thanks for bearing with me. :)
> > >
> > > Cheers!
> > > Sachin
> > >
> > > -- Sachin Goel
> > > Computer Science, IIT Delhi
> > > m. +91-9871457685
> >
> > Regards,
> > Chiwan Park
> >
> >
> >
> >
>

Reply via email to