I think the proposal laid out in SPARK-18813 is well done, and I do think
it is going to improve the process going forward. I also really like the
idea of getting the community to vote on JIRAs to give some of them
priority - provided that we listen to those votes, of course. The biggest
problem I see is that we have several active contributors, and others who
want to help implement these changes, but PRs are reviewed rather
sporadically, and I imagine it is very difficult for contributors to
understand why some get reviewed and some do not. The most important thing
we can do, given that MLlib currently has very limited committer review
bandwidth, is to clearly identify issues that, if worked on, will definitely
get reviewed. That is a hard thing to do in open source, no doubt, but even
if we have to limit the scope of such issues to a very small subset, I think
it's a gain for all.

On a related note, I would love to hear some discussion on the higher level
goal of Spark MLlib (if this derails the original discussion, please let me
know and we can discuss in another thread). The roadmap does contain
specific items that help to convey some of this (ML parity with MLlib,
model persistence, etc.), but I'm interested in what the "mission" of
Spark MLlib is. We often see PRs for brand new algorithms which are
sometimes rejected and sometimes not. Do we aim to keep implementing more
and more algorithms? Or is our focus really, now that we have a reasonable
library of algorithms, to simply make the existing ones faster/better/more
robust? Should we aim to make interfaces that developers can easily extend
to implement their own custom code (e.g. custom optimization libraries), or
do we want to restrict things to out-of-the-box
algorithms? Should we focus on more flexible, general abstractions like
distributed linear algebra?

I was not involved in the project in the early days of MLlib when this
discussion may have happened, but I think it would be useful to either
revisit it or restate it here for some of the newer developers.

On Tue, Jan 17, 2017 at 3:38 PM, Joseph Bradley <jos...@databricks.com>
wrote:

> Hi all,
>
> This is a general call for thoughts about the process for the MLlib
> roadmap proposed in SPARK-18813.  See the section called "Roadmap process."
>
> Summary:
> * This process is about committers indicating intention to shepherd and
> review.
> * The goal is to improve visibility and communication.
> * This is fairly orthogonal to the SIP discussion since this proposal is
> more about setting release targets than about proposing future plans.
>
> Thanks!
> Joseph
>
> --
>
> Joseph Bradley
>
> Software Engineer - Machine Learning
>
> Databricks, Inc.
>
>
