Thanks Sean, this is a really helpful overview, and contains good guidance for 
new contributors to ML/MLLIB.
My confusion was that the ML 2.2 roadmap critical features 
(https://issues.apache.org/jira/browse/SPARK-18813) did not line up with the 
top ML/MLLIB JIRAs by Votes 
<https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20votes%20DESC>
or Watchers 
<https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20Watchers%20DESC>.
Your explanation that they do not have to line up, and that a more complex 
process decides which changes make it into the next release, makes sense to 
me.
My only humble recommendation would be to clean up the top JIRAs by closing the 
ones which have Spark packages for them (e.g. the NN one, which already has 
several packages as you explained), marking the ones that will not be resolved 
as such, and changing the component on the ones not related to 
ML/MLLIB (e.g. https://issues.apache.org/jira/browse/SPARK-12965).
Also, I would love to do this myself if I had the permissions, but it would be 
great to update the JIRAs that are marked as “In Progress” but whose 
corresponding pull request was closed or cancelled, for example 
https://issues.apache.org/jira/browse/SPARK-4638. That JIRA (adding kernels 
like Radial Basis Function to SVM) is one of the top ones by number of 
watchers, and I can imagine why, so seeing it marked as in progress with a pull 
request is somewhat confusing. I’ve seen several other JIRAs like this one, 
where the pull request was closed but the JIRA status was not updated. If the 
pull request was closed for a good reason, the corresponding JIRA should 
probably be closed as well.
Thank you, Ilya


From: Sean Owen [mailto:so...@cloudera.com]
Sent: Tuesday, January 24, 2017 11:23 AM
To: Ilya Matiach <il...@microsoft.com>
Cc: dev@spark.apache.org
Subject: Re: Feedback on MLlib roadmap process proposal

On Tue, Jan 24, 2017 at 3:58 PM Ilya Matiach 
<il...@microsoft.com> wrote:
Just a few questions with regard to the MLLIB process:


  1.  Is there a list of committers who can be or are shepherds, and what code 
they own?  I’ve seen this page: 
http://spark.apache.org/committers.html
but I’m not sure if it is up to date, and it doesn’t mention what code the 
committers own.  It would be useful to know who owns ML or MLLIB.  From my 
limited personal experience this seems to be Joseph K. Bradley, Yanbo Liang and 
Sean Owen.
There is no such list because there's no formal notion of ownership or access 
to subsets of the project. Tracking an informal notion would be process mostly 
for its own sake, and probably just go out of date. We sort of tried this with 
'maintainers' and it didn't actually do anything.

I am not active much in ML, but will occasionally help commit simple changes. 
What you see organically is pretty much what is, at any given time. People you 
see responding are the active ones, and influencers, commit bit or no.



  2.  Based on both user votes and watchers, the top issue currently is 
“SPARK-5575: Artificial neural networks for MLlib deep learning”.  However, it 
looks like it has been open for almost 2 years and not a lot of progress is 
being made.  There seem to be other top issues which aren’t getting addressed 
either on these pages mentioned in the roadmap: MLlib, sorted by Votes 
<https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20votes%20DESC>
or Watchers 
<https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20Watchers%20DESC>.
Is my perception incorrect, or is there a very good reason for not 
addressing the top issues voted for by the community?  If there is a good 
reason, is there a way to filter such JIRAs out of the sorted lists, to know 
which JIRAs really should be taken/worked on?
JIRA votes and watchers don't mean anything, formally. This isn't a product 
company where one group might give another group a list of top priorities to 
work on. There's a general statement about this at 
http://spark.apache.org/contributing.html
 under "Code Review Criteria". In practice, it's a soft process of convincing 
other people that change X does more good than harm, is worth taking the burden 
of supporting, matters to users, etc. I ignore the 80% of issues that don't 
seem to fit these criteria, and choose to help with the 20% that do, which are 
usually simple and/or important bug fixes.

ANNs? That's a tangent, but my snap reactions are:
- It's something Everybody wants Somebody Else to create, which may explain the 
  votes vs activity?
- There is one basic ANN implementation in Spark actually.
- There are others outside Spark, so it may be something people get elsewhere, 
  like dl4j or BigDL, or by strapping TF to Spark in various ways.
- DL is also not an obviously-great fit for the data-parallel computation model 
  here.
- It's not a goal to implement everything in Spark. It could be a good idea, 
  but there's no need to tether it to the core project, to the exclusion of 
  "unblessed" third-party packages.



  3.  Also, this might be a newbie question, but for new contributors to Spark, 
is there a process to convince a committer to be assigned to a JIRA that we are 
working on?  It would be useful if there was a clear threshold for whether a 
committer can decline to work on a JIRA ahead of time, so contributors won’t 
waste time working on issues that aren’t important to Spark and can focus on 
making progress on the issues that the Spark committers would like us to fix.

No, there's no concept of being tasked to work on something by someone else 
here. I can't imagine we could establish a clear objective threshold for such a 
subjective thing.

It's not a satisfying answer but it is the most realistic one. All of these OSS 
projects work on soft power, persuasion and cooperation. I think the good news 
is that all the intuitive ways to gain soft power do work: give time to others' 
problems if you want time on your own, help review, make thoughtful careful 
changes, etc.

My general guidance is: don't bother doing significant feature work unless you 
have some clear buy-in from someone who can commit.

I completely agree that issues should be closed more aggressively for the 
reason you give. On the flip-side this often ruffles feathers. We are still 
overrun with issues but it's gotten a lot better culture-wise about honestly 
rejecting lots of inbound stuff quickly.
