Sean has given a great explanation. A few more comments:

Roadmap: I have been creating roadmap JIRAs, but the goal really is to have all committers working on MLlib help to set that roadmap, based on either their knowledge of current maintenance/internal needs of the project or the feedback given from the rest of the community.

@Committers - I see people actively shepherding PRs for MLlib, but I don't see many major initiatives linked to the roadmap. If there are ones large enough to merit adding to the roadmap, please do.

In general, there are many process improvements we could make. A few in my mind are:

* Visibility: Let the community know what committers are focusing on. This was the primary purpose of the "MLlib roadmap proposal."
* Community initiatives: This is currently very organic. Some of the organic process could be improved, such as encouraging Votes/Watchers (though I agree with Sean about these being one-sided metrics). Cody's SIP work is a great step towards adding more clarity and structure for major initiatives.
* JIRA hygiene: Always a challenge, and always requires some manual prodding. But it's great to push for efforts on this.

On Wed, Jan 25, 2017 at 3:59 AM, Sean Owen <so...@cloudera.com> wrote:

> On Wed, Jan 25, 2017 at 6:01 AM Ilya Matiach <il...@microsoft.com> wrote:
>
>> My confusion was that the ML 2.2 roadmap critical features (https://issues.apache.org/jira/browse/SPARK-18813) did not line up with the top ML/MLLIB JIRAs by Votes <https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20votes%20DESC> or Watchers
>> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20Watchers%20DESC>.
>>
>> Your explanation that they do not have to and there is a more complex process to choosing the changes that will make it into the next release makes sense to me.
>
> For Spark ML, Joseph is the de facto leader and does publish a tentative roadmap. (We could also use JIRA mechanisms for this but any scheme is better than none.) Yes, not based on Votes -- nothing here is. Votes are a noisy signal because they usually measure: what would you like done if you didn't have to do it and there were no downsides for you?
>
>> My only humble recommendation would be to clean up the top JIRAs by closing the ones which have spark packages for them (eg the NN one which already has several packages as you explained), noting or somehow marking on some that they will not be resolved, and changing the component on the ones not related to ML/MLLIB (eg https://issues.apache.org/jira/browse/SPARK-12965).
>
> We do that. It occasionally generates protests, so, I find myself erring on the side of ignoring. You can comment on any JIRA you think should be closed. That's helpful.
>
> That particular JIRA seems potentially legitimate. I wouldn't close it. It also won't get fixed until someone proposes a resolution. I'd strongly encourage people saying "I have this problem too" to try to fix it.
> I tend to ignore these otherwise, myself, in favor of reviewing ones where someone has gone to the trouble of proposing a working fix.
>
>> Also, I would love to do this if I had the permissions, but it would be great to change the JIRAs that are marked as “in progress” but where the corresponding pull request was closed/cancelled, for example https://issues.apache.org/jira/browse/SPARK-4638. That JIRA is
>
> Yes, flag these. I or others can close them if appropriate. Anyone who consistently does this well, we could give JIRA permissions to.
>
> Opening a PR automatically makes it "In Progress" but there's no complementary process to un-mark it. You can ignore the Open / In Progress distinction really.
>
> This one is interesting because it does seem like a plausible feature to add. The original PR was abandoned by the author and nobody else submitted one -- despite the Votes. I hesitate to signal that no PRs would be considered, but, doesn't seem like it's in demand enough for someone to work on?
>
> I think one of my messages is that, de facto, here, like in many Apache projects, committers do not take requests. They pursue the work they believe needs doing, and shepherd work initiated by others (a clear bug report, a PR) to a resolution. Things get done by doing them, or by building influence by doing other things the project needs doing. It isn't a mechanical, objective process, and can't be. But it does work in a recognizable way.

--
Joseph Bradley
Software Engineer - Machine Learning
Databricks, Inc.
<http://databricks.com/>