Re: Adding abstraction in MLlib

2014-09-16 Thread Xiangrui Meng
Hi Egor, I posted the design doc for pipeline and parameters on the JIRA. Now I'm trying to work out some details of ML datasets, which I will post later this week. Your feedback is welcome! Best, Xiangrui On Mon, Sep 15, 2014 at 12:44 AM, Reynold Xin wrote: > Hi Egor, > > Thanks for the sugg

Re: Adding abstraction in MLlib

2014-09-15 Thread Reynold Xin
Hi Egor, Thanks for the suggestion. It is definitely our intention and practice to post design docs as soon as they are ready, and to keep iteration cycles short. As a matter of fact, we encourage posting design docs for major features before implementation starts, and WIP pull requests before they are ful

Re: Adding abstraction in MLlib

2014-09-14 Thread Egor Pahomov
It's good that Databricks is working on this issue! However, the current process of working on it is not very clear to an outsider. - The last update on this ticket was August 5. If all this time was active development, I have concerns that without feedback from the community for such a long time, developme

Re: Adding abstraction in MLlib

2014-09-12 Thread Patrick Wendell
We typically post design docs on JIRA before major work starts. For instance, pretty sure SPARK-1856 will have a design doc posted shortly. On Fri, Sep 12, 2014 at 12:10 PM, Erik Erlandson wrote: > > Are interface designs being captured anywhere as documents that the community > can follow alo

Re: Adding abstraction in MLlib

2014-09-12 Thread Erik Erlandson
Are interface designs being captured anywhere as documents that the community can follow along with as the proposals evolve? I've worked on other open source projects where design docs were published as "living documents" (e.g. on Google Docs, or Etherpad, but the particular mechanism isn't cr

Re: Adding abstraction in MLlib

2014-09-12 Thread Xiangrui Meng
Hi Egor, Thanks for the feedback! We are aware of some of the issues you mentioned and there are JIRAs created for them. Specifically, I'm pushing out the design on pipeline features and algorithm/model parameters this week. We can move our discussion to https://issues.apache.org/jira/browse/SPARK

Re: Adding abstraction in MLlib

2014-09-12 Thread Reynold Xin
Xiangrui can comment more, but I believe he and Joseph are actually working on a standardized interface and pipeline feature for the 1.2 release. On Fri, Sep 12, 2014 at 8:20 AM, Egor Pahomov wrote: > Some architectural suggestions on this matter - > https://github.com/apache/spark/pull/2371 > > 2014-09-1

Re: Adding abstraction in MLlib

2014-09-12 Thread Egor Pahomov
Some architectural suggestions on this matter - https://github.com/apache/spark/pull/2371 2014-09-12 16:38 GMT+04:00 Egor Pahomov : > Sorry, I miswrote - I meant the learners part of the framework - models already > exist. > > 2014-09-12 15:53 GMT+04:00 Christoph Sawade < > christoph.saw...@googlemail.com

Re: Adding abstraction in MLlib

2014-09-12 Thread Egor Pahomov
Sorry, I miswrote - I meant the learners part of the framework - models already exist. 2014-09-12 15:53 GMT+04:00 Christoph Sawade : > I totally agree, and we also discovered some drawbacks with the > implementation of the classification models that are based on GLMs: > > - There is no distinction between pr

Re: Adding abstraction in MLlib

2014-09-12 Thread Christoph Sawade
I totally agree, and we also discovered some drawbacks with the implementation of the classification models that are based on GLMs: - There is no distinction between predicting scores, classes, and calibrated scores (probabilities). For these models it is common to have access to all of them and the pred
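
The distinction Christoph draws between scores, classes, and probabilities can be illustrated with a small sketch. This is not the MLlib API: the class `LinearBinaryClassifier`, its method names, the logistic link, and the default 0.5 threshold are all assumptions made for illustration, and the snippet above is cut off before it says what design was actually proposed.

```python
import math
from typing import Sequence


class LinearBinaryClassifier:
    """Hypothetical GLM-style binary classifier (illustrative, not MLlib)."""

    def __init__(self, weights: Sequence[float], intercept: float = 0.0,
                 threshold: float = 0.5) -> None:
        self.weights = list(weights)
        self.intercept = intercept
        self.threshold = threshold  # assumed decision threshold on the probability

    def predict_score(self, features: Sequence[float]) -> float:
        # Raw margin w . x + b, before any link function is applied.
        return sum(w * x for w, x in zip(self.weights, features)) + self.intercept

    def predict_probability(self, features: Sequence[float]) -> float:
        # Logistic link applied to the margin (an assumed choice of link).
        # The two branches avoid overflow in exp() for large |score|.
        score = self.predict_score(features)
        if score >= 0:
            return 1.0 / (1.0 + math.exp(-score))
        e = math.exp(score)
        return e / (1.0 + e)

    def predict_class(self, features: Sequence[float]) -> int:
        # Hard label obtained by thresholding the probability.
        return 1 if self.predict_probability(features) >= self.threshold else 0


model = LinearBinaryClassifier(weights=[1.0, -2.0], intercept=0.5)
x = [2.0, 0.5]
print(model.predict_score(x), model.predict_probability(x), model.predict_class(x))
```

Keeping the three outputs as separate methods lets a caller that needs ranking use the score, a caller that needs calibrated output use the probability, and a caller that needs a decision use the class, without retraining or re-wrapping the model.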

Adding abstraction in MLlib

2014-09-12 Thread Egor Pahomov
Here at Yandex, while implementing gradient boosting in Spark and creating our ML tool for internal use, we found the following serious problems in MLlib: - There is no Regression/Classification model abstraction. We were building abstract data processing pipelines, which should work just with
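
As a rough illustration of the first point, a shared model abstraction lets pipeline code depend on an interface rather than on each concrete algorithm. The sketch below is an assumption, not Egor's proposal or MLlib code: every name in it (`Model`, `RegressionModel`, `ClassificationModel`, `ConstantRegressor`, `ThresholdClassifier`, `predict_all`) is hypothetical, and the snippet above is truncated before it describes the intended design.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class Model(ABC):
    """Hypothetical common interface for anything that can predict."""

    @abstractmethod
    def predict(self, features: Sequence[float]) -> float:
        ...


class RegressionModel(Model):
    """Marker base class: predict() returns a real-valued target."""


class ClassificationModel(Model):
    """Base class: predict() returns a class label encoded as a float."""

    @property
    @abstractmethod
    def num_classes(self) -> int:
        ...


class ConstantRegressor(RegressionModel):
    # Toy regressor that always predicts the same value.
    def __init__(self, value: float) -> None:
        self.value = value

    def predict(self, features: Sequence[float]) -> float:
        return self.value


class ThresholdClassifier(ClassificationModel):
    # Toy binary classifier that thresholds the first feature.
    def __init__(self, cutoff: float) -> None:
        self.cutoff = cutoff

    @property
    def num_classes(self) -> int:
        return 2

    def predict(self, features: Sequence[float]) -> float:
        return 1.0 if features[0] >= self.cutoff else 0.0


def predict_all(model: Model, rows: Sequence[Sequence[float]]) -> List[float]:
    # Pipeline step written against the abstraction only.
    return [model.predict(row) for row in rows]


print(predict_all(ConstantRegressor(2.5), [[0.0], [1.0]]))
print(predict_all(ThresholdClassifier(0.5), [[0.2], [0.9]]))
```

Here `predict_all` never names a concrete algorithm, so swapping one learner for another does not require touching the pipeline step.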