Hi Egor,
I posted the design doc for pipelines and parameters on the JIRA. Now
I'm working out some details of ML datasets, which I will post
later this week. Your feedback is welcome!
Best,
Xiangrui
On Mon, Sep 15, 2014 at 12:44 AM, Reynold Xin wrote:
Hi Egor,
Thanks for the suggestion. It is definitely our intention and practice to
post design docs as soon as they are ready, and to keep iteration cycles
short. As a matter of fact, we encourage design docs for major features to be
posted before implementation starts, and WIP pull requests before they are fully ready.
It's good that Databricks is working on this issue! However, the current
process of working on it is not very clear to an outsider.
- The last update on this ticket was August 5. If all this time was active
development, I have concerns that, without feedback from the community for
such a long time, the development …
We typically post design docs on JIRAs before major work starts. For
instance, I'm pretty sure SPARK-1856 will have a design doc posted
shortly.
On Fri, Sep 12, 2014 at 12:10 PM, Erik Erlandson wrote:
Are interface designs being captured anywhere as documents that the community
can follow along with as the proposals evolve?
I've worked on other open source projects where design docs were published as
"living documents" (e.g. on Google Docs or Etherpad, but the particular
mechanism isn't critical).
Hi Egor,
Thanks for the feedback! We are aware of some of the issues you
mentioned, and there are JIRAs created for them. Specifically, I'm
pushing out the design on pipeline features and algorithm/model
parameters this week. We can move our discussion to
https://issues.apache.org/jira/browse/SPARK
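The parameter design mentioned above is in the linked design doc; as a rough illustration only (the names `Param`, `PipelineStage`, and `LogisticRegressionStage` here are hypothetical, not the actual MLlib API), a shared parameter interface for pipeline stages might look like this: each stage declares typed, documented parameters that callers can inspect and set uniformly.

```python
# Illustrative sketch of a uniform parameter interface for pipeline stages.
# All names are hypothetical; the real design is in the JIRA design doc.

class Param:
    """A named, documented parameter with a default value."""
    def __init__(self, name, default, doc):
        self.name, self.default, self.doc = name, default, doc

class PipelineStage:
    """Base class: stores parameter values set by the user."""
    def __init__(self):
        self._values = {}

    def set(self, param, value):
        self._values[param.name] = value
        return self  # return self so setters can be chained

    def get(self, param):
        # Fall back to the declared default when the user has not set a value.
        return self._values.get(param.name, param.default)

class LogisticRegressionStage(PipelineStage):
    max_iter = Param("maxIter", 100, "maximum number of iterations")
    reg_param = Param("regParam", 0.0, "L2 regularization strength")

stage = LogisticRegressionStage().set(LogisticRegressionStage.max_iter, 50)
print(stage.get(LogisticRegressionStage.max_iter))   # → 50
print(stage.get(LogisticRegressionStage.reg_param))  # → 0.0
```

The point of such a scheme is that generic tooling (grid search, doc generation, serialization) can enumerate any stage's parameters without knowing the concrete algorithm.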
Xiangrui can comment more, but I believe he and Joseph are actually
working on standardizing the interfaces and the pipeline feature for the 1.2 release.
On Fri, Sep 12, 2014 at 8:20 AM, Egor Pahomov wrote:
Some architectural suggestions on this matter:
https://github.com/apache/spark/pull/2371
2014-09-12 16:38 GMT+04:00 Egor Pahomov :
Sorry, I miswrote: I meant the learners part of the framework; models already
exist.
2014-09-12 15:53 GMT+04:00 Christoph Sawade :
I totally agree, and we also discovered some drawbacks with the
classification model implementations that are based on GLMs:
- There is no distinction between predicting scores, classes, and
calibrated scores (probabilities). For these models it is common to have
access to all of them, and the pred…
Here at Yandex, while implementing gradient boosting in Spark and
creating our ML tool for internal use, we found the following serious problems in
MLlib:
- There is no Regression/Classification model abstraction. We were
building abstract data processing pipelines, which should work just with …
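The kind of shared abstraction being asked for can be sketched as follows (a toy illustration, not a proposal for MLlib's actual class names): a common `Model` interface so a pipeline can chain stages without caring whether each stage is a regressor or a classifier.

```python
# Toy sketch of a shared model abstraction for generic pipelines.
# All names are hypothetical.

class Model:
    def predict(self, features):
        raise NotImplementedError

class MeanRegressionModel(Model):
    """Toy regressor: always predicts a stored mean value."""
    def __init__(self, mean):
        self.mean = mean

    def predict(self, features):
        return self.mean

class ThresholdClassificationModel(Model):
    """Toy classifier: thresholds the first feature."""
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, features):
        return 1 if features[0] >= self.threshold else 0

def run_pipeline(models, features):
    """Each stage's prediction becomes the next stage's input feature."""
    out = features
    for m in models:
        out = [m.predict(out)]
    return out[0]

result = run_pipeline(
    [MeanRegressionModel(0.8), ThresholdClassificationModel(0.5)],
    [1.0, 2.0],
)
print(result)  # → 1
```

Because every stage satisfies the same `predict` contract, the pipeline code never needs to branch on whether a stage does regression or classification, which is exactly the property the message above says is missing.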