Re: [mllib] State of Multi-Model training

Burak Yavuz Wed, 17 Sep 2014 13:17:05 -0700

I believe it will be in the main repo.

Burak


----- Original Message -----
From: "Kyle Ellrott" <kellr...@soe.ucsc.edu>
To: "Burak Yavuz" <bya...@stanford.edu>
Cc: dev@spark.apache.org
Sent: Wednesday, September 17, 2014 9:48:54 AM
Subject: Re: [mllib] State of Multi-Model training

This sounds like a pretty major re-write of the system. Is it going to live
in a different repo during development? Or will we be able to track
progress in the main Spark repo?

Kyle

On Tue, Sep 16, 2014 at 10:22 PM, Burak Yavuz <bya...@stanford.edu> wrote:

> Hi Kyle,
>
> Thank you for the code examples. We may be able to use some of the ideas
> there. I think initially the goal is to have the optimizers ready (SGD,
> LBFGS), and then the evaluation metrics will come next. It might take some
> time, however, as MLlib is going to have a significant API "face-lift" (e.g.
> https://issues.apache.org/jira/browse/SPARK-3530). Evaluation metrics
> will be significant in the new "pipeline"s, and the ability to evaluate
> multiple models efficiently is very important. We encourage you to read
> through the design docs, and we would appreciate any feedback from you and
> the rest of the community!
>
> Best,
> Burak
>
> ----- Original Message -----
> From: "Kyle Ellrott" <kellr...@soe.ucsc.edu>
> To: "Burak Yavuz" <bya...@stanford.edu>
> Cc: dev@spark.apache.org
> Sent: Tuesday, September 16, 2014 9:41:45 PM
> Subject: Re: [mllib] State of Multi-Model training
>
> I'd be interested in helping to test your code as soon as it's available.
> The version I wrote used a paired RDD and combined by key; it worked best
> if it used a custom partitioner that put all the samples in the same area.
> Running things in batched matrices would probably speed things up greatly.
> You probably won't need my training code, but I did write some code for
> calculating binary classification metrics (
> https://github.com/apache/spark/pull/1292/files#diff-6) and AUC (
> https://github.com/apache/spark/pull/1292/files#diff-5) for multiple
> models that you might be able to use.
>
> Kyle
>
>
> On Tue, Sep 16, 2014 at 4:09 PM, Burak Yavuz <bya...@stanford.edu> wrote:
>
> > Hi Kyle,
> >
> > I'm actively working on it now. It's pretty close to completion; I'm just
> > trying to figure out bottlenecks and optimize as much as possible.
> > As Phase 1, I implemented multi-model training on Gradient Descent.
> > Instead of performing vector-vector operations on rows (examples) and
> > weights, I've batched them into matrices so that we can use Level 3 BLAS
> > to speed things up. I've also added support for sparse matrices (
> > https://github.com/apache/spark/pull/2294), as making use of sparsity
> > will allow you to train more models at once.
> >
> > Best,
> > Burak
> >
> > ----- Original Message -----
> > From: "Kyle Ellrott" <kellr...@soe.ucsc.edu>
> > To: dev@spark.apache.org
> > Sent: Tuesday, September 16, 2014 3:21:53 PM
> > Subject: [mllib] State of Multi-Model training
> >
> > I'm curious about the state of development of multi-model learning in MLlib
> > (training sets of models during the same training session, rather than one
> > at a time). The JIRA lists it as in progress, targeting Spark 1.2.0 (
> > https://issues.apache.org/jira/browse/SPARK-1486 ), but there haven't been
> > any notes on it in over a month.
> > I submitted a pull request for a possible method to do this work a little
> > over two months ago (https://github.com/apache/spark/pull/1292), but
> > haven't received any feedback on the patch yet.
> > Is anybody else working on multi-model training?
> >
> > Kyle
> >
> >
>
>

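To make the batching idea from Burak's first reply concrete, here is a
minimal sketch. It is not the code from either pull request. It assumes a
plain least-squares gradient (the thread does not name a loss function), and
every function name in it is hypothetical. Pure Python lists stand in for
real matrices; an actual implementation would presumably hand the two
matrix products to a Level 3 BLAS gemm routine.

```python
# Sketch only: none of these names are MLlib APIs.

def dot(u, v):
    """Vector-vector inner product (a Level 1 BLAS style operation)."""
    return sum(a * b for a, b in zip(u, v))


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    """Dense matrix product; stands in for a single Level 3 BLAS gemm call."""
    cols = list(zip(*B))
    return [[dot(row, col) for col in cols] for row in A]


def gradients_one_by_one(X, y, weights):
    """Assumed least-squares gradient, computed separately for each model."""
    grads = []
    for w in weights:  # one full pass over the data per model
        g = [0.0] * len(w)
        for x, label in zip(X, y):
            err = dot(x, w) - label
            for j, xj in enumerate(x):
                g[j] += err * xj
        grads.append(g)
    return grads


def gradients_batched(X, y, weights):
    """Same gradients for all models, from two matrix-matrix products."""
    W = transpose(weights)  # d x k: one column of weights per model
    P = matmul(X, W)        # n x k: predictions of every model on every row
    E = [[p - label for p in row] for row, label in zip(P, y)]  # residuals
    G = matmul(transpose(X), E)  # d x k: one gradient column per model
    return transpose(G)          # k x d: one gradient row per model
```

Both functions return one gradient per model in the same order, so the
batched version can replace the per-model loop without changing the update
step. The data is read once for all k models instead of once per model.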

