Re: Problem with ML pipeline

Sachin Goel Sun, 07 Jun 2015 22:12:07 -0700

A more general approach would be to take as input the indices of the
vector that should be treated as features. The vector can then be returned
unchanged, and the user can do what they wish with the non-feature values.
This wouldn't require extending the predict operation; instead, the indices
could be specified in the model itself through a parameter setter. Or
perhaps a better approach is to just take this input in the predict
operation.
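
To make the idea concrete, here is a rough sketch of what I mean. It uses
plain Scala collections instead of Flink's Vector and DataSet types, and
LinearModel and featureIndices are names I made up for illustration, not
existing Flink ML API:

```scala
// Hypothetical sketch: IndexedSeq[Double] stands in for Flink's Vector type.
object FeatureIndexSketch {
  // Toy linear predictor that only looks at the configured feature indices.
  final case class LinearModel(weights: IndexedSeq[Double], featureIndices: IndexedSeq[Int]) {
    require(weights.length == featureIndices.length, "one weight per feature index")

    // Returns the untouched input together with the prediction, so that
    // non-feature entries (e.g. an id at index 0) remain available to the caller.
    def predict(input: IndexedSeq[Double]): (IndexedSeq[Double], Double) = {
      val features = featureIndices.map(i => input(i))
      val prediction = features.zip(weights).map { case (x, w) => x * w }.sum
      (input, prediction)
    }
  }

  def main(args: Array[String]): Unit = {
    // Index 0 holds the id (point in time); indices 1 and 2 are the features.
    val model = LinearModel(weights = IndexedSeq(2.0, 3.0), featureIndices = IndexedSeq(1, 2))
    val (row, yPredicted) = model.predict(IndexedSeq(42.0, 1.0, 1.0))
    println(s"id=${row(0)} prediction=$yPredicted")
  }
}
```

Here the indices live in the model; passing them to predict instead would
only move the featureIndices argument to the method.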


Cheers!
Sachin
On Jun 8, 2015 10:17 AM, "Felix Neutatz" <neut...@googlemail.com> wrote:

> We probably need it for the other classes of the pipeline as well, in
> order to be able to pass the ID through the whole pipeline.
>
> Best regards,
> Felix
> On Jun 6, 2015 9:46 AM, "Till Rohrmann" <trohrm...@apache.org> wrote:
>
> > Then you only have to provide an implicit PredictOperation[SVM, (T, Int),
> > (LabeledVector, Int)] value with T <: Vector in the scope where you call
> > the predict operation.
> > On Jun 6, 2015 8:14 AM, "Felix Neutatz" <neut...@googlemail.com> wrote:
> >
> > > That would be great. I like the special predict operation better
> > > because returning the id is only necessary in some cases. The special
> > > predict operation would save this overhead.
> > >
> > > Best regards,
> > > Felix
> > > On Jun 4, 2015 7:56 PM, "Till Rohrmann" <till.rohrm...@gmail.com> wrote:
> > >
> > > > I see your problem. One way to solve the problem is to implement a
> > > > special PredictOperation which takes a tuple (id, vector) and returns
> > > > a tuple (id, labeledVector). You can take a look at the implementation
> > > > for the vector prediction operation.
> > > >
> > > > But we can also discuss adding an ID field to the Vector type.
> > > >
> > > > Cheers,
> > > > Till
> > > > On Jun 4, 2015 7:30 PM, "Felix Neutatz" <neut...@googlemail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have the following use case: I want to do regression on a time
> > > > > series dataset like:
> > > > >
> > > > > id, x1, x2, ..., xn, y
> > > > >
> > > > > id = point in time
> > > > > x = features
> > > > > y = target value
> > > > >
> > > > > In the Flink framework I would map this to a LabeledVector (y,
> > > > > DenseVector(x)). (I don't want to use the id as a feature.)
> > > > >
> > > > > When I finally apply the predict() method I get a LabeledVector
> > > > > (y_predicted, DenseVector(x)).
> > > > >
> > > > > Now my problem is that I would like to plot the predicted target
> > > > > value according to its time.
> > > > >
> > > > > What I have to do now is:
> > > > >
> > > > > a = predictedDataSet.map ( LabeledVector => Tuple2(x,y_p))
> > > > > b = originalDataSet.map("id, x1, x2, ..., xn, y" => Tuple2(x,id))
> > > > >
> > > > > a.join(b).where("x").equalTo("x") { (a,b) => (id, y_p) }
> > > > >
> > > > > This is really a cumbersome process for such a simple thing. Is there
> > > > > any approach which makes this simpler? If not, can we extend the ML
> > > > > API to allow ids?
> > > > >
> > > > > Best regards,
> > > > > Felix
> > > > >
> > > >
> > >
> >
>
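
For comparison, the special predict operation discussed in the quoted
messages could look roughly like the sketch below. PredictOp, Labeled and
LinearModel are hypothetical stand-ins written with plain Scala collections;
the trait only mimics the idea of an implicit predict operation and does not
reproduce Flink's actual PredictOperation signature:

```scala
// Hypothetical sketch: none of these types are Flink ML API.
object IdPredictSketch {
  type Features = IndexedSeq[Double]

  // Stand-in for LabeledVector: a label plus the feature values.
  final case class Labeled(label: Double, features: Features)

  // Toy model; a real pipeline would use a trained predictor such as SVM.
  final case class LinearModel(weights: Features)

  // Type class describing how a model maps a Testing value to a Prediction.
  trait PredictOp[Model, Testing, Prediction] {
    def predict(model: Model, input: Testing): Prediction
  }

  // Plain operation: features in, labeled vector out.
  implicit val vectorPredict: PredictOp[LinearModel, Features, Labeled] =
    new PredictOp[LinearModel, Features, Labeled] {
      def predict(model: LinearModel, x: Features): Labeled =
        Labeled(x.zip(model.weights).map { case (xi, wi) => xi * wi }.sum, x)
    }

  // Special operation: passes the id through unchanged next to the prediction.
  implicit val idPredict: PredictOp[LinearModel, (Features, Int), (Labeled, Int)] =
    new PredictOp[LinearModel, (Features, Int), (Labeled, Int)] {
      def predict(model: LinearModel, input: (Features, Int)): (Labeled, Int) = {
        val (x, id) = input
        (vectorPredict.predict(model, x), id)
      }
    }

  // The operation is chosen from the element type of the input data.
  def predict[T, P](model: LinearModel, data: Seq[T])(
      implicit op: PredictOp[LinearModel, T, P]): Seq[P] =
    data.map(op.predict(model, _))

  def main(args: Array[String]): Unit = {
    val model = LinearModel(IndexedSeq(2.0, 3.0))
    val data: Seq[(Features, Int)] =
      Seq((IndexedSeq(1.0, 1.0), 1), (IndexedSeq(0.0, 2.0), 2))
    predict(model, data).foreach { case (Labeled(y, _), id) =>
      println(s"id=$id y_predicted=$y")
    }
  }
}
```

Because the operation is selected by the input type, callers who do not need
ids keep passing plain feature vectors and pay nothing extra, and the join
on x from the original question is no longer needed.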
