Re: Problem with ML pipeline

Felix Neutatz Sun, 07 Jun 2015 21:47:53 -0700

Probably we also need it for the other classes of the pipeline as well, in
order to be able to pass the ID through the whole pipeline.
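
To make the idea discussed below concrete, here is a minimal sketch of a predict step that takes (id, vector) and returns (id, labeledVector). It uses plain Scala collections as a stand-in for a Flink DataSet. The names LinearModel, Labeled and predictWithId are hypothetical and are not part of the Flink ML API.

```scala
// Plain-Scala sketch: Seq stands in for Flink's DataSet, and Vector here is
// scala.collection.immutable.Vector, not Flink's Vector type.
// All names in this object are hypothetical illustrations.
object IdPredictSketch {
  // Stand-in for LabeledVector: a label plus its feature vector.
  final case class Labeled(label: Double, features: Vector[Double])

  // Stand-in for a trained linear model: y = w . x + b
  final case class LinearModel(weights: Vector[Double], intercept: Double) {
    def predict(x: Vector[Double]): Double =
      weights.zip(x).map { case (w, xi) => w * xi }.sum + intercept
  }

  // Takes (id, features) and returns (id, Labeled(y_predicted, features)).
  // The id is passed through untouched and is never used as a feature.
  def predictWithId[K](
      model: LinearModel,
      input: Seq[(K, Vector[Double])]): Seq[(K, Labeled)] =
    input.map { case (id, x) => (id, Labeled(model.predict(x), x)) }
}
```

Because the id travels next to the vector instead of inside it, every other pipeline stage would need the same kind of pass-through variant.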


Best regards,
Felix
On 06.06.2015 at 9:46 AM, "Till Rohrmann" <[email protected]> wrote:

> Then you only have to provide an implicit PredictOperation[SVM, (T, Int),
> (LabeledVector, Int)] value with T <: Vector in the scope where you call
> the predict operation.
> On Jun 6, 2015 8:14 AM, "Felix Neutatz" <[email protected]> wrote:
>
> > That would be great. I like the special predict operation better because
> > returning the id is only necessary in some cases. The special predict
> > operation would avoid this overhead.
> >
> > Best regards,
> > Felix
> > On 04.06.2015 at 7:56 PM, "Till Rohrmann" <[email protected]> wrote:
> >
> > > I see your problem. One way to solve it is to implement a special
> > > PredictOperation which takes a tuple (id, vector) and returns a tuple
> > > (id, labeledVector). You can take a look at the implementation of the
> > > vector prediction operation.
> > >
> > > But we can also discuss adding an ID field to the Vector type.
> > >
> > > Cheers,
> > > Till
> > > On Jun 4, 2015 7:30 PM, "Felix Neutatz" <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I have the following use case: I want to do regression on a time
> > > > series dataset like:
> > > >
> > > > id, x1, x2, ..., xn, y
> > > >
> > > > id = point in time
> > > > x = features
> > > > y = target value
> > > >
> > > > In the Flink framework I would map this to a
> > > > LabeledVector(y, DenseVector(x)). (I don't want to use the id as a
> > > > feature.)
> > > >
> > > > When I finally apply the predict() method, I get a
> > > > LabeledVector(y_predicted, DenseVector(x)).
> > > >
> > > > Now my problem is that I would like to plot the predicted target
> > > > value against its point in time.
> > > >
> > > > What I have to do now is:
> > > >
> > > > a = predictedDataSet.map(LabeledVector => Tuple2(x, y_p))
> > > > b = originalDataSet.map("id, x1, x2, ..., xn, y" => Tuple2(x, id))
> > > >
> > > > a.join(b).where("x").equalTo("x") { (a, b) => (id, y_p) }
> > > >
> > > > This is really a cumbersome process for such a simple thing. Is
> > > > there any approach that makes this simpler? If not, can we extend
> > > > the ML API to allow ids?
> > > >
> > > > Best regards,
> > > > Felix
> > > >
> > >
> >
>
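
For reference, the join workaround described in the original message can be sketched roughly as follows. This is plain Scala with Seq and Map standing in for Flink DataSets and the join operator. The name attachIds and the tuple layouts are assumptions made for illustration only.

```scala
// Plain-Scala sketch of the "join predictions back to ids" workaround.
// attachIds and the tuple layouts are hypothetical, not Flink API.
object JoinWorkaroundSketch {
  // original rows:  (id, features, y)
  // predicted rows: (y_predicted, features)
  def attachIds(
      original: Seq[(Long, Vector[Double], Double)],
      predicted: Seq[(Double, Vector[Double])]): Seq[(Long, Double)] = {
    // Index ids by feature vector. This assumes feature vectors are unique;
    // if two rows share the same features, the later id silently wins.
    val idByFeatures: Map[Vector[Double], Long] =
      original.map { case (id, x, _) => x -> id }.toMap

    // Inner join on the feature vector, keeping (id, y_predicted).
    predicted.flatMap { case (yp, x) =>
      idByFeatures.get(x).map(id => (id, yp))
    }
  }
}
```

Joining on the feature vector only works while no two points in time share identical features, which is one more reason to carry the id through the prediction instead.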
