You're right, Felix. You need to provide the `FitOperation` and `PredictOperation` for the `Predictor` you want to use, and the `FitOperation` and `TransformOperation` for every `Transformer` you want to chain in front of the `Predictor`.
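As a rough illustration of the idea (not the actual Flink ML code), the sketch below shows how an implicitly resolved operation can let the same `predict` call accept either plain vectors or (vector, id) tuples. Everything in it is a hypothetical stand-in: `PredictOp`, `LinearModel`, and this `LabeledVector` are simplified local types operating on `Seq` instead of `DataSet`, and only the predict side is shown. The fit and transform operations would follow the same pattern.

```scala
object IdPredictSketch {
  // Hypothetical stand-in for a labeled vector; not the Flink ML class.
  final case class LabeledVector(label: Double, features: Vector[Double])

  // Hypothetical type class: how a model M maps inputs of type In to outputs of type Out.
  trait PredictOp[M, In, Out] {
    def predict(model: M, input: Seq[In]): Seq[Out]
  }

  // Toy model: the prediction is the dot product of weights and features.
  final case class LinearModel(weights: Vector[Double]) {
    def score(x: Vector[Double]): Double =
      weights.zip(x).map { case (w, xi) => w * xi }.sum

    // The operation is chosen implicitly from the element type of the input.
    def predict[In, Out](input: Seq[In])(implicit op: PredictOp[LinearModel, In, Out]): Seq[Out] =
      op.predict(this, input)
  }

  object LinearModel {
    // Default operation: vector in, labeled vector out.
    implicit val plain: PredictOp[LinearModel, Vector[Double], LabeledVector] =
      new PredictOp[LinearModel, Vector[Double], LabeledVector] {
        def predict(model: LinearModel, input: Seq[Vector[Double]]): Seq[LabeledVector] =
          input.map(x => LabeledVector(model.score(x), x))
      }

    // Special operation: (vector, id) in, (labeled vector, id) out.
    // The id is passed through untouched and never used as a feature.
    implicit val withId: PredictOp[LinearModel, (Vector[Double], Int), (LabeledVector, Int)] =
      new PredictOp[LinearModel, (Vector[Double], Int), (LabeledVector, Int)] {
        def predict(model: LinearModel, input: Seq[(Vector[Double], Int)]): Seq[(LabeledVector, Int)] =
          input.map { case (x, id) => (LabeledVector(model.score(x), x), id) }
      }
  }

  def main(args: Array[String]): Unit = {
    val model = LinearModel(Vector(1.0, 2.0))
    println(model.predict(Seq(Vector(1.0, 1.0))))       // uses `plain`
    println(model.predict(Seq((Vector(1.0, 1.0), 7))))  // uses `withId`
  }
}
```

Because the operation is looked up by input type, callers who do not need the id pay nothing for it, which matches the motivation given further down the thread.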
Specifying which features to take could be a solution. However, then you're always carrying along data that is not needed. Especially for large-scale data, this might be prohibitively expensive. I guess the more efficient solution would be to assign an ID and later join with the removed feature elements.

Cheers,
Till

On Mon, Jun 8, 2015 at 7:11 AM Sachin Goel <sachingoel0...@gmail.com> wrote:

> A more general approach would be to take as input which indices of the
> vector to consider as features. After that, the vector can be returned as
> such and the user can do what they wish with the non-feature values. This
> wouldn't need extending the predict operation; instead, this can be
> specified in the model itself using a set-parameter function. Or perhaps a
> better approach is to just take this input in the predict operation.
>
> Cheers!
> Sachin
> On Jun 8, 2015 10:17 AM, "Felix Neutatz" <neut...@googlemail.com> wrote:
>
> > Probably we need it for the other classes of the pipeline as well, in
> > order to be able to pass the ID through the whole pipeline.
> >
> > Best regards,
> > Felix
> > On 06.06.2015 at 9:46 AM, "Till Rohrmann" <trohrm...@apache.org> wrote:
> >
> > > Then you only have to provide an implicit PredictOperation[SVM, (T, Int),
> > > (LabeledVector, Int)] value with T <: Vector in the scope where you call
> > > the predict operation.
> > > On Jun 6, 2015 8:14 AM, "Felix Neutatz" <neut...@googlemail.com> wrote:
> > >
> > > > That would be great. I like the special predict operation better
> > > > because it is only necessary to return the id in some cases. The
> > > > special predict operation would save this overhead.
> > > >
> > > > Best regards,
> > > > Felix
> > > > On 04.06.2015 at 7:56 PM, "Till Rohrmann" <till.rohrm...@gmail.com> wrote:
> > > >
> > > > > I see your problem. One way to solve the problem is to implement a
> > > > > special PredictOperation which takes a tuple (id, vector) and
> > > > > returns a tuple (id, labeledVector). You can take a look at the
> > > > > implementation for the vector prediction operation.
> > > > >
> > > > > But we can also discuss adding an ID field to the Vector type.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > > On Jun 4, 2015 7:30 PM, "Felix Neutatz" <neut...@googlemail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I have the following use case: I want to do regression for a
> > > > > > time series dataset like:
> > > > > >
> > > > > > id, x1, x2, ..., xn, y
> > > > > >
> > > > > > id = point in time
> > > > > > x = features
> > > > > > y = target value
> > > > > >
> > > > > > In the Flink framework I would map this to a LabeledVector (y,
> > > > > > DenseVector(x)). (I don't want to use the id as a feature.)
> > > > > >
> > > > > > When I finally apply the predict() method I get a LabeledVector
> > > > > > (y_predicted, DenseVector(x)).
> > > > > >
> > > > > > Now my problem is that I would like to plot the predicted target
> > > > > > value according to its time.
> > > > > >
> > > > > > What I have to do now is:
> > > > > >
> > > > > > a = predictedDataSet.map ( LabeledVector => Tuple2(x,y_p))
> > > > > > b = originalDataSet.map("id, x1, x2, ..., xn, y" => Tuple2(x,id))
> > > > > >
> > > > > > a.join(b).where("x").equalTo("x") { (a,b) => (id, y_p) }
> > > > > >
> > > > > > This is really a cumbersome process for such a simple thing. Is
> > > > > > there any approach which makes this simpler? If not, can we
> > > > > > extend the ML API to allow ids?
> > > > > >
> > > > > > Best regards,
> > > > > > Felix
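
For comparison with the join on `x` in the original message, here is a minimal sketch of the ID-based alternative suggested at the top of the thread: strip the non-feature columns, carry only (id, features) through prediction, and join the removed columns back by id afterwards. It uses plain Scala collections rather than Flink `DataSet`s, and `Record`, `toyPredict`, `predictWithIds`, and `joinBack` are all hypothetical names introduced only for this illustration.

```scala
object IdJoinSketch {
  // Hypothetical input row: an id, the feature vector, and a column that is
  // not a feature and should not travel through the prediction step.
  final case class Record(id: Long, features: Vector[Double], removed: String)

  // Toy predictor standing in for a fitted model (assumption: sum of features).
  def toyPredict(x: Vector[Double]): Double = x.sum

  // Predict on the slim (id, features) view only.
  def predictWithIds(data: Seq[Record]): Seq[(Long, Double)] =
    data.map(r => (r.id, toyPredict(r.features)))

  // Join predictions with the removed column by id (inner join:
  // predictions whose id has no matching record are dropped).
  def joinBack(predictions: Seq[(Long, Double)], data: Seq[Record]): Seq[(Long, Double, String)] = {
    val removedById = data.map(r => r.id -> r.removed).toMap
    predictions.flatMap { case (id, yPred) =>
      removedById.get(id).map(rem => (id, yPred, rem))
    }
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(
      Record(1L, Vector(1.0, 2.0), "first"),
      Record(2L, Vector(0.5, 0.5), "second")
    )
    joinBack(predictWithIds(data), data).foreach(println)
  }
}
```

Joining on a compact id avoids comparing whole feature vectors, which is both cheaper and safer than joining on `x` when two rows happen to share the same features.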