I see your problem. One way to solve it is to implement a special PredictOperation that takes a tuple (id, vector) and returns a tuple (id, labeledVector). You can take a look at the implementation of the vector prediction operation for reference.
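To make the idea concrete, here is a minimal sketch in plain Scala of what such an id-preserving prediction could look like. It does not use the actual Flink ML interfaces: DenseVector and LabeledVector are redefined here as simple stand-ins, and LinearModel and predictWithId are hypothetical names chosen only for illustration. A real PredictOperation would have to follow the signature of the existing vector prediction operation.

```scala
// Hypothetical stand-ins for Flink ML's DenseVector and LabeledVector,
// so that the sketch runs without Flink on the classpath.
final case class DenseVector(data: Array[Double])
final case class LabeledVector(label: Double, vector: DenseVector)

// Hypothetical linear model: prediction = (weights dot x) + intercept.
final case class LinearModel(weights: Array[Double], intercept: Double) {
  def predict(x: DenseVector): Double =
    weights.zip(x.data).map { case (w, v) => w * v }.sum + intercept
}

object IdPredictSketch {
  // Takes (id, vector) pairs and returns (id, labeledVector) pairs.
  // The id is only passed through; it is never used as a feature.
  def predictWithId[ID](
      model: LinearModel,
      input: Seq[(ID, DenseVector)]): Seq[(ID, LabeledVector)] =
    input.map { case (id, x) => (id, LabeledVector(model.predict(x), x)) }

  def main(args: Array[String]): Unit = {
    val model = LinearModel(Array(2.0, -1.0), 0.5)
    val data = Seq(
      (1L, DenseVector(Array(1.0, 2.0))),
      (2L, DenseVector(Array(3.0, 1.0)))
    )
    // Each prediction stays attached to its id, so no join is needed afterwards.
    predictWithId(model, data).foreach { case (id, lv) =>
      println(s"$id -> ${lv.label}")
    }
  }
}
```

Because the id travels with each vector through the prediction, the predicted value can be plotted against its point in time directly, without joining back on the feature vector.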
But we can also discuss adding an ID field to the Vector type.

Cheers,
Till

On Jun 4, 2015 7:30 PM, "Felix Neutatz" <neut...@googlemail.com> wrote:

> Hi,
>
> I have the following use case: I want to do regression for a timeseries
> dataset like:
>
> id, x1, x2, ..., xn, y
>
> id = point in time
> x = features
> y = target value
>
> In the Flink framework I would map this to a LabeledVector (y,
> DenseVector(x)). (I don't want to use the id as a feature)
>
> When I finally apply the predict() method I get a LabeledVector
> (y_predicted, DenseVector(x)).
>
> Now my problem is that I would like to plot the predicted target value
> according to its time.
>
> What I have to do now is:
>
> a = predictedDataSet.map ( LabeledVector => Tuple2(x,y_p))
> b = originalDataSet.map("id, x1, x2, ..., xn, y" => Tuple2(x,id))
>
> a.join(b).where("x").equalTo("x") { (a,b) => (id, y_p) }
>
> This is really a cumbersome process for such a simple thing. Is there any
> approach which makes this simpler? If not, can we extend the ML API to
> allow ids?
>
> Best regards,
> Felix
>