Hi,
I have the following use case: I want to do regression on a time-series
dataset like:
id, x1, x2, ..., xn, y
id = point in time
x = features
y = target value
In the Flink framework I would map this to a LabeledVector(y,
DenseVector(x)). (I don't want to use the id as a feature.)
When I finally apply the predict() method I get a LabeledVector
(y_predicted, DenseVector(x)).
Now my problem is that I would like to plot the predicted target values
against their time (the id).
What I have to do now is:
a = predictedDataSet.map(LabeledVector => Tuple2(x, y_p))
b = originalDataSet.map("id, x1, x2, ..., xn, y" => Tuple2(x, id))
a.join(b).where("x").equalTo("x") { (a, b) => (id, y_p) }
This is a really cumbersome process for such a simple thing. Is there any
approach that makes this simpler? If not, can we extend the ML API to
allow ids?
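To make the request more concrete, here is a rough sketch of the behaviour I have in mind: the id travels alongside the features, so the prediction comes back already attached to its point in time and no join is needed. This uses plain Scala collections rather than Flink DataSets, and Row, predict, predictWithId and the weights are all hypothetical names made up for illustration; none of them is part of the Flink ML API.

```scala
// Sketch with plain Scala collections, not Flink DataSets.
// Row, predict, predictWithId and weights are hypothetical, for illustration only.
final case class Row(id: Long, x: Vector[Double], y: Double)

object IdPreservingPrediction {
  // Stand-in for a trained model: a fixed linear function of the features.
  val weights: Vector[Double] = Vector(0.5, 2.0)

  // Predicts a target value from the features only; the id is never used as a feature.
  def predict(x: Vector[Double]): Double =
    x.zip(weights).map { case (xi, wi) => xi * wi }.sum

  // Carries the id alongside the features so each prediction stays attached to its time.
  def predictWithId(rows: Seq[Row]): Seq[(Long, Double)] =
    rows.map(r => (r.id, predict(r.x)))

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Row(1L, Vector(1.0, 2.0), 4.4),
      Row(2L, Vector(3.0, 4.0), 9.6)
    )
    // Prints one "id,y_predicted" line per row, ready for plotting over time.
    predictWithId(rows).foreach { case (id, yp) => println(s"$id,$yp") }
  }
}
```

If predict() in the ML API accepted something like (id, features) pairs in the same way, the result could be plotted directly.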
Best regards,
Felix