Hi,

I have the following use case: I want to do regression on a time series
dataset like:

id, x1, x2, ..., xn, y

id = point in time
x = features
y = target value

In the Flink framework I would map this to a LabeledVector(y,
DenseVector(x)). (I don't want to use the id as a feature.)

When I finally apply the predict() method, I get a LabeledVector
(y_predicted, DenseVector(x)).

Now my problem is that I would like to plot the predicted target value
against its point in time.

What I have to do now is:

a = predictedDataSet.map ( LabeledVector => Tuple2(x,y_p))
b = originalDataSet.map("id, x1, x2, ..., xn, y" => Tuple2(x,id))

a.join(b).where("x").equalTo("x") { (a,b) => (id, y_p) }

This is a really cumbersome process for such a simple thing. Is there any
approach that makes this simpler? If not, can we extend the ML API to
allow ids?
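
To make the workaround above concrete, here is a minimal sketch that uses
plain Scala collections in place of Flink DataSets. Row, predictOne and
joinOnFeatures are hypothetical names made up for illustration and are not
part of the Flink ML API; predictOne only stands in for a trained model.

```scala
// Plain Scala collections stand in for Flink DataSets in this sketch.
// Row, predictOne and joinOnFeatures are hypothetical, not Flink ML API.
object JoinPredictionsById {
  // One record of the original dataset: id, features x, target y.
  final case class Row(id: Long, x: Vector[Double], y: Double)

  // Assumed stand-in for a trained model: a fixed linear function.
  def predictOne(x: Vector[Double]): Double = x.sum * 0.5

  // Join the predictions back to their ids by matching on the features.
  def joinOnFeatures(rows: Seq[Row]): Seq[(Long, Double)] = {
    val a = rows.map(r => (r.x, predictOne(r.x))) // (x, y_p)
    val b = rows.map(r => (r.x, r.id))            // (x, id)
    a.flatMap { case (xa, yp) =>
      b.collect { case (xb, id) if xa == xb => (id, yp) }
    }
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Row(1L, Vector(1.0, 2.0), 3.0),
      Row(2L, Vector(2.0, 4.0), 6.0)
    )
    // Print (id, y_p) pairs ordered by point in time.
    joinOnFeatures(rows).sortBy(_._1).foreach(println)
  }
}
```

Note that joining on the features yields exactly one (id, y_p) pair per
point in time only if every feature vector is unique; identical feature
vectors at different times would match each other.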

Best regards,
Felix
