Hello,

I have a stream of objects which I use to update the model of a
classification algorithm and another stream with the objects I need to
classify in real time.

The problem is that the instances for training and evaluation are processed
on potentially different Flink nodes, but the classifier should be applied
to all instances no matter in what node it was generated (ie, the
classifier should be accessible from any Flink node).

Just to make it clearer, here is what would NOT work since these sink
functions are not serializable:
https://gist.github.com/b979bf742b0d2f3da8cc8e5e91207151

Two questions here:

*1. How can an instance be accessed by any Flink node like this (line 11
and 19)? Maybe there's a better approach to this problem.*

*2. In the example the second stream (line 15) is started right away but at
startup the classifier is not ready to use until it has been trained with
enough instances. Is it possible to do this? If I'm not wrong env.execute
(line 24) can be used only once.*

Regards,
Matt

Reply via email to