Errata: How can an *object (such as the classifier, line 1)* be accessed by any Flink node [...]
Just in case: I believe the classifier itself can't be serialized; it's part of a framework which I can't modify. In any case, even if it could be serialized, I guess the cost of moving it from one node to another would make the whole data flow impractical. It would be better to move all created instances to a single node where only one instance of the classifier is maintained. I'm not sure whether this is possible or how to do it.

On Thu, Jan 12, 2017 at 11:11 PM, Matt <dromitl...@gmail.com> wrote:

> Hello,
>
> I have a stream of objects which I use to update the model of a
> classification algorithm, and another stream with the objects I need to
> classify in real time.
>
> The problem is that the instances for training and evaluation are
> processed on potentially different Flink nodes, but the classifier should
> be applied to all instances no matter on which node they were generated
> (i.e., the classifier should be accessible from any Flink node).
>
> Just to make it clearer, here is what would NOT work, since these sink
> functions are not serializable:
> https://gist.github.com/b979bf742b0d2f3da8cc8e5e91207151
>
> Two questions here:
>
> *1. How can an instance be accessed by any Flink node like this (line 11
> and 19)? Maybe there's a better approach to this problem.*
>
> *2. In the example the second stream (line 15) is started right away, but
> at startup the classifier is not ready to use until it has been trained
> with enough instances. Is it possible to do this? If I'm not wrong,
> env.execute (line 24) can be used only once.*
>
> Regards,
> Matt
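
To make the "single node, single classifier" idea concrete, here is a minimal sketch in plain Java, not actual Flink code. It models one two-input operator that owns the only classifier instance, creates it locally instead of shipping it, and buffers evaluation instances until enough training instances have arrived. In Flink I would expect the same logic to live in a RichCoFlatMapFunction applied to the two streams joined with connect(), run with parallelism 1 and with the classifier built in open(), but I have not tried that. Everything named here (ThresholdClassifier, SingleClassifierOperator, onTraining, onEvaluation, the minimum of 2 training instances) is hypothetical and only stands in for the real framework classifier.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for the framework classifier that cannot be serialized. */
class ThresholdClassifier {
    private double posSum, negSum;
    private int posCount, negCount;

    /** Updates the running per-class means with one labeled value. */
    void train(double value, boolean label) {
        if (label) { posSum += value; posCount++; } else { negSum += value; negCount++; }
    }

    int trainedInstances() { return posCount + negCount; }

    /** Classifies by comparing against the midpoint of the two class means. */
    boolean classify(double value) {
        double posMean = posCount == 0 ? 0.0 : posSum / posCount;
        double negMean = negCount == 0 ? 0.0 : negSum / negCount;
        return value > (posMean + negMean) / 2.0;
    }
}

/** Models a two-input operator that runs as a single instance (assumption: parallelism 1). */
class SingleClassifierOperator {
    private final int minTrainingInstances;
    // Created in open() on the worker, so it never has to be serialized.
    private transient ThresholdClassifier classifier;
    // Evaluation instances that arrived before the classifier was ready.
    private final Queue<Double> pending = new ArrayDeque<>();

    SingleClassifierOperator(int minTrainingInstances) {
        this.minTrainingInstances = minTrainingInstances;
    }

    /** Plays the role of the operator's initialization hook. */
    void open() { classifier = new ThresholdClassifier(); }

    /** Input 1: training instances update the model, then flush anything buffered. */
    void onTraining(double value, boolean label, List<String> out) {
        classifier.train(value, label);
        while (ready() && !pending.isEmpty()) {
            emit(pending.poll(), out);
        }
    }

    /** Input 2: evaluation instances are classified, or buffered until the model is ready. */
    void onEvaluation(double value, List<String> out) {
        if (ready()) { emit(value, out); } else { pending.add(value); }
    }

    private boolean ready() { return classifier.trainedInstances() >= minTrainingInstances; }

    private void emit(double value, List<String> out) {
        out.add(value + " -> " + classifier.classify(value));
    }
}

class SingleClassifierDemo {
    public static void main(String[] args) {
        SingleClassifierOperator op = new SingleClassifierOperator(2);
        op.open();
        List<String> out = new ArrayList<>();
        op.onEvaluation(7.0, out);        // buffered: classifier not trained yet
        op.onTraining(1.0, false, out);
        op.onTraining(9.0, true, out);    // model ready, buffered 7.0 is emitted
        op.onEvaluation(2.0, out);        // classified immediately
        System.out.println(out);          // prints [7.0 -> true, 2.0 -> false]
    }
}
```

Both inputs go through the same operator instance, so there is only one job and one env.execute call, and the second stream can start right away because its elements simply wait in the buffer. In a real job the buffer would presumably need to be bounded or kept in Flink-managed state; the sketch ignores that.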