Dear Cederic, I did something similar as yours a while ago along this work [1] but I've always been working within the batch context. I'm also the co-author of flink-jpmml and, since a flink2pmml model saver library doesn't exist currently, I'd suggest you a twofold strategy to tackle this problem: - if your model is relatively simple, take the batch evaluate method (it belongs to your SVM classifier) and attempt to translate it in a flatMap function (hopefully you can reuse some internal utilities, Flink exploits breeze vector library under the hoods [3]). - if your model is a complex one, you should export the model into PMML and employ then [2]. For a first overview, this [4] is the library you should adopt as to export your model and this [5] can help you with the related implementation.
Hope it can help and good luck! Andrea [1] https://dl.acm.org/citation.cfm?id=3070612 [2] https://github.com/FlinkML/flink-jpmml [3] https://github.com/apache/flink/blob/7034e9cfcb051ef90c5bf0960bfb50a79b3723f0/flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala#L73 [4] https://github.com/jpmml/jpmml-model [5] https://github.com/jpmml/jpmml-sparkml 2018-07-24 13:29 GMT+02:00 David Anderson <da...@data-artisans.com>: > One option (which I haven't tried myself) would be to somehow get the > model into PMML format, and then use https://github.com/ > FlinkML/flink-jpmml to score the model. You could either use another > machine learning framework to train the model (i.e., a framework that > directly supports PMML export), or convert the Flink model into PMML. Since > SVMs are fairly simple to describe, that might not be terribly difficult. > > On Mon, Jul 23, 2018 at 4:18 AM Xingcan Cui <xingc...@gmail.com> wrote: > >> Hi Cederic, >> >> If the model is a simple function, you can just load it and make >> predictions using the map/flatMap function in the StreamEnvironment. >> >> But I’m afraid the model trained by Flink-ML should be a “batch job", >> whose predict method takes a Dataset as the parameter and outputs another >> Dataset as the result. That means you cannot easily apply the model on >> streams, at least for now. >> >> There are two options to solve this. (1) Train the dataset using another >> framework to produce a simple function. (2) Adjust your model serving as a >> series of batch jobs. >> >> Hope that helps, >> Xingcan >> >> On Jul 22, 2018, at 8:56 PM, Hequn Cheng <chenghe...@gmail.com> wrote: >> >> Hi Cederic, >> >> I am not familiar with SVM or machine learning but I think we can work it >> out together. >> What problem have you met when you try to implement this function? From >> my point of view, we can rebuild the model in the flatMap function and use >> it to predict the input data. There are some flatMap documents here[1]. >> >> Best, Hequn >> >> [1] https://ci.apache.org/projects/flink/flink-docs- >> master/dev/stream/operators/#datastream-transformations >> >> >> >> >> >> On Sun, Jul 22, 2018 at 4:12 PM, Cederic Bosmans <bosman...@gmail.com> >> wrote: >> >>> Dear >>> >>> My name is Cederic Bosmans and I am a masters student at the Ghent >>> University (Belgium). >>> I am currently working on my masters dissertation which involves Apache >>> Flink. >>> >>> I want to make predictions in the streaming environment based on a model >>> trained in the batch environment. >>> >>> I trained my SVM-model this way: >>> val svm2 = SVM() >>> svm2.setSeed(1) >>> svm2.fit(trainLV) >>> val testVD = testLV.map(lv => (lv.vector, lv.label)) >>> val evalSet = svm2.evaluate(testVD) >>> >>> and saved the model: >>> val modelSvm = svm2.weightsOption.get >>> >>> Then I have an incoming datastream in the streaming environment: >>> dataStream[(Int, Int, Int)] >>> which should be bininary classified using this trained SVM model. >>> >>> Since the predict function does only support DataSet and not DataStream, >>> on stackoverflow a flink contributor mentioned that this should be done >>> using a map/flatMap function. >>> Unfortunately I am not able to work this function out. >>> >>> It would be incredible for me if you could help me a little bit further! >>> >>> Kind regards and thanks in advance >>> Cederic Bosmans >>> >> >> >> > > -- > *David Anderson* | Training Coordinator | data Artisans > -- > Join Flink Forward - The Apache Flink Conference > Stream Processing | Event Driven | Real Time > -- *Andrea Spina* Software Engineer @ Radicalbit Srl Via Borsieri 41, 20159, Milano - IT