Dear Cederic,
I did something similar to what you describe a while ago in this work [1],
but I have always worked in the batch context. I am also a co-author of
flink-jpmml and, since no library currently exists for exporting Flink
models to PMML, I would suggest a twofold strategy to tackle this problem:
- if your model is relatively simple, take the batch evaluate method (it
belongs to your SVM classifier) and try to translate it into a flatMap
function (hopefully you can reuse some internal utilities; Flink uses the
Breeze vector library under the hood [3]).
- if your model is complex, export it to PMML and then use flink-jpmml [2].
As a starting point, [4] is the library to use for exporting your model,
and [5] can help you with the related implementation.
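
The first option above can be sketched roughly as follows. This is a minimal
sketch, not Flink-ML's own code: it assumes a linear SVM whose prediction is
the sign of the dot product between the weight vector and the features,
compared against a threshold (assumed to be 0.0 here), and it assumes the
weights have already been materialized from the batch job as a plain array
(for example by calling collect() on the DataSet held in weightsOption). The
names SvmScoring, dot, predict and toFeatures are hypothetical.

```scala
// Hypothetical helper object; none of these names come from Flink-ML.
object SvmScoring {

  /** Dot product of two vectors of equal length. */
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "weights and features must have the same size")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      sum += a(i) * b(i)
      i += 1
    }
    sum
  }

  /** Assumed decision rule: +1.0 if w . x exceeds the threshold, else -1.0. */
  def predict(weights: Array[Double],
              features: Array[Double],
              threshold: Double = 0.0): Double =
    if (dot(weights, features) > threshold) 1.0 else -1.0

  /** Turns one (Int, Int, Int) stream element into a feature array. */
  def toFeatures(t: (Int, Int, Int)): Array[Double] =
    Array(t._1.toDouble, t._2.toDouble, t._3.toDouble)
}
```

Inside the streaming job, predict could then be called from a map function,
for example dataStream.map(t => (t, SvmScoring.predict(weights,
SvmScoring.toFeatures(t)))), with the weights array captured in the closure.
The feature order must match the one used during training.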

Hope this helps, and good luck!

Andrea

[1] https://dl.acm.org/citation.cfm?id=3070612
[2] https://github.com/FlinkML/flink-jpmml
[3] https://github.com/apache/flink/blob/7034e9cfcb051ef90c5bf0960bfb50a79b3723f0/flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala#L73
[4] https://github.com/jpmml/jpmml-model
[5] https://github.com/jpmml/jpmml-sparkml

2018-07-24 13:29 GMT+02:00 David Anderson <da...@data-artisans.com>:

> One option (which I haven't tried myself) would be to somehow get the
> model into PMML format, and then use
> https://github.com/FlinkML/flink-jpmml to score the model. You could either use another
> machine learning framework to train the model (i.e., a framework that
> directly supports PMML export), or convert the Flink model into PMML. Since
> SVMs are fairly simple to describe, that might not be terribly difficult.
>
> On Mon, Jul 23, 2018 at 4:18 AM Xingcan Cui <xingc...@gmail.com> wrote:
>
>> Hi Cederic,
>>
>> If the model is a simple function, you can just load it and make
>> predictions using a map/flatMap function on the DataStream.
>>
>> But I’m afraid the model trained by Flink-ML is a “batch job”, whose
>> predict method takes a DataSet as its parameter and outputs another
>> DataSet as the result. That means you cannot easily apply the model to
>> streams, at least for now.
>>
>> There are two options to solve this. (1) Train the model using another
>> framework to produce a simple function. (2) Run your model serving as a
>> series of batch jobs.
>>
>> Hope that helps,
>> Xingcan
>>
>> On Jul 22, 2018, at 8:56 PM, Hequn Cheng <chenghe...@gmail.com> wrote:
>>
>> Hi Cederic,
>>
>> I am not familiar with SVM or machine learning but I think we can work it
>> out together.
>> What problem did you run into when trying to implement this function? From
>> my point of view, we can rebuild the model in the flatMap function and use
>> it to predict the input data. There is some flatMap documentation here [1].
>>
>> Best, Hequn
>>
>> [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/#datastream-transformations
>>
>>
>>
>>
>>
>> On Sun, Jul 22, 2018 at 4:12 PM, Cederic Bosmans <bosman...@gmail.com>
>> wrote:
>>
>>> Dear
>>>
>>> My name is Cederic Bosmans and I am a master's student at Ghent
>>> University (Belgium).
>>> I am currently working on my master's dissertation, which involves Apache
>>> Flink.
>>>
>>> I want to make predictions in the streaming environment based on a model
>>> trained in the batch environment.
>>>
>>> I trained my SVM-model this way:
>>> val svm2 = SVM()
>>> svm2.setSeed(1)
>>> svm2.fit(trainLV)
>>> val testVD = testLV.map(lv => (lv.vector, lv.label))
>>> val evalSet = svm2.evaluate(testVD)
>>>
>>> and saved the model:
>>> val modelSvm = svm2.weightsOption.get
>>>
>>> Then I have an incoming data stream in the streaming environment:
>>> DataStream[(Int, Int, Int)]
>>> which should be classified (binary classification) using this trained SVM model.
>>>
>>> Since the predict function only supports DataSet and not DataStream,
>>> a Flink contributor on Stack Overflow mentioned that this should be done
>>> using a map/flatMap function.
>>> Unfortunately, I have not been able to work this function out.
>>>
>>> I would be very grateful if you could help me a little further!
>>>
>>> Kind regards and thanks in advance
>>> Cederic Bosmans
>>>
>>
>>
>>
>
> --
> *David Anderson* | Training Coordinator | data Artisans
>



-- 
*Andrea Spina*
Software Engineer @ Radicalbit Srl
Via Borsieri 41, 20159, Milano - IT
