your message is very short,i can not read more.the follow is my guss,
in flink,the dataStream is not for iterative computation,the dataSet would
be more well.and fink suggest broadcast mini data,not large.
your can load your model data (it can be from file,or table),before main
function,andassignment to variable ,like name=yourModel.
and the dataStream(it is a stream,unscored record,like DataStream[String] or
DataStream[yourClass]),
and dataStream.map{x=>
val score = computeScore(x,yourModel)
}
object YourObject {
load your model
val yourModel = ;
def main(){
...............
read unscoreed record,from socket or kafka,or ....
dataStream.map{x=>
val score = computeScore(x,yourModel)
}
......
}
}
----- 原始邮件 -----
发件人:Anchit Jatana <[email protected]>
收件人:[email protected]
主题:How to Broadcast a very large model object (used in iterative scoring in
recommendation system) in Flink
日期:2016年09月30日 14点15分
Hi All,
I'm building a recommendation system streaming application for which I need
to broadcast a very large model object (used in iterative scoring) among all
the task managers performing the operation parallely for the operator
I'm doing an this operation in map1 of CoMapFunction. Please suggest me
some way to achieve the broadcasting of the large model variable (something
similar to what Spark has with broadcast variables).
Thank you
Regards,Anchit