[ https://issues.apache.org/jira/browse/FLINK-24279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440162#comment-17440162 ]
Yun Gao commented on FLINK-24279:
---------------------------------

Fixed on master via 8269bd9fdf3e5744b2d635697db5c705b9e598f5

> Support withBroadcast with DataStream API in Flink ML Library
> --------------------------------------------------------------
>
>                 Key: FLINK-24279
>                 URL: https://issues.apache.org/jira/browse/FLINK-24279
>             Project: Flink
>          Issue Type: New Feature
>          Components: Library / Machine Learning
>            Reporter: Zhipeng Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> When doing machine learning with DataStream, we found that DataStream lacks a
> withBroadcast() function, which would be useful for machine learning.
>
> A DataSet-based demo looks like this:
> {code:java}
> DataSet<?> d1 = ...;
> DataSet<?> d2 = ...;
> d1.map(new RichMapFunction<?, ?>() {
>     @Override
>     public Object map(Object aLong) throws Exception {
>         // Read the full contents of d2, registered below under the name "d2".
>         List<?> elements = getRuntimeContext().getBroadcastVariable("d2");
>         ...;
>     }
> }).withBroadcastSet(d2, "d2");
> {code}
>
> The withBroadcast() function requires priority-based data consumption. For
> example, in the above code snippet, we cannot consume any element from d1
> before we have consumed all elements of d2.
>
> Thus, when supporting withBroadcast() in DataStream, we also need
> priority-based data consumption. This could lead to deadlock, and
> DataStream does not provide a solution for deadlock.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
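
The priority-based consumption described in the issue can be illustrated with a small, Flink-free sketch. It is not the implementation from the fix commit. The class `BroadcastFirstSketch` and its methods are hypothetical names chosen for illustration. The sketch assumes that elements of d1 arriving early are buffered rather than blocked, which is one possible way to avoid stalling the d1 input, at the cost of memory.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of "consume all of d2 before any of d1"; not Flink code. */
public class BroadcastFirstSketch {
    private final List<Integer> broadcast = new ArrayList<>(); // elements of d2 seen so far
    private final List<Integer> pending = new ArrayList<>();   // d1 elements that arrived early
    private final List<Integer> output = new ArrayList<>();
    private boolean broadcastComplete = false;

    /** Called for each element of the broadcast input (d2). */
    public void onBroadcastElement(int value) {
        if (broadcastComplete) {
            throw new IllegalStateException("broadcast input already ended");
        }
        broadcast.add(value);
    }

    /** Called once d2 is exhausted; flushes the buffered d1 elements in arrival order. */
    public void onBroadcastEnd() {
        broadcastComplete = true;
        for (int value : pending) {
            output.add(process(value));
        }
        pending.clear();
    }

    /** Called for each element of the main input (d1). */
    public void onMainElement(int value) {
        if (broadcastComplete) {
            output.add(process(value));
        } else {
            pending.add(value); // d2 is not complete yet, so defer processing
        }
    }

    /** Stand-in for the user's map(): adds the sum of all d2 elements to the value. */
    private int process(int value) {
        int sum = 0;
        for (int b : broadcast) {
            sum += b;
        }
        return value + sum;
    }

    public List<Integer> getOutput() {
        return output;
    }

    public static void main(String[] args) {
        BroadcastFirstSketch op = new BroadcastFirstSketch();
        op.onMainElement(1);       // buffered: d2 not complete
        op.onBroadcastElement(10);
        op.onMainElement(2);       // buffered: d2 not complete
        op.onBroadcastElement(20);
        op.onBroadcastEnd();       // flushes 1 and 2
        op.onMainElement(3);       // processed immediately
        System.out.println(op.getOutput()); // prints [31, 32, 33]
    }
}
```

The buffer here is unbounded, so a real operator would need a spill or backpressure strategy. That trade-off is exactly the deadlock concern the issue raises.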