[ 
https://issues.apache.org/jira/browse/FLINK-24279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17440162#comment-17440162
 ] 

Yun Gao commented on FLINK-24279:
---------------------------------

Fixed on master via 8269bd9fdf3e5744b2d635697db5c705b9e598f5

>  Support withBroadcast with DataStream API in Flink ML Library
> --------------------------------------------------------------
>
>                 Key: FLINK-24279
>                 URL: https://issues.apache.org/jira/browse/FLINK-24279
>             Project: Flink
>          Issue Type: New Feature
>          Components: Library / Machine Learning
>            Reporter: Zhipeng Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> While implementing machine learning algorithms on DataStream, we found that
> DataStream lacks a withBroadcast() function, which would be useful for
> machine learning.
>  
> A DataSet-based example looks like this:
> {code:java}
> DataSet<?> d1 = ...;
> DataSet<?> d2 = ...;
> d1.map(new RichMapFunction<?, ?>() {
>        @Override
>        public Object map(Object value) throws Exception {
>             // The broadcast set "d2" is fully available before map() is first called.
>             List<?> elements = getRuntimeContext().getBroadcastVariable("d2");
>             ...;
>        }
> }).withBroadcastSet(d2, "d2");
> {code}
>  
> withBroadcast() requires priority-based data consumption. For example, in the
> code snippet above, we cannot consume any element from d1 until we have
> consumed all elements of d2.
>   
> Thus, to support withBroadcast() in DataStream, we also need priority-based
> data consumption. This could lead to deadlock, and DataStream currently does
> not provide a way to resolve it.
>   
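The priority-based consumption described above can be illustrated with a minimal, framework-free sketch. It is not the Flink ML implementation and uses no Flink API: the class `BroadcastGate` and all of its method names are hypothetical, chosen only for illustration. The sketch buffers records from the main input (d1) until the broadcast input (d2) is reported as finished, then replays them. Buffering rather than blocking is one possible way to avoid stalling the main input, at the cost of holding pending records in memory.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.function.BiFunction;

/**
 * Hypothetical helper (not a Flink API): holds back main-input records
 * until the broadcast input has been received in full.
 */
final class BroadcastGate<T, B, R> {
    private final BiFunction<T, List<B>, R> fn;
    private final List<B> broadcast = new ArrayList<>();
    private final Queue<T> pending = new ArrayDeque<>(); // rejects null records
    private final List<R> output = new ArrayList<>();
    private boolean broadcastComplete = false;

    BroadcastGate(BiFunction<T, List<B>, R> fn) {
        this.fn = fn;
    }

    /** Called for every element of the broadcast input (d2). */
    void onBroadcastElement(B element) {
        if (broadcastComplete) {
            throw new IllegalStateException("broadcast input already finished");
        }
        broadcast.add(element);
    }

    /** Called once the broadcast input ends; replays the buffered records in arrival order. */
    void onBroadcastFinished() {
        broadcastComplete = true;
        T next;
        while ((next = pending.poll()) != null) {
            output.add(fn.apply(next, Collections.unmodifiableList(broadcast)));
        }
    }

    /** Called for every element of the main input (d1). */
    void onMainElement(T element) {
        if (broadcastComplete) {
            output.add(fn.apply(element, Collections.unmodifiableList(broadcast)));
        } else {
            pending.add(element); // d2 is incomplete, so the record must wait
        }
    }

    /** Results produced so far, as a read-only view. */
    List<R> output() {
        return Collections.unmodifiableList(output);
    }

    /** Number of main-input records still waiting for the broadcast input. */
    int pendingCount() {
        return pending.size();
    }
}
```

In this sketch, main-input records that arrive early wait in `pending` and produce no output until `onBroadcastFinished()` is called. Records that arrive afterwards are processed immediately. The buffer is unbounded here, so a real operator would need to bound it or spill it.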



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
