Re: [QUESTION] the differences between DataStream.join() and DataStream.coGroup()

Jark Wu Thu, 19 May 2016 02:34:00 -0700

Thanks for your explanation! I get it.
------------------------------------------------------------------
From: Fabian Hueske <fhue...@gmail.com>
Send Time: Thursday, 19 May 2016, 15:21
To: dev@flink.apache.org <dev@flink.apache.org>; 伍翀(云邪) <wuchong...@alibaba-inc.com>
Subject: Re: [QUESTION] the differences between DataStream.join() and DataStream.coGroup()

Hi,


you are right, at the moment join() looks like syntactic sugar around 
coGroup(). Internally, it wraps a FlatJoinFunction in a CoGroupFunction 
and calls DataStream.coGroup().
This can be done because CoGroup is more generic and can be used to execute a 
join. However, because join is more specialized, there may also be more 
efficient strategies to execute it.
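
The wrapping described above can be sketched in plain Java without a Flink dependency. The interfaces below are simplified stand-ins that only mirror the shape of Flink's Collector, FlatJoinFunction and CoGroupFunction; the class name JoinViaCoGroup and the wrap() helper are hypothetical and are not Flink's actual code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JoinViaCoGroup {
    // Simplified stand-ins (assumptions), not the real Flink interfaces.
    public interface Collector<O> {
        void collect(O record);
    }

    public interface FlatJoinFunction<L, R, O> {
        void join(L left, R right, Collector<O> out);
    }

    public interface CoGroupFunction<L, R, O> {
        void coGroup(Iterable<L> left, Iterable<R> right, Collector<O> out);
    }

    // Wraps a join function in a coGroup function: the coGroup receives both
    // groups for one key and calls the join function once per (left, right) pair.
    public static <L, R, O> CoGroupFunction<L, R, O> wrap(FlatJoinFunction<L, R, O> joiner) {
        return (left, right, out) -> {
            for (L l : left) {
                for (R r : right) {
                    joiner.join(l, r, out);
                }
            }
        };
    }

    public static void main(String[] args) {
        FlatJoinFunction<String, Integer, String> joiner =
                (l, r, out) -> out.collect(l + ":" + r);

        List<String> result = new ArrayList<>();
        // Both lists stand for the elements that share one key.
        wrap(joiner).coGroup(Arrays.asList("a", "b"), Arrays.asList(1, 2), result::add);

        System.out.println(result); // prints [a:1, a:2, b:1, b:2]
    }
}
```

The nested loop is the generic strategy. A dedicated join operator would be free to replace it with something more specialized without changing the user-facing join function.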

Providing an API for join has several benefits:
- The implementation can be improved without affecting the user.
- The DataStream API is more similar to the DataSet API, which might help users 
that touch both APIs.
- Join and CoGroup are similar, but still different operations. CoGroup looks at 
the full group of elements with the same key. Join looks only at pairs of 
elements with identical keys. Thanks to SQL, the concept of a join is probably 
better known than coGroup.
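
To illustrate the last point, here is a small sketch in plain Java collections (not the Flink API) of what each operation gets to see. The class CoGroupVsJoin, its helper methods and the "key=value" record format are hypothetical, and the input is a finite list rather than a windowed stream. In this sketch a key that occurs on only one side still yields a group on the coGroup side, paired with an empty group:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class CoGroupVsJoin {
    // Groups "key=value" records by key; TreeMap keeps the keys sorted.
    static Map<String, List<String>> byKey(List<String> records) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String rec : records) {
            String key = rec.substring(0, rec.indexOf('='));
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(rec);
        }
        return groups;
    }

    // coGroup view: one entry per key, holding both complete groups.
    static List<String> coGroupView(List<String> left, List<String> right) {
        Map<String, List<String>> l = byKey(left);
        Map<String, List<String>> r = byKey(right);
        Set<String> keys = new TreeSet<>(l.keySet());
        keys.addAll(r.keySet());
        List<String> out = new ArrayList<>();
        for (String k : keys) {
            List<String> lg = l.getOrDefault(k, Collections.<String>emptyList());
            List<String> rg = r.getOrDefault(k, Collections.<String>emptyList());
            out.add(k + " -> " + lg + " / " + rg);
        }
        return out;
    }

    // join view: one entry per pair of elements with identical keys.
    static List<String> joinView(List<String> left, List<String> right) {
        Map<String, List<String>> r = byKey(right);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : byKey(left).entrySet()) {
            for (String lv : e.getValue()) {
                for (String rv : r.getOrDefault(e.getKey(), Collections.<String>emptyList())) {
                    out.add(lv + "|" + rv);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("a=1", "a=2", "b=3");
        List<String> right = Arrays.asList("a=x", "c=y");
        System.out.println(coGroupView(left, right));
        // prints [a -> [a=1, a=2] / [a=x], b -> [b=3] / [], c -> [] / [c=y]]
        System.out.println(joinView(left, right));
        // prints [a=1|a=x, a=2|a=x]
    }
}
```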

Best, Fabian

2016-05-19 9:05 GMT+02:00 Jark Wu <wuchong...@alibaba-inc.com>:
I have read the source code and found that the implementation of JoinedStreams 
is almost the same as that of CoGroupedStreams (internally, JoinedStreams is 
built on top of CoGroupedStreams). So why do we provide two different 
interfaces, `DataStream.join()` and `DataStream.coGroup()`, which are exactly 
the same? The documentation [1] does not indicate that they do the same thing. 
Or is there a difference between `DataStream.join()` and `DataStream.coGroup()` 
that I missed?

-- Jark Wu
