Thanks for the explanation! I get it now.

------------------------------------------------------------------
From: Fabian Hueske <fhue...@gmail.com>
Send Time: May 19, 2016 (Thursday) 15:21
To: dev@flink.apache.org <dev@flink.apache.org>; 伍翀(云邪) <wuchong...@alibaba-inc.com>
Subject: Re: [QUESTION] the differences between DataStream.join() and DataStream.coGroup()

Hi,
you are right, at the moment join() looks like syntactic sugar around coGroup(). Internally, it wraps a FlatJoinFunction in a CoGroupFunction and calls DataStream.coGroup(). This can be done because CoGroup is more generic and can be used to execute a join. However, because join is more specialized, there can also be more efficient strategies to execute it.

Providing an API for join has several benefits:
- The implementation can be improved without affecting the user.
- The DataStream API is more similar to the DataSet API, which might help users who touch both APIs.
- Join and CoGroup are similar but distinct operations. CoGroup looks at the full group of elements with the same key. Join only looks at pairs of elements with identical keys. Due to SQL, the concept of a join is probably better known than coGroup.

Best, Fabian

2016-05-19 9:05 GMT+02:00 Jark Wu <wuchong...@alibaba-inc.com>:

I have read the source code and found that the implementation of JoinedStreams is almost the same as that of CoGroupedStreams (internally, JoinedStreams is built on CoGroupedStreams). So why do we provide two different interfaces, `DataStream.join()` and `DataStream.coGroup()`, if they are exactly the same? The documentation [1] does not indicate that they do the same thing. Or are there differences between `DataStream.join()` and `DataStream.coGroup()` that I missed?

-- Jark Wu
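
For readers of this thread, the relationship Fabian describes can be sketched in plain Java without any Flink dependency. The interfaces `PairJoin` and `GroupCoGroup` and the helpers `asCoGroup` and `leftOuter` are hypothetical stand-ins invented for this illustration; they are not Flink's FlatJoinFunction or CoGroupFunction, and this is not Flink's actual implementation. The sketch only shows the idea: a pairwise join can be expressed as a coGroup over the two per-key groups, while a coGroup can also do things a pairwise join cannot, such as reacting to an empty group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class JoinViaCoGroup {

    /** Hypothetical stand-in for a join function: it sees one matching pair at a time. */
    public interface PairJoin<L, R, O> {
        O join(L left, R right);
    }

    /** Hypothetical stand-in for a coGroup function: it sees both full groups for one key. */
    public interface GroupCoGroup<L, R, O> {
        void coGroup(List<L> left, List<R> right, List<O> out);
    }

    /** Expresses a pairwise join as a coGroup by emitting every (left, right) combination. */
    public static <L, R, O> GroupCoGroup<L, R, O> asCoGroup(PairJoin<L, R, O> joiner) {
        return (left, right, out) -> {
            for (L l : left) {
                for (R r : right) {
                    out.add(joiner.join(l, r));
                }
            }
        };
    }

    /** A coGroup that a pairwise join cannot express: it also emits unmatched left elements. */
    public static GroupCoGroup<String, Integer, String> leftOuter() {
        return (left, right, out) -> {
            for (String l : left) {
                if (right.isEmpty()) {
                    out.add(l + "-null");
                }
                for (Integer r : right) {
                    out.add(l + "-" + r);
                }
            }
        };
    }

    public static void main(String[] args) {
        // Elements that share one key on the left and right inputs (assumed sample data).
        List<String> left = Arrays.asList("a", "b");
        List<Integer> right = Arrays.asList(1, 2);

        List<String> joined = new ArrayList<>();
        JoinViaCoGroup.<String, Integer, String>asCoGroup((l, r) -> l + "-" + r)
                .coGroup(left, right, joined);
        System.out.println(joined);

        // With an empty right group, the wrapped join emits nothing, but leftOuter() still does.
        List<String> outer = new ArrayList<>();
        leftOuter().coGroup(left, Collections.<Integer>emptyList(), outer);
        System.out.println(outer);
    }
}
```

The nested loop in `asCoGroup` is the straightforward way to turn a group-level callback into pair-level calls. A dedicated join operator would be free to pick a different strategy, which is the first benefit listed in Fabian's reply.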