Your best bet is to try out both approaches with some representative data.
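For the SQL side of that comparison, a minimal sketch of a windowed left join between the two high-volume topics might look like the following. It assumes Flink 1.14 or later (window joins over windowing table-valued functions) and that testtopic1 and testtopic3 are already registered as tables with an event-time attribute. The column names `id`, `payload` and `ts` are hypothetical placeholders, and the small reference topic testtopic2 is left out to keep the sketch short.

```sql
-- Sketch only: `id`, `payload` and `ts` are hypothetical column names;
-- `ts` is assumed to be an event-time attribute with a watermark defined.
SELECT
  L.id,
  L.payload AS left_payload,
  R.payload AS right_payload,   -- NULL when no match exists in the same window
  L.window_start,
  L.window_end
FROM (
  -- Assign each testtopic1 row to a one-minute tumbling window.
  SELECT * FROM TABLE(
    TUMBLE(TABLE testtopic1, DESCRIPTOR(ts), INTERVAL '1' MINUTES))
) L
LEFT JOIN (
  -- Same window definition for testtopic3 so the window bounds line up.
  SELECT * FROM TABLE(
    TUMBLE(TABLE testtopic3, DESCRIPTOR(ts), INTERVAL '1' MINUTES))
) R
ON L.id = R.id
  AND L.window_start = R.window_start
  AND L.window_end = R.window_end;
```

Running a query like this and an equivalent DataStream job against the same sample of the topics should show which one copes better with the load.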
On 12/01/2022 08:11, Ronak Beejawat (rbeejawa) wrote:
Hi Team,

Can you please help me with the query below? I would like to know which approach is better and more efficient for multiple left joins within a one-minute tumbling window (DataStream API vs. SQL API, with respect to performance and memory management).

Use case:
1. We have topic one (testtopic1), which receives half a million records every minute.
2. We have topic two (testtopic2), which holds 23 records of static reference data.
3. We have topic three (testtopic3), which receives one million records every minute.

So we are doing the join as (select * from testtopic1 left join testtopic2 left join testtopic3 group by tumble window of 1 min duration).

So the question is: which API will be more efficient and faster for such a use case (DataStream API or SQL API) with intensive joining logic?

Thanks
Ronak Beejawat

From: Ronak Beejawat (rbeejawa)
Sent: Tuesday, January 11, 2022 6:12 PM
To: 'd...@flink.apache.org' <d...@flink.apache.org>; 'community@flink.apache.org' <community@flink.apache.org>; 'u...@flink.apache.org' <u...@flink.apache.org>
Cc: 'Hang Ruan' <ruanhang1...@gmail.com>; Shrinath Shenoy K (sshenoyk) <sshen...@cisco.com>
Subject: RE: what is efficient way to write Left join in flink

Can someone please help with or reply to the question below?

From: Ronak Beejawat (rbeejawa)
Sent: Monday, January 10, 2022 7:40 PM
To: d...@flink.apache.org; community@flink.apache.org; u...@flink.apache.org
Cc: Hang Ruan <ruanhang1...@gmail.com>; Shrinath Shenoy K (sshenoyk) <sshen...@cisco.com>
Subject: what is efficient way to write Left join in flink

Hi Team,

We want clarification on a real-time processing scenario for the use case below.

Use case:
1. We have topic one (testtopic1), which receives half a million records every minute.
2. We have topic two (testtopic2), which receives one million records every minute.

So we are doing the join as testtopic1 left join testtopic2, where the data is correlated 1:2.

So the question is: which API will be more efficient and faster for such a use case (DataStream API or SQL API) with intensive joining logic?

Thanks
Ronak Beejawat