Your best bet is to try out both approaches with some representative data.
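
Before comparing throughput, it can help to pin down the expected output on a small sample, so that both implementations can be checked against the same result. Below is a minimal plain-Python sketch of the semantics of a left join inside one-minute tumbling windows. It does not use the Flink API and says nothing about performance; the record layout (timestamp_ms, key, value), the function names, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_MS = 60_000  # one-minute tumbling window, as in the use case


def window_start(ts_ms, size_ms=WINDOW_MS):
    """Return the start of the tumbling window containing the timestamp."""
    return ts_ms - (ts_ms % size_ms)


def windowed_left_join(left, right, size_ms=WINDOW_MS):
    """Left-join two lists of (timestamp_ms, key, value) records.

    Records match only if they share a key and fall into the same window.
    Left records without a match are kept and paired with None.
    """
    # Index the right side by (window start, key).
    index = defaultdict(list)
    for ts, key, value in right:
        index[(window_start(ts, size_ms), key)].append(value)

    joined = []
    for ts, key, value in left:
        start = window_start(ts, size_ms)
        # .get() avoids creating empty entries for unmatched keys.
        for match in index.get((start, key), [None]):
            joined.append((start, key, value, match))
    return joined


# Hypothetical sample data: (timestamp_ms, key, value)
topic1 = [(1_000, "a", "t1-a"), (2_000, "b", "t1-b"), (61_000, "a", "t1-a2")]
topic3 = [(5_000, "a", "t3-a1"), (30_000, "a", "t3-a2"), (70_000, "c", "t3-c")]

result = windowed_left_join(topic1, topic3)
for row in result:
    print(row)
```

Running the same small sample through the DataStream job and the SQL job and comparing both against this kind of reference output is one way to confirm they are equivalent before measuring them at full volume.
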
On 12/01/2022 08:11, Ronak Beejawat (rbeejawa) wrote:
Hi Team,

Could you please help me with the query below? I would like to know which
approach is more efficient for multiple left joins within a one-minute tumbling
window: the DataStream API or the SQL API, with respect to performance and
memory management.

Use case :
1. We have topic one (testtopic1) which will get half a million records every 
minute.
2. We have topic two (testtopic2) which will get 23 data points as static 
reference data.
3. We have topic three (testtopic3) which will get one million records every minute.


So we are doing the join as (select * from testtopic1 left join testtopic2 left 
join testtopic3, grouped by a tumbling window of 1 min duration)

So the question is which API will be more efficient and faster for such a use 
case (DataStream API or SQL API) with intensive joining logic?

Thanks
Ronak Beejawat



From: Ronak Beejawat (rbeejawa)
Sent: Tuesday, January 11, 2022 6:12 PM
To: 'd...@flink.apache.org' <d...@flink.apache.org>; 'community@flink.apache.org' 
<community@flink.apache.org>; 'u...@flink.apache.org' <u...@flink.apache.org>
Cc: 'Hang Ruan' <ruanhang1...@gmail.com>; Shrinath Shenoy K (sshenoyk) 
<sshen...@cisco.com>
Subject: RE: what is efficient way to write Left join in flink

Can someone please help with or reply to the question below?

From: Ronak Beejawat (rbeejawa)
Sent: Monday, January 10, 2022 7:40 PM
To: d...@flink.apache.org; community@flink.apache.org; u...@flink.apache.org
Cc: Hang Ruan <ruanhang1...@gmail.com>; Shrinath Shenoy K (sshenoyk) 
<sshen...@cisco.com>
Subject: what is efficient way to write Left join in flink

Hi Team,

We would like clarification on a real-time processing scenario for the use 
case described below.

Use case :
1. We have topic one (testtopic1) which will get half a million records every 
minute.
2. We have topic two (testtopic2) which will get one million records every minute.

So we are doing the join as testtopic1 left join testtopic2, where the data is 
correlated 1:2.

So the question is which API will be more efficient and faster for such a use 
case (DataStream API or SQL API) with intensive joining logic?

Thanks
Ronak Beejawat
