is streaming outer join sending unnecessary traffic?

kant kodali Tue, 28 Jan 2020 13:44:21 -0800

Hi All,

I am doing a streaming outer join from four topics in Kafka lets call them
sample1, sample2, sample3, sample4. Each of these test topics has just one
column which is of tuple string. my query is this


SELECT * FROM sample1 FULL OUTER JOIN sample2 on sample1.f0=sample2.f0
FULL OUTER JOIN sample3 on sample2.f0=sample3.f0 FULL OUTER JOIN
sample4 on sample3.f0=sample4.f0


And here is how I send messages to those Kafka topics at various times.

At time t1 Send a message "flink" to test-topic1

(true,flink,null,null,null) // Looks good

At time t2 Send a message "flink" to test-topic4

(true,null,null,null,flink) // Looks good

At time t3 Send a message "flink" to test-topic3

(false,null,null,null,flink) // Looks good
(true,null,null,flink,flink) //Looks good

At time t3 Send a message "flink" to test-topic2

(false,flink,null,null,null) // Looks good
(false,null,null,flink,flink) // Looks good
*(true,null,null,null,flink) // Redundant?*
*(false,null,null,null,flink) // Redundant?*
(true,flink,flink,flink,flink) //Looks good

Those two rows above seem to be redundant to be although the end result is
correct. Doesn't see the same behavior if I join two topics. This unwanted
message will lead to a lot of database operations underneath so any way to
optimize this? I am using Flink 1.9 so not sure if this is already fixed in
1.10.

Attached the code as well.

Thanks!
kant

Test.java
Description: Binary data

is streaming outer join sending unnecessary traffic?

Reply via email to