Sorry. fixed some typos. I am doing a streaming outer join from four topics in Kafka lets call them sample1, sample2, sample3, sample4. Each of these test topics has just one column which is of tuple string. my query is this
SELECT * FROM sample1 FULL OUTER JOIN sample2 on sample1.f0=sample2.f0 FULL OUTER JOIN sample3 on sample2.f0=sample3.f0 FULL OUTER JOIN sample4 on sample3.f0=sample4.f0 And here is how I send messages to those Kafka topics at various times. At time t1 Send a message "flink" to test-topic1 (true,flink,null,null,null) // Looks good At time t2 Send a message "flink" to test-topic4 (true,null,null,null,flink) // Looks good At time t3 Send a message "flink" to test-topic3 (false,null,null,null,flink) // Looks good (true,null,null,flink,flink) //Looks good At time t4 Send a message "flink" to test-topic2 (false,flink,null,null,null) // Looks good (false,null,null,flink,flink) // Looks good *(true,null,null,null,flink) // Redundant?* *(false,null,null,null,flink) // Redundant?* (true,flink,flink,flink,flink) //Looks good Assume t1<t2<t3<t4 Those two rows above seem to be redundant to me although the end result is correct. Doesn't see the same behavior if I join two topics. These redundant messages can lead to a lot of database operations underneath so any way to optimize this? I am using Flink 1.9 so not sure if this is already fixed in 1.10. Attached the code as well. Thanks! kant On Tue, Jan 28, 2020 at 1:43 PM kant kodali <kanth...@gmail.com> wrote: > Hi All, > > I am doing a streaming outer join from four topics in Kafka lets call them > sample1, sample2, sample3, sample4. Each of these test topics has just one > column which is of tuple string. my query is this > > SELECT * FROM sample1 FULL OUTER JOIN sample2 on sample1.f0=sample2.f0 FULL > OUTER JOIN sample3 on sample2.f0=sample3.f0 FULL OUTER JOIN sample4 on > sample3.f0=sample4.f0 > > > And here is how I send messages to those Kafka topics at various times. > > At time t1 Send a message "flink" to test-topic1 > > (true,flink,null,null,null) // Looks good > > At time t2 Send a message "flink" to test-topic4 > > (true,null,null,null,flink) // Looks good > > At time t3 Send a message "flink" to test-topic3 > > (false,null,null,null,flink) // Looks good > (true,null,null,flink,flink) //Looks good > > At time t3 Send a message "flink" to test-topic2 > > (false,flink,null,null,null) // Looks good > (false,null,null,flink,flink) // Looks good > *(true,null,null,null,flink) // Redundant?* > *(false,null,null,null,flink) // Redundant?* > (true,flink,flink,flink,flink) //Looks good > > Those two rows above seem to be redundant to be although the end result is > correct. Doesn't see the same behavior if I join two topics. This unwanted > message will lead to a lot of database operations underneath so any way to > optimize this? I am using Flink 1.9 so not sure if this is already fixed in > 1.10. > > Attached the code as well. > > Thanks! > kant > > > >