Hi Kant,

I am not an expert on Flink's SQL implementation. Hence, I'm pulling in Timo and Jark, who might be able to help you with your question.

Cheers,
Till

On Tue, Jan 28, 2020 at 10:46 PM kant kodali <kanth...@gmail.com> wrote:

> Sorry, fixed some typos.
>
> I am doing a streaming outer join from four topics in Kafka; let's call
> them sample1, sample2, sample3, sample4. Each of these test topics has just
> one column which is of tuple string. My query is this:
>
> SELECT * FROM sample1
> FULL OUTER JOIN sample2 on sample1.f0=sample2.f0
> FULL OUTER JOIN sample3 on sample2.f0=sample3.f0
> FULL OUTER JOIN sample4 on sample3.f0=sample4.f0
>
> And here is how I send messages to those Kafka topics at various times.
>
> At time t1 Send a message "flink" to test-topic1
>
> (true,flink,null,null,null) // Looks good
>
> At time t2 Send a message "flink" to test-topic4
>
> (true,null,null,null,flink) // Looks good
>
> At time t3 Send a message "flink" to test-topic3
>
> (false,null,null,null,flink) // Looks good
> (true,null,null,flink,flink) // Looks good
>
> At time t4 Send a message "flink" to test-topic2
>
> (false,flink,null,null,null) // Looks good
> (false,null,null,flink,flink) // Looks good
> *(true,null,null,null,flink) // Redundant?*
> *(false,null,null,null,flink) // Redundant?*
> (true,flink,flink,flink,flink) // Looks good
>
> Assume t1<t2<t3<t4
>
> Those two rows above seem to be redundant to me, although the end result is
> correct. I don't see the same behavior if I join two topics. These
> redundant messages can lead to a lot of database operations underneath, so
> is there any way to optimize this? I am using Flink 1.9, so I am not sure
> if this is already fixed in 1.10.
>
> Attached the code as well.
>
> Thanks!
> kant
>
>
> On Tue, Jan 28, 2020 at 1:43 PM kant kodali <kanth...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am doing a streaming outer join from four topics in Kafka lets call
>> them sample1, sample2, sample3, sample4. Each of these test topics has just
>> one column which is of tuple string. my query is this
>>
>> SELECT * FROM sample1
>> FULL OUTER JOIN sample2 on sample1.f0=sample2.f0
>> FULL OUTER JOIN sample3 on sample2.f0=sample3.f0
>> FULL OUTER JOIN sample4 on sample3.f0=sample4.f0
>>
>> And here is how I send messages to those Kafka topics at various times.
>>
>> At time t1 Send a message "flink" to test-topic1
>>
>> (true,flink,null,null,null) // Looks good
>>
>> At time t2 Send a message "flink" to test-topic4
>>
>> (true,null,null,null,flink) // Looks good
>>
>> At time t3 Send a message "flink" to test-topic3
>>
>> (false,null,null,null,flink) // Looks good
>> (true,null,null,flink,flink) // Looks good
>>
>> At time t3 Send a message "flink" to test-topic2
>>
>> (false,flink,null,null,null) // Looks good
>> (false,null,null,flink,flink) // Looks good
>> *(true,null,null,null,flink) // Redundant?*
>> *(false,null,null,null,flink) // Redundant?*
>> (true,flink,flink,flink,flink) // Looks good
>>
>> Those two rows above seem to be redundant to be although the end result
>> is correct. Doesn't see the same behavior if I join two topics. This
>> unwanted message will lead to a lot of database operations underneath so
>> any way to optimize this? I am using Flink 1.9 so not sure if this is
>> already fixed in 1.10.
>>
>> Attached the code as well.
>>
>> Thanks!
>> kant
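
On the question of reducing database operations: one possible approach, which nobody in this thread has confirmed, is to fold the retract messages emitted for a single input event into their net effect before touching the database. The sketch below is plain Python rather than Flink code. The names net_changes and t4_messages are hypothetical, and it assumes that the sink can buffer the messages belonging to one batch (for example, one flush interval) and that the target table treats rows as a multiset, so an add followed by a retraction of the same row cancels out.

```python
def net_changes(messages):
    """Fold (flag, row) retract-stream messages into net inserts and deletes.

    flag True means "add this row", flag False means "retract this row".
    An add and a retraction of the same row inside the batch cancel out.
    """
    balance = {}  # row -> adds minus retractions; dicts keep insertion order
    for is_add, row in messages:
        balance[row] = balance.get(row, 0) + (1 if is_add else -1)

    # A positive balance means the row must be inserted that many times,
    # a negative balance means it must be deleted that many times.
    inserts = [row for row, n in balance.items() for _ in range(n)]
    deletes = [row for row, n in balance.items() for _ in range(-n)]
    return inserts, deletes


# The five messages reported at time t4, with null written as None.
t4_messages = [
    (False, ("flink", None, None, None)),
    (False, (None, None, "flink", "flink")),
    (True, (None, None, None, "flink")),
    (False, (None, None, None, "flink")),
    (True, ("flink", "flink", "flink", "flink")),
]

inserts, deletes = net_changes(t4_messages)
print("insert:", inserts)
print("delete:", deletes)
```

For the t4 batch this leaves one insert (the fully joined row) and two deletes (the rows produced at t1 and t3); the pair marked "Redundant?" cancels and never reaches the database. Whether such buffering is acceptable depends on how much latency the sink can tolerate.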