Wondering if anyone had a chance to look through this or should I create the JIRA?
On Wed, Jan 29, 2020 at 6:49 AM Till Rohrmann <trohrm...@apache.org> wrote: > Hi Kant, > > I am not an expert on Flink's SQL implementation. Hence, I'm pulling in > Timo and Jark who might help you with your question. > > Cheers, > Till > > On Tue, Jan 28, 2020 at 10:46 PM kant kodali <kanth...@gmail.com> wrote: > >> Sorry. fixed some typos. >> >> I am doing a streaming outer join from four topics in Kafka lets call >> them sample1, sample2, sample3, sample4. Each of these test topics has just >> one column which is of tuple string. my query is this >> >> SELECT * FROM sample1 FULL OUTER JOIN sample2 on sample1.f0=sample2.f0 FULL >> OUTER JOIN sample3 on sample2.f0=sample3.f0 FULL OUTER JOIN sample4 on >> sample3.f0=sample4.f0 >> >> >> And here is how I send messages to those Kafka topics at various times. >> >> At time t1 Send a message "flink" to test-topic1 >> >> (true,flink,null,null,null) // Looks good >> >> At time t2 Send a message "flink" to test-topic4 >> >> (true,null,null,null,flink) // Looks good >> >> At time t3 Send a message "flink" to test-topic3 >> >> (false,null,null,null,flink) // Looks good >> (true,null,null,flink,flink) //Looks good >> >> At time t4 Send a message "flink" to test-topic2 >> >> (false,flink,null,null,null) // Looks good >> (false,null,null,flink,flink) // Looks good >> *(true,null,null,null,flink) // Redundant?* >> *(false,null,null,null,flink) // Redundant?* >> (true,flink,flink,flink,flink) //Looks good >> >> Assume t1<t2<t3<t4 >> >> Those two rows above seem to be redundant to me although the end result >> is correct. Doesn't see the same behavior if I join two topics. These >> redundant messages can lead to a lot of database operations underneath so >> any way to optimize this? I am using Flink 1.9 so not sure if this is >> already fixed in 1.10. >> >> Attached the code as well. >> >> Thanks! >> kant >> >> >> On Tue, Jan 28, 2020 at 1:43 PM kant kodali <kanth...@gmail.com> wrote: >> >>> Hi All, >>> >>> I am doing a streaming outer join from four topics in Kafka lets call >>> them sample1, sample2, sample3, sample4. Each of these test topics has just >>> one column which is of tuple string. my query is this >>> >>> SELECT * FROM sample1 FULL OUTER JOIN sample2 on sample1.f0=sample2.f0 FULL >>> OUTER JOIN sample3 on sample2.f0=sample3.f0 FULL OUTER JOIN sample4 on >>> sample3.f0=sample4.f0 >>> >>> >>> And here is how I send messages to those Kafka topics at various times. >>> >>> At time t1 Send a message "flink" to test-topic1 >>> >>> (true,flink,null,null,null) // Looks good >>> >>> At time t2 Send a message "flink" to test-topic4 >>> >>> (true,null,null,null,flink) // Looks good >>> >>> At time t3 Send a message "flink" to test-topic3 >>> >>> (false,null,null,null,flink) // Looks good >>> (true,null,null,flink,flink) //Looks good >>> >>> At time t3 Send a message "flink" to test-topic2 >>> >>> (false,flink,null,null,null) // Looks good >>> (false,null,null,flink,flink) // Looks good >>> *(true,null,null,null,flink) // Redundant?* >>> *(false,null,null,null,flink) // Redundant?* >>> (true,flink,flink,flink,flink) //Looks good >>> >>> Those two rows above seem to be redundant to be although the end result >>> is correct. Doesn't see the same behavior if I join two topics. This >>> unwanted message will lead to a lot of database operations underneath so >>> any way to optimize this? I am using Flink 1.9 so not sure if this is >>> already fixed in 1.10. >>> >>> Attached the code as well. >>> >>> Thanks! >>> kant >>> >>> >>> >>>