Re: is streaming outer join sending unnecessary traffic?

kant kodali Sat, 01 Feb 2020 13:19:13 -0800

Has anyone had a chance to look at this, or should I create a JIRA issue?

On Wed, Jan 29, 2020 at 6:49 AM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Kant,
>
> I am not an expert on Flink's SQL implementation. Hence, I'm pulling in
> Timo and Jark who might help you with your question.
>
> Cheers,
> Till
>
> On Tue, Jan 28, 2020 at 10:46 PM kant kodali <kanth...@gmail.com> wrote:
>
>> Sorry, I fixed some typos.
>>
>> I am doing a streaming outer join over four Kafka topics; let's call
>> them sample1, sample2, sample3, and sample4. Each of these test topics has
>> just one column, which is a tuple of a single string. My query is:
>>
>> SELECT *
>> FROM sample1
>> FULL OUTER JOIN sample2 ON sample1.f0 = sample2.f0
>> FULL OUTER JOIN sample3 ON sample2.f0 = sample3.f0
>> FULL OUTER JOIN sample4 ON sample3.f0 = sample4.f0
>>
>>
>> Here is how I send messages to those Kafka topics at various times,
>> together with the output I see after each message.
>>
>> At time t1, send the message "flink" to test-topic1:
>>
>> (true,flink,null,null,null) // Looks good
>>
>> At time t2, send the message "flink" to test-topic4:
>>
>> (true,null,null,null,flink) // Looks good
>>
>> At time t3, send the message "flink" to test-topic3:
>>
>> (false,null,null,null,flink) // Looks good
>> (true,null,null,flink,flink) // Looks good
>>
>> At time t4, send the message "flink" to test-topic2:
>>
>> (false,flink,null,null,null) // Looks good
>> (false,null,null,flink,flink) // Looks good
>> *(true,null,null,null,flink) // Redundant?*
>> *(false,null,null,null,flink) // Redundant?*
>> (true,flink,flink,flink,flink) // Looks good
>>
>> Assume t1 < t2 < t3 < t4.
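To make the t4 output easier to reason about, here is a minimal Python sketch of the query as three cascaded binary full outer joins that exchange add/retract messages. It is a toy model under stated assumptions, not Flink's actual join operator: the FullOuterJoin class and the run helper are hypothetical names, rows are plain tuples, the join key is a single column, and a null key never matches. In this model each join handles the retraction and the addition coming from the upstream join as two independent messages, which is where the extra add/retract pair comes from.

```python
from collections import Counter


class FullOuterJoin:
    """Toy model of one binary streaming FULL OUTER JOIN that emits retractions.

    Assumptions (not Flink's operator): rows are tuples, the join key is a
    single column, and None (SQL NULL) never matches anything.
    """

    def __init__(self, left_key, right_key, left_width, right_width):
        self.key = {"L": left_key, "R": right_key}        # key column index per side
        self.width = {"L": left_width, "R": right_width}  # number of columns per side
        self.rows = {"L": Counter(), "R": Counter()}      # rows currently held per side

    def _other(self, side):
        return "R" if side == "L" else "L"

    def _joined(self, side, row, partner=None):
        # Output is always left columns followed by right columns;
        # a missing partner is padded with None.
        if partner is None:
            partner = (None,) * self.width[self._other(side)]
        return row + partner if side == "L" else partner + row

    def _matches(self, side, key):
        if key is None:
            return []
        return [r for r, n in self.rows[side].items()
                for _ in range(n) if r[self.key[side]] == key]

    def process(self, side, is_add, row):
        """Apply one change and return the (flag, row) messages it causes."""
        other = self._other(side)
        key = row[self.key[side]]
        if not is_add:
            self.rows[side][row] -= 1  # remove first, so siblings exclude this row
        partners = self._matches(other, key)
        siblings = self._matches(side, key)  # same-side rows sharing the key
        out = []
        if not partners:
            out.append((is_add, self._joined(side, row)))
        for p in partners:
            if is_add:
                if not siblings:  # p was only visible null-padded until now
                    out.append((False, self._joined(other, p)))
                out.append((True, self._joined(side, row, p)))
            else:
                out.append((False, self._joined(side, row, p)))
                if not siblings:  # p lost its last match: emit it padded again
                    out.append((True, self._joined(other, p)))
        if is_add:
            self.rows[side][row] += 1
        return out


def run(events):
    """Feed (topic_number, value) events through the three cascaded joins."""
    j1 = FullOuterJoin(0, 0, 1, 1)  # sample1.f0 = sample2.f0
    j2 = FullOuterJoin(1, 0, 2, 1)  # sample2.f0 = sample3.f0
    j3 = FullOuterJoin(2, 0, 3, 1)  # sample3.f0 = sample4.f0
    results = []
    for topic, value in events:
        msgs = [(True, (value,))]
        for i, join in enumerate((j1, j2, j3), start=1):
            if topic > i + 1:
                continue  # this topic enters the cascade further downstream
            side = "R" if topic == i + 1 else "L"
            msgs = [m for flag, row in msgs for m in join.process(side, flag, row)]
        results.append(msgs)
    return results


if __name__ == "__main__":
    for step in run([(1, "flink"), (4, "flink"), (3, "flink"), (2, "flink")]):
        print(step)
```

Under this model, the fourth step yields the same five messages listed above, in the same order. The second join first retracts (null,null,flink) and only then adds (flink,flink,flink), so the third join briefly falls back to the null-padded sample4 row before retracting it again.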
>>
>> The two rows marked above seem redundant to me, although the end result
>> is correct. I don't see the same behavior when I join only two topics.
>> These redundant messages can lead to a lot of unnecessary database
>> operations downstream, so is there any way to optimize this? I am using
>> Flink 1.9, so I am not sure whether this is already fixed in 1.10.
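One possible way to spare the database, independent of any change in Flink itself, is to buffer the retract messages briefly and write only the net change per row. The sketch below illustrates that idea in plain Python. The compact function is a hypothetical helper, not a Flink API, and where the batching would happen (for example inside a custom sink) is an assumption. Note that it trades some latency for fewer writes.

```python
from collections import Counter


def compact(changes):
    """Collapse a batch of (is_add, row) messages into net changes per row.

    An add followed by a retract of the same row (or the reverse) cancels
    out, so the sink never sees it. Rows keep their first-seen order.
    """
    net = Counter()
    order = []
    for is_add, row in changes:
        if row not in net:
            order.append(row)  # remember the first appearance of each row
        net[row] += 1 if is_add else -1
    result = []
    for row in order:
        n = net[row]
        # n > 0 means net additions, n < 0 net retractions, 0 means nothing to do
        result.extend([(n > 0, row)] * abs(n))
    return result


if __name__ == "__main__":
    N = None
    # The five messages observed at time t4 in the message above.
    t4_batch = [
        (False, ("flink", N, N, N)),
        (False, (N, N, "flink", "flink")),
        (True, (N, N, N, "flink")),
        (False, (N, N, N, "flink")),
        (True, ("flink", "flink", "flink", "flink")),
    ]
    for change in compact(t4_batch):
        print(change)
```

Applied to the five messages from t4, the add/retract pair for (null,null,null,flink) cancels and three changes remain: two retractions and the final fully joined row.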
>>
>> I have attached the code as well.
>>
>> Thanks!
>> kant
