Thanks for the info.

So even if there is no unique key inferred for a Row, the set of rows to
join on each Join key should effectively still be an O(1) lookup if the
join key is unique right?

Also, I've been digging around the code to find where the lookup of rows
for a join key happens and haven't come across anything. Mind pointing me
in the right direction?

Thanks!

cc Brad

On Wed, Nov 18, 2020 at 7:39 AM Jark Wu <imj...@gmail.com> wrote:

> Hi Rex,
>
> Currently, the join operator may use 3 kinds of state structure depending
> on the input key and join key information.
>
> 1) input doesn't have a unique key => MapState<row, count>,
> where the map key is the input row and the map value is the number
> of equal rows.
>
> 2) input has unique key, but the unique key is not a subset of join key =>
> MapState<UK, row>
> this is better than the above one, because it has a shorter map key and
> is more efficient when retracting records.
>
> 3) input has a unique key, and the unique key is a subset of join key =>
> ValueState<row>
> this is the best performance, because it only performs a "get" operation
> rather than "seek" on rocksdb
>  for each record of the other input side.
>
> Note: the join key is the key of the keyed states.
>
> You can see the implementation differences
> in 
> org.apache.flink.table.runtime.operators.join.stream.state.JoinRecordStateViews.
>
> Best,
> Jark
>
> On Wed, 18 Nov 2020 at 02:30, Rex Fenley <r...@remind101.com> wrote:
>
>> Ok, what are the performance consequences then of having a join with
>> NoUniqueKey if the left side's key actually is unique in practice?
>>
>> Thanks!
>>
>>
>> On Tue, Nov 17, 2020 at 7:35 AM Jark Wu <imj...@gmail.com> wrote:
>>
>>> Hi Rex,
>>>
>>> Currently, the unique key is inferred by the optimizer. However, the
>>> inference is not perfect.
>>> There are known issues that the unique key is not derived correctly,
>>> e.g. FLINK-20036 (is this opened by you?). If you think you have the same
>>> case, please open an issue.
>>>
>>> Query hint is a nice way for this, but it is not supported yet.
>>> We have an issue to track supporting query hint, see FLINK-17173.
>>>
>>> Beest,
>>> Jark
>>>
>>>
>>> On Tue, 17 Nov 2020 at 15:23, Rex Fenley <r...@remind101.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have quite a few joins in my plan that have
>>>>
>>>> leftInputSpec=[NoUniqueKey]
>>>>
>>>> in Flink UI. I know this can't truly be the case that there is no
>>>> unique key, at least for some of these joins that I've evaluated.
>>>>
>>>> Is there a way to hint to the join what the unique key is for a table?
>>>>
>>>> Thanks!
>>>>
>>>> --
>>>>
>>>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>>>
>>>>
>>>> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
>>>>  |  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
>>>> <https://www.facebook.com/remindhq>
>>>>
>>>
>>
>> --
>>
>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>
>>
>> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
>>  |  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
>> <https://www.facebook.com/remindhq>
>>
>

-- 

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
 |  FOLLOW
US <https://twitter.com/remindhq>  |  LIKE US
<https://www.facebook.com/remindhq>

Reply via email to