Ok, but if there is only 1 row per Join key on either side of the join,
then wouldn't "iterate all the values in the MapState under the current
key" effectively be "iterate 1 value in MapState under the current key"
which would be O(1)? Or are you saying that it must seek across the entire
dataset for the whole table even for that 1 row on either side of the join?

Thanks for the help so far!

On Wed, Nov 18, 2020 at 6:30 PM Jark Wu <imj...@gmail.com> wrote:

> Actually, if there is no unique key, it's not O(1), because there maybe
> multiple rows are joined by the join key, i.e. iterate all the values in
> the MapState under the current key, this is a "seek" operation on rocksdb
> which is not efficient.
>
> Are you asking where the join key is set? The join key is set by the
> framework via `AbstractStreamOperator#setKeyContextElement1`.
>
> Best,
> Jark
>
> On Thu, 19 Nov 2020 at 03:18, Rex Fenley <r...@remind101.com> wrote:
>
>> Thanks for the info.
>>
>> So even if there is no unique key inferred for a Row, the set of rows to
>> join on each Join key should effectively still be an O(1) lookup if the
>> join key is unique right?
>>
>> Also, I've been digging around the code to find where the lookup of rows
>> for a join key happens and haven't come across anything. Mind pointing me
>> in the right direction?
>>
>> Thanks!
>>
>> cc Brad
>>
>> On Wed, Nov 18, 2020 at 7:39 AM Jark Wu <imj...@gmail.com> wrote:
>>
>>> Hi Rex,
>>>
>>> Currently, the join operator may use 3 kinds of state structure
>>> depending on the input key and join key information.
>>>
>>> 1) input doesn't have a unique key => MapState<row, count>,
>>> where the map key is the input row and the map value is the number
>>> of equal rows.
>>>
>>> 2) input has unique key, but the unique key is not a subset of join key
>>> => MapState<UK, row>
>>> this is better than the above one, because it has a shorter map key and
>>> is more efficient when retracting records.
>>>
>>> 3) input has a unique key, and the unique key is a subset of join key =>
>>> ValueState<row>
>>> this is the best performance, because it only performs a "get" operation
>>> rather than "seek" on rocksdb
>>>  for each record of the other input side.
>>>
>>> Note: the join key is the key of the keyed states.
>>>
>>> You can see the implementation differences
>>> in 
>>> org.apache.flink.table.runtime.operators.join.stream.state.JoinRecordStateViews.
>>>
>>> Best,
>>> Jark
>>>
>>> On Wed, 18 Nov 2020 at 02:30, Rex Fenley <r...@remind101.com> wrote:
>>>
>>>> Ok, what are the performance consequences then of having a join with
>>>> NoUniqueKey if the left side's key actually is unique in practice?
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> On Tue, Nov 17, 2020 at 7:35 AM Jark Wu <imj...@gmail.com> wrote:
>>>>
>>>>> Hi Rex,
>>>>>
>>>>> Currently, the unique key is inferred by the optimizer. However, the
>>>>> inference is not perfect.
>>>>> There are known issues that the unique key is not derived correctly,
>>>>> e.g. FLINK-20036 (is this opened by you?). If you think you have the same
>>>>> case, please open an issue.
>>>>>
>>>>> Query hint is a nice way for this, but it is not supported yet.
>>>>> We have an issue to track supporting query hint, see FLINK-17173.
>>>>>
>>>>> Beest,
>>>>> Jark
>>>>>
>>>>>
>>>>> On Tue, 17 Nov 2020 at 15:23, Rex Fenley <r...@remind101.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have quite a few joins in my plan that have
>>>>>>
>>>>>> leftInputSpec=[NoUniqueKey]
>>>>>>
>>>>>> in Flink UI. I know this can't truly be the case that there is no
>>>>>> unique key, at least for some of these joins that I've evaluated.
>>>>>>
>>>>>> Is there a way to hint to the join what the unique key is for a table?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>>>>>
>>>>>>
>>>>>> Remind.com <https://www.remind.com/> |  BLOG
>>>>>> <http://blog.remind.com/>  |  FOLLOW US
>>>>>> <https://twitter.com/remindhq>  |  LIKE US
>>>>>> <https://www.facebook.com/remindhq>
>>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>>>
>>>>
>>>> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
>>>>  |  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
>>>> <https://www.facebook.com/remindhq>
>>>>
>>>
>>
>> --
>>
>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>
>>
>> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
>>  |  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
>> <https://www.facebook.com/remindhq>
>>
>

-- 

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
 |  FOLLOW
US <https://twitter.com/remindhq>  |  LIKE US
<https://www.facebook.com/remindhq>

Reply via email to