Yes, exactly. RocksDB has to "seek" over the data because it doesn't know
how many entries are stored under the join key.
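
To make the "get" versus "seek" difference concrete, here is a minimal sketch
in plain Java. It is not Flink or RocksDB code: the `SeekVsGet` class, the
`"<joinKey>|<mapKey>"` key layout and the `entriesTouched` counter are all
made up for illustration, and a `TreeMap` stands in for the sorted key space.
The assumption is that map entries for one join key sit next to each other
under a common prefix, so reading them means positioning an iterator and
walking forward until the prefix changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a sorted key-value store; not Flink or RocksDB code. */
public class SeekVsGet {
    // Sorted map standing in for an ordered key space.
    // Composite key layout (hypothetical): "<joinKey>|<mapKey>".
    private final TreeMap<String, String> store = new TreeMap<>();

    // Number of entries the last prefixScan had to look at.
    int entriesTouched = 0;

    void put(String joinKey, String mapKey, String row) {
        store.put(joinKey + "|" + mapKey, row);
    }

    /** ValueState-style access: one exact key, one point lookup. */
    String get(String joinKey) {
        return store.get(joinKey + "|");
    }

    /** MapState-style access: position an iterator at the prefix, then walk. */
    List<String> prefixScan(String joinKey) {
        String prefix = joinKey + "|";
        List<String> rows = new ArrayList<>();
        entriesTouched = 0;
        for (Map.Entry<String, String> e : store.tailMap(prefix, true).entrySet()) {
            entriesTouched++;
            if (!e.getKey().startsWith(prefix)) {
                break; // first key past the prefix: only now do we know we are done
            }
            rows.add(e.getValue());
        }
        return rows;
    }

    public static void main(String[] args) {
        // Map-style layout: one row under k1, another join key right after it.
        SeekVsGet mapStyle = new SeekVsGet();
        mapStyle.put("k1", "rowA", "rowA");
        mapStyle.put("k2", "rowB", "rowB");
        System.out.println(mapStyle.prefixScan("k1") + " touched=" + mapStyle.entriesTouched);

        // Value-style layout: the join key alone identifies the row.
        SeekVsGet valueStyle = new SeekVsGet();
        valueStyle.put("k1", "", "rowA");
        System.out.println(valueStyle.get("k1"));
    }
}
```

In this toy model the scan for `k1` returns a single row but still has to
look at the next entry to learn that the prefix has ended, while the
value-style lookup goes straight to one key. So the scan is not a pass over
the whole table, but it is more work than a point "get" even when only one
row matches.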

On Thu, 19 Nov 2020 at 13:38, Rex Fenley <r...@remind101.com> wrote:

> Ok, but if there is only 1 row per Join key on either side of the join,
> then wouldn't "iterate all the values in the MapState under the current
> key" effectively be "iterate 1 value in MapState under the current key"
> which would be O(1)? Or are you saying that it must seek across the entire
> dataset for the whole table even for that 1 row on either side of the join?
>
> Thanks for the help so far!
>
> On Wed, Nov 18, 2020 at 6:30 PM Jark Wu <imj...@gmail.com> wrote:
>
>> Actually, if there is no unique key, it's not O(1), because there may be
>> multiple rows joined by the join key. The operator has to iterate over all
>> the values in the MapState under the current key, which is a "seek"
>> operation on RocksDB and is not efficient.
>>
>> Are you asking where the join key is set? The join key is set by the
>> framework via `AbstractStreamOperator#setKeyContextElement1`.
>>
>> Best,
>> Jark
>>
>> On Thu, 19 Nov 2020 at 03:18, Rex Fenley <r...@remind101.com> wrote:
>>
>>> Thanks for the info.
>>>
>>> So even if there is no unique key inferred for a Row, the set of rows to
>>> join on each Join key should effectively still be an O(1) lookup if the
>>> join key is unique right?
>>>
>>> Also, I've been digging around the code to find where the lookup of rows
>>> for a join key happens and haven't come across anything. Mind pointing me
>>> in the right direction?
>>>
>>> Thanks!
>>>
>>> cc Brad
>>>
>>> On Wed, Nov 18, 2020 at 7:39 AM Jark Wu <imj...@gmail.com> wrote:
>>>
>>>> Hi Rex,
>>>>
>>>> Currently, the join operator may use 3 kinds of state structure
>>>> depending on the input key and join key information.
>>>>
>>>> 1) input doesn't have a unique key => MapState<row, count>,
>>>> where the map key is the input row and the map value is the number
>>>> of equal rows.
>>>>
>>>> 2) input has a unique key, but the unique key is not a subset of the
>>>> join key => MapState<UK, row>
>>>> this is better than the above one, because it has a shorter map key and
>>>> is more efficient when retracting records.
>>>>
>>>> 3) input has a unique key, and the unique key is a subset of the join key
>>>> => ValueState<row>
>>>> this has the best performance, because it only performs a "get"
>>>> operation rather than a "seek" on RocksDB
>>>> for each record of the other input side.
>>>>
>>>> Note: the join key is the key of the keyed states.
>>>>
>>>> You can see the implementation differences in
>>>> org.apache.flink.table.runtime.operators.join.stream.state.JoinRecordStateViews.
>>>>
>>>> Best,
>>>> Jark
>>>>
>>>> On Wed, 18 Nov 2020 at 02:30, Rex Fenley <r...@remind101.com> wrote:
>>>>
>>>>> Ok, what are the performance consequences then of having a join with
>>>>> NoUniqueKey if the left side's key actually is unique in practice?
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>> On Tue, Nov 17, 2020 at 7:35 AM Jark Wu <imj...@gmail.com> wrote:
>>>>>
>>>>>> Hi Rex,
>>>>>>
>>>>>> Currently, the unique key is inferred by the optimizer. However, the
>>>>>> inference is not perfect.
>>>>>> There are known issues where the unique key is not derived correctly,
>>>>>> e.g. FLINK-20036 (was this opened by you?). If you think you have the
>>>>>> same case, please open an issue.
>>>>>>
>>>>>> A query hint would be a nice way to do this, but it is not supported yet.
>>>>>> We have an issue to track query hint support, see FLINK-17173.
>>>>>>
>>>>>> Best,
>>>>>> Jark
>>>>>>
>>>>>>
>>>>>> On Tue, 17 Nov 2020 at 15:23, Rex Fenley <r...@remind101.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I have quite a few joins in my plan that have
>>>>>>>
>>>>>>> leftInputSpec=[NoUniqueKey]
>>>>>>>
>>>>>>> in the Flink UI. I know it can't truly be the case that there is no
>>>>>>> unique key, at least for some of these joins that I've evaluated.
>>>>>>>
>>>>>>> Is there a way to hint to the join what the unique key is for a
>>>>>>> table?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>>>>>>
>>>>>>>
>>>>>>> Remind.com <https://www.remind.com/> |  BLOG
>>>>>>> <http://blog.remind.com/>  |  FOLLOW US
>>>>>>> <https://twitter.com/remindhq>  |  LIKE US
>>>>>>> <https://www.facebook.com/remindhq>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

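For reference, the three state layouts described in the quoted reply from
18 Nov can be summarised as a small decision function. This is only a sketch
under stated assumptions: `JoinStateChoice`, `StateKind` and `choose` are
hypothetical names invented here, columns are modelled as plain strings, and
the real selection logic lives in the planner and in `JoinRecordStateViews`,
which may differ in detail.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrative decision table; names are stand-ins, not Flink classes. */
public class JoinStateChoice {
    enum StateKind {
        MAP_ROW_TO_COUNT, // 1) no unique key: MapState<row, count>
        MAP_UK_TO_ROW,    // 2) unique key not contained in join key: MapState<UK, row>
        VALUE_ROW         // 3) unique key contained in join key: ValueState<row>
    }

    /**
     * Picks the state layout for one input side.
     * uniqueKey: columns of the inferred unique key, null or empty if none.
     * joinKey:   columns used in the join condition for this side.
     */
    static StateKind choose(Set<String> uniqueKey, Set<String> joinKey) {
        if (uniqueKey == null || uniqueKey.isEmpty()) {
            return StateKind.MAP_ROW_TO_COUNT;
        }
        if (joinKey.containsAll(uniqueKey)) {
            // Unique key is a subset of the join key: at most one row per join key.
            return StateKind.VALUE_ROW;
        }
        return StateKind.MAP_UK_TO_ROW;
    }

    public static void main(String[] args) {
        Set<String> joinKey = new HashSet<>(Arrays.asList("user_id"));
        System.out.println(choose(null, joinKey));
        System.out.println(choose(new HashSet<>(Arrays.asList("order_id")), joinKey));
        System.out.println(choose(new HashSet<>(Arrays.asList("user_id")), joinKey));
    }
}
```

The first case is the one reported as `NoUniqueKey` in the plan: even if the
data happens to be unique per join key, the operator cannot rely on that and
falls back to the map keyed by the whole row.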