OK, but if there is only 1 row per join key on either side of the join, then wouldn't "iterate all the values in the MapState under the current key" effectively be "iterate 1 value in the MapState under the current key", which would be O(1)? Or are you saying that it must seek across the entire dataset for the whole table, even for that 1 row on either side of the join?
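
To make the question concrete, here is a minimal sketch in Python of how I understand two of the state layouts described in the quoted reply below. It is a toy model built on plain dictionaries, not Flink code: the class and method names are hypothetical, and it says nothing about what RocksDB actually costs for a seek versus a get. It only shows the difference in access pattern: the no-unique-key layout has to iterate whatever is stored under the join key, while the layout where the unique key is contained in the join key does a single point read.

```python
from collections import defaultdict


class NoUniqueKeyView:
    """Case 1 (toy model): join key -> {row: count of equal rows}."""

    def __init__(self):
        self.state = defaultdict(dict)

    def add(self, join_key, row):
        rows = self.state[join_key]
        rows[row] = rows.get(row, 0) + 1

    def lookup(self, join_key):
        # No unique key is known, so every entry under the join key
        # has to be iterated, even if there turns out to be just one.
        matches = []
        for row, count in self.state.get(join_key, {}).items():
            matches.extend([row] * count)
        return matches


class UniqueKeyInJoinKeyView:
    """Case 3 (toy model): join key -> at most one row."""

    def __init__(self):
        self.state = {}

    def add(self, join_key, row):
        self.state[join_key] = row  # a later row replaces the earlier one

    def lookup(self, join_key):
        # Point read: a single "get", nothing to iterate.
        row = self.state.get(join_key)
        return [] if row is None else [row]


map_view = NoUniqueKeyView()
value_view = UniqueKeyInJoinKeyView()
for view in (map_view, value_view):
    view.add(42, ("user-42", "Ada"))

print(map_view.lookup(42))
print(value_view.lookup(42))
```

With one row per join key, both lookups return the same single row; the only difference in this sketch is that the first one goes through an iteration to find it.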

Thanks for the help so far!

On Wed, Nov 18, 2020 at 6:30 PM Jark Wu <imj...@gmail.com> wrote:

> Actually, if there is no unique key, it's not O(1), because there may be
> multiple rows joined by the join key, i.e. it has to iterate all the
> values in the MapState under the current key. This is a "seek" operation
> on RocksDB, which is not efficient.
>
> Are you asking where the join key is set? The join key is set by the
> framework via `AbstractStreamOperator#setKeyContextElement1`.
>
> Best,
> Jark
>
> On Thu, 19 Nov 2020 at 03:18, Rex Fenley <r...@remind101.com> wrote:
>
>> Thanks for the info.
>>
>> So even if there is no unique key inferred for a Row, the set of rows to
>> join on each Join key should effectively still be an O(1) lookup if the
>> join key is unique, right?
>>
>> Also, I've been digging around the code to find where the lookup of rows
>> for a join key happens and haven't come across anything. Mind pointing me
>> in the right direction?
>>
>> Thanks!
>>
>> cc Brad
>>
>> On Wed, Nov 18, 2020 at 7:39 AM Jark Wu <imj...@gmail.com> wrote:
>>
>>> Hi Rex,
>>>
>>> Currently, the join operator may use 3 kinds of state structure
>>> depending on the input key and join key information.
>>>
>>> 1) input doesn't have a unique key => MapState<row, count>,
>>> where the map key is the input row and the map value is the number
>>> of equal rows.
>>>
>>> 2) input has a unique key, but the unique key is not a subset of the
>>> join key => MapState<UK, row>
>>> this is better than the above one, because it has a shorter map key and
>>> is more efficient when retracting records.
>>>
>>> 3) input has a unique key, and the unique key is a subset of the join
>>> key => ValueState<row>
>>> this has the best performance, because it only performs a "get"
>>> operation rather than a "seek" on RocksDB for each record of the other
>>> input side.
>>>
>>> Note: the join key is the key of the keyed states.
>>>
>>> You can see the implementation differences in
>>> org.apache.flink.table.runtime.operators.join.stream.state.JoinRecordStateViews.
>>>
>>> Best,
>>> Jark
>>>
>>> On Wed, 18 Nov 2020 at 02:30, Rex Fenley <r...@remind101.com> wrote:
>>>
>>>> Ok, what are the performance consequences then of having a join with
>>>> NoUniqueKey if the left side's key actually is unique in practice?
>>>>
>>>> Thanks!
>>>>
>>>> On Tue, Nov 17, 2020 at 7:35 AM Jark Wu <imj...@gmail.com> wrote:
>>>>
>>>>> Hi Rex,
>>>>>
>>>>> Currently, the unique key is inferred by the optimizer. However, the
>>>>> inference is not perfect.
>>>>> There are known issues where the unique key is not derived correctly,
>>>>> e.g. FLINK-20036 (is this opened by you?). If you think you have the
>>>>> same case, please open an issue.
>>>>>
>>>>> A query hint would be a nice way to do this, but it is not supported
>>>>> yet. We have an issue to track supporting query hints, see FLINK-17173.
>>>>>
>>>>> Best,
>>>>> Jark
>>>>>
>>>>> On Tue, 17 Nov 2020 at 15:23, Rex Fenley <r...@remind101.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have quite a few joins in my plan that have
>>>>>>
>>>>>> leftInputSpec=[NoUniqueKey]
>>>>>>
>>>>>> in Flink UI. I know this can't truly be the case that there is no
>>>>>> unique key, at least for some of these joins that I've evaluated.
>>>>>>
>>>>>> Is there a way to hint to the join what the unique key is for a table?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Rex Fenley | Software Engineer - Mobile and Backend
>>>>>>
>>>>>> Remind.com <https://www.remind.com/> | BLOG
>>>>>> <http://blog.remind.com/> | FOLLOW US
>>>>>> <https://twitter.com/remindhq> | LIKE US
>>>>>> <https://www.facebook.com/remindhq>

--

Rex Fenley | Software Engineer - Mobile and Backend

Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> | FOLLOW US <https://twitter.com/remindhq> | LIKE US <https://www.facebook.com/remindhq>