I have a few more questions. Even if a join has no unique keys, couldn't the join key be used to organize records into a tree, of groups of records, per join key so that lookups are faster?
I also have been looking at RocksDB docs and it looks like it has a RangeScan operation. I'm guessing then join keys could also be hashed in such a way to enable faster lookup by RangeScan. I also noticed mention of Prefix Iterators, which might actually do what I'm suggesting. Have either of these been considered? Thanks! On Thu, Nov 19, 2020 at 6:51 PM Rex Fenley <r...@remind101.com> wrote: > I'm reading your response as rocksdb having to seek across the whole > dataset for the whole table, which we hope to avoid. > > What are the rules for the unique key and unique join key inference? Maybe > we can reorganize our plan to allow it to infer unique keys more correctly. > > Thanks > > On Wed, Nov 18, 2020 at 9:50 PM Jark Wu <imj...@gmail.com> wrote: > >> Yes, exactly. The rocksdb has to "seek" data sets because it doesn't know >> how many entries are under the join key. >> >> On Thu, 19 Nov 2020 at 13:38, Rex Fenley <r...@remind101.com> wrote: >> >>> Ok, but if there is only 1 row per Join key on either side of the join, >>> then wouldn't "iterate all the values in the MapState under the current >>> key" effectively be "iterate 1 value in MapState under the current key" >>> which would be O(1)? Or are you saying that it must seek across the entire >>> dataset for the whole table even for that 1 row on either side of the join? >>> >>> Thanks for the help so far! >>> >>> On Wed, Nov 18, 2020 at 6:30 PM Jark Wu <imj...@gmail.com> wrote: >>> >>>> Actually, if there is no unique key, it's not O(1), because there maybe >>>> multiple rows are joined by the join key, i.e. iterate all the values in >>>> the MapState under the current key, this is a "seek" operation on rocksdb >>>> which is not efficient. >>>> >>>> Are you asking where the join key is set? The join key is set by the >>>> framework via `AbstractStreamOperator#setKeyContextElement1`. >>>> >>>> Best, >>>> Jark >>>> >>>> On Thu, 19 Nov 2020 at 03:18, Rex Fenley <r...@remind101.com> wrote: >>>> >>>>> Thanks for the info. >>>>> >>>>> So even if there is no unique key inferred for a Row, the set of rows >>>>> to join on each Join key should effectively still be an O(1) lookup if the >>>>> join key is unique right? >>>>> >>>>> Also, I've been digging around the code to find where the lookup of >>>>> rows for a join key happens and haven't come across anything. Mind >>>>> pointing >>>>> me in the right direction? >>>>> >>>>> Thanks! >>>>> >>>>> cc Brad >>>>> >>>>> On Wed, Nov 18, 2020 at 7:39 AM Jark Wu <imj...@gmail.com> wrote: >>>>> >>>>>> Hi Rex, >>>>>> >>>>>> Currently, the join operator may use 3 kinds of state structure >>>>>> depending on the input key and join key information. >>>>>> >>>>>> 1) input doesn't have a unique key => MapState<row, count>, >>>>>> where the map key is the input row and the map value is the number >>>>>> of equal rows. >>>>>> >>>>>> 2) input has unique key, but the unique key is not a subset of join >>>>>> key => MapState<UK, row> >>>>>> this is better than the above one, because it has a shorter map key >>>>>> and >>>>>> is more efficient when retracting records. >>>>>> >>>>>> 3) input has a unique key, and the unique key is a subset of join key >>>>>> => ValueState<row> >>>>>> this is the best performance, because it only performs a "get" >>>>>> operation rather than "seek" on rocksdb >>>>>> for each record of the other input side. >>>>>> >>>>>> Note: the join key is the key of the keyed states. >>>>>> >>>>>> You can see the implementation differences >>>>>> in >>>>>> org.apache.flink.table.runtime.operators.join.stream.state.JoinRecordStateViews. >>>>>> >>>>>> Best, >>>>>> Jark >>>>>> >>>>>> On Wed, 18 Nov 2020 at 02:30, Rex Fenley <r...@remind101.com> wrote: >>>>>> >>>>>>> Ok, what are the performance consequences then of having a join with >>>>>>> NoUniqueKey if the left side's key actually is unique in practice? >>>>>>> >>>>>>> Thanks! >>>>>>> >>>>>>> >>>>>>> On Tue, Nov 17, 2020 at 7:35 AM Jark Wu <imj...@gmail.com> wrote: >>>>>>> >>>>>>>> Hi Rex, >>>>>>>> >>>>>>>> Currently, the unique key is inferred by the optimizer. However, >>>>>>>> the inference is not perfect. >>>>>>>> There are known issues that the unique key is not derived >>>>>>>> correctly, e.g. FLINK-20036 (is this opened by you?). If you think you >>>>>>>> have >>>>>>>> the same case, please open an issue. >>>>>>>> >>>>>>>> Query hint is a nice way for this, but it is not supported yet. >>>>>>>> We have an issue to track supporting query hint, see FLINK-17173. >>>>>>>> >>>>>>>> Beest, >>>>>>>> Jark >>>>>>>> >>>>>>>> >>>>>>>> On Tue, 17 Nov 2020 at 15:23, Rex Fenley <r...@remind101.com> wrote: >>>>>>>> >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> I have quite a few joins in my plan that have >>>>>>>>> >>>>>>>>> leftInputSpec=[NoUniqueKey] >>>>>>>>> >>>>>>>>> in Flink UI. I know this can't truly be the case that there is no >>>>>>>>> unique key, at least for some of these joins that I've evaluated. >>>>>>>>> >>>>>>>>> Is there a way to hint to the join what the unique key is for a >>>>>>>>> table? >>>>>>>>> >>>>>>>>> Thanks! >>>>>>>>> >>>>>>>>> -- >>>>>>>>> >>>>>>>>> Rex Fenley | Software Engineer - Mobile and Backend >>>>>>>>> >>>>>>>>> >>>>>>>>> Remind.com <https://www.remind.com/> | BLOG >>>>>>>>> <http://blog.remind.com/> | FOLLOW US >>>>>>>>> <https://twitter.com/remindhq> | LIKE US >>>>>>>>> <https://www.facebook.com/remindhq> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> Rex Fenley | Software Engineer - Mobile and Backend >>>>>>> >>>>>>> >>>>>>> Remind.com <https://www.remind.com/> | BLOG >>>>>>> <http://blog.remind.com/> | FOLLOW US >>>>>>> <https://twitter.com/remindhq> | LIKE US >>>>>>> <https://www.facebook.com/remindhq> >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> >>>>> Rex Fenley | Software Engineer - Mobile and Backend >>>>> >>>>> >>>>> Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> >>>>> | FOLLOW US <https://twitter.com/remindhq> | LIKE US >>>>> <https://www.facebook.com/remindhq> >>>>> >>>> >>> >>> -- >>> >>> Rex Fenley | Software Engineer - Mobile and Backend >>> >>> >>> Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> >>> | FOLLOW US <https://twitter.com/remindhq> | LIKE US >>> <https://www.facebook.com/remindhq> >>> >> > > -- > > Rex Fenley | Software Engineer - Mobile and Backend > > > Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> | > FOLLOW US <https://twitter.com/remindhq> | LIKE US > <https://www.facebook.com/remindhq> > -- Rex Fenley | Software Engineer - Mobile and Backend Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> | FOLLOW US <https://twitter.com/remindhq> | LIKE US <https://www.facebook.com/remindhq>