Hi Rex,
The join key is already used to organize records. As I said before,
"the join key is the key of the keyed states". So iterating over the
MapState is actually a range scan (a scan over the join key prefix). However,
this performs a "seek" operation, which is considerably slower than a "get"
operation.
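
To make the difference between a point "get" and a prefix "seek" concrete,
here is a minimal Python sketch. It is a toy model only: `ToySortedStore` and
its methods are hypothetical names, not RocksDB's or Flink's API, and it
assumes entries are kept sorted by a composite (join key, map key) key.

```python
import bisect


class ToySortedStore:
    """Toy stand-in for a sorted key-value store (hypothetical, not a real API)."""

    def __init__(self):
        self._keys = []    # sorted list of (join_key, map_key) tuples
        self._values = {}  # full composite key -> value

    def put(self, join_key, map_key, value):
        full_key = (join_key, map_key)
        if full_key not in self._values:
            bisect.insort(self._keys, full_key)
        self._values[full_key] = value

    def get(self, join_key, map_key):
        """Point lookup: the full key is known, so nothing is iterated."""
        return self._values.get((join_key, map_key))

    def prefix_scan(self, join_key):
        """Seek to the first entry for join_key, then iterate until the prefix changes."""
        # (join_key,) sorts before every (join_key, map_key) tuple.
        start = bisect.bisect_left(self._keys, (join_key,))
        results = []
        for i in range(start, len(self._keys)):
            full_key = self._keys[i]
            if full_key[0] != join_key:
                break  # left the join key prefix, stop scanning
            results.append((full_key[1], self._values[full_key]))
        return results


store = ToySortedStore()
store.put("k1", "rowA", 1)
store.put("k1", "rowB", 2)
store.put("k2", "rowC", 1)

point = store.get("k1", "rowA")    # needs the full composite key
scanned = store.prefix_scan("k1")  # only needs the join key prefix
```

In this model the scan touches only the entries under the requested join key,
not the whole store, but it cannot know in advance how many entries that is,
so it has to position an iterator and walk forward.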
I have a few more questions.
Even if a join has no unique keys, couldn't the join key be used to
organize records into a tree, of groups of records, per join key so that
lookups are faster?
I also have been looking at RocksDB docs and it looks like it has a
RangeScan operation. I'm guessing then
I'm reading your response as rocksdb having to seek across the whole
dataset for the whole table, which we hope to avoid.
What are the rules for the unique key and unique join key inference? Maybe
we can reorganize our plan to allow it to infer unique keys more correctly.
Thanks
On Wed, Nov 18,
Yes, exactly. RocksDB has to "seek" through the entries because it doesn't
know how many entries are under the join key.
On Thu, 19 Nov 2020 at 13:38, Rex Fenley wrote:
> Ok, but if there is only 1 row per Join key on either side of the join,
> then wouldn't "iterate all the values in the MapState und
Ok, but if there is only 1 row per Join key on either side of the join,
then wouldn't "iterate all the values in the MapState under the current
key" effectively be "iterate 1 value in MapState under the current key"
which would be O(1)? Or are you saying that it must seek across the entire
dataset
Actually, if there is no unique key, it's not O(1), because there may be
multiple rows joined under the same join key. That is, the operator has to
iterate over all the values in the MapState under the current key, which is a
"seek" operation on RocksDB and is not efficient.
Are you asking where the join key is set? The join key
Thanks for the info.
So even if there is no unique key inferred for a Row, the set of rows to
join on each Join key should effectively still be an O(1) lookup if the
join key is unique right?
Also, I've been digging around the code to find where the lookup of rows
for a join key happens and haven
Hi Rex,
Currently, the join operator may use three kinds of state structures,
depending on the input key and join key information.
1) input doesn't have a unique key => MapState,
where the map key is the input row and the map value is the number of equal
rows.
2) input has unique key, but the unique k
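
The first case above (no unique key) can be sketched roughly as follows. This
is a toy Python model, not Flink's implementation: `NoUniqueKeyJoinState` and
its methods are hypothetical names, and the retraction behaviour is an
assumption added to illustrate why a count per row is kept.

```python
from collections import Counter, defaultdict


class NoUniqueKeyJoinState:
    """Toy model of case 1: per join key, a map from input row to its count."""

    def __init__(self):
        # join_key -> Counter mapping each row to the number of equal rows
        self._state = defaultdict(Counter)

    def add_row(self, join_key, row):
        self._state[join_key][row] += 1

    def retract_row(self, join_key, row):
        # Assumption: a retraction removes one copy of an equal row.
        rows = self._state.get(join_key)
        if rows is None or rows[row] == 0:
            return
        rows[row] -= 1
        if rows[row] == 0:
            del rows[row]
        if not rows:
            del self._state[join_key]

    def matching_rows(self, join_key):
        """All rows under the join key; every map entry has to be iterated."""
        rows = self._state.get(join_key, {})
        return [row for row, count in rows.items() for _ in range(count)]


state = NoUniqueKeyJoinState()
state.add_row("k1", ("alice", 30))
state.add_row("k1", ("alice", 30))  # duplicate row: count becomes 2
state.add_row("k1", ("bob", 25))
before = state.matching_rows("k1")
state.retract_row("k1", ("alice", 30))
after = state.matching_rows("k1")
```

Even when only one row exists under a join key in practice, this structure
gives no way to know that up front, so the lookup still iterates the map.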
Ok, what are the performance consequences then of having a join with
NoUniqueKey if the left side's key actually is unique in practice?
Thanks!
On Tue, Nov 17, 2020 at 7:35 AM Jark Wu wrote:
> Hi Rex,
>
> Currently, the unique key is inferred by the optimizer. However, the
> inference is not p
Hi Rex,
Currently, the unique key is inferred by the optimizer. However, the
inference is not perfect.
There are known issues where the unique key is not derived correctly, e.g.
FLINK-20036 (is this opened by you?). If you think you have the same case,
please open an issue.
Query hint is a nice wa
Hello,
I have quite a few joins in my plan that have
leftInputSpec=[NoUniqueKey]
in the Flink UI. I know it can't truly be the case that there is no unique
key, at least for some of the joins that I've evaluated.
Is there a way to hint to the join what the unique key is for a table?
Thanks!