Thanks Jing,
Looks good~
Best,
Jingsong
On Fri, Jan 21, 2022 at 2:00 PM Lincoln Lee wrote:
>
> Hi, Jing
> Glad to hear the agreement on the hint syntax, let's keep going!
>
> Best,
> Lincoln Lee
>
>
> Jing Zhang 于2022年1月20日周四 16:52写道:
>
> > Hi Jingsong,
> > Thanks for the feedback.
> >
> > >
Hi, Jing
Glad to hear the agreement on the hint syntax, let's keep going!
Best,
Lincoln Lee
Jing Zhang 于2022年1月20日周四 16:52写道:
> Hi Jingsong,
> Thanks for the feedback.
>
> > Is there a conclusion about naming here? (Maybe I missed something?)
> Use USE_HASH or some other names. Slightly conf
Hi Jingsong,
Thanks for the feedback.
> Is there a conclusion about naming here? (Maybe I missed something?)
Use USE_HASH or some other names. Slightly confusing in the FLIP.
'SHUFFLE_HASH' is final hint name, 'USE_HASH' is rejected. I've updated the
FLIP.
> And the problem of what to write insi
Hi Lincoln,
Thanks for the feedback.
> The previous discussion for the hint syntax
left a minor difference there: whether to use both sides of join table
names or just one 'build' side table name only. I would prefer the later
one.
Users only need to pass the `build` side table(usually the smalle
Hi Jing,
About `SHUFFLE_HASH(left_table, right_table)`, one case can be shared:
SELECT * FROM left_t
JOIN right_1 ON ...
JOIN right_2 ON ...
JOIN right_3 ON ...
What if we want to use shuffle_hash for all three joints?
SELECT /*+ SHUFFLE_HASH('left_t', 'right_1', 'right_2', 'right_3') */
Hi Jing,
Sorry for the late reply!
Is there a conclusion about naming here? (Maybe I missed something?)
Use USE_HASH or some other names. Slightly confusing in the FLIP.
And the problem of what to write inside the hint, as mentioned by lincoln.
I think maybe we can list the grammars of other di
Hi, Jing,
Sorry for the late reply! The previous discussion for the hint syntax
left a minor difference there: whether to use both sides of join table
names or just one 'build' side table name only. I would prefer the later
one.
Users only need to pass the `build` side table(usually the smalle
Hi all,
Thanks for all the feedback so far.
If there is no more suggestions, I would like to drive a vote in Tuesday
next week (18 Jan).
Best,
Jing Zhang
Jing Zhang 于2022年1月5日周三 11:33写道:
> Hi Francesco,
> Thanks a lot for the feedback.
>
> > does it makes sense for a lookup join to use hash dis
Hi Francesco,
Thanks a lot for the feedback.
> does it makes sense for a lookup join to use hash distribution whenever
is possible by default?
I prefer to enable the hash lookup join only find the hint in the query for
the following reason:
1. Plan compatibility
Add a hash shuffle by default w
Hi Jing,
Thanks for the FLIP. I'm not very knowledgeable about the topic, but going
through both the FLIP and the discussion here, I wonder, does it makes
sense for a lookup join to use hash distribution whenever is possible by
default?
The point you're explaining here:
> Many Lookup table sourc
Hi Lincoln,
Thanks for the feedback.
> 1. For the hint name, +1 for WenLong's proposal.
I've added add 'SHUFFLE_HASH' to other alternatives in FLIP. Let's waiting
for more voices here.
> Regarding the `SKEW` hint, agree with you that it can be used widely, and
I
prefer to treat it as a metadata
Hi Martijn,
Thanks for the feedback.
Glad to hear that we reached a consensus on the first and second point.
About whether to use `use_hash` as a term, I think your concern makes sense.
Although the hash lookup join is similar to Hash join in oracle that they
all require hash distribution on inpu
Hi Jing,
Thanks for your explanation!
1. For the hint name, +1 for WenLong's proposal. I think the `SHUFFLE`
keyword is important in a classic distributed computing system,
a hash-join usually means there's a shuffle stage(include shuffle
hash-join, broadcast hash-join). Users only need to pas
Hi Jing,
Thanks for explaining this in more detail and also to others
participating.
> I think using query hints in this case is more natural for users, WDYT?
Yes, I agree. As long as we properly explain in our documentation that we
support both Query Hints and Table Hints, what's the difference
Hi Jian gang,
Thanks for the feedback.
> When it comes to hive, how do you load partial data instead of the
whole data? Any change related with hive?
The question is same as Yuan mentioned before.
I prefer to drive another FLIP on this topic to further discussion
individually because this poin
Hi Wenlong,
Thanks for the feedback.
I've checked similar syntax in other systems, they are all different from
each other. It seems to be without consensus.
As mentioned in FLIP-204, oracle uses a query hint, the hint name is
'use_hash' [1].
Spark also uses a query hint, its name is 'SHUFFLE_HASH'
Thank you for the proposal, Jing. I like the idea to partition data by some
key to improve the cache hit. I have some questions:
1. When it comes to hive, how do you load partial data instead of the
whole data? Any change related with hive?
2. How to define the cache configuration? For ex
Hi, Jing, thanks for driving the discussion.
Have you made some investigation on the syntax of join hint?
Why do you choose USE_HASH from oracle instead of the style of spark
SHUFFLE_HASH, they are quite different.
People in the big data world may be more familiar with spark/hive, if we
need to ch
18 matches
Mail list logo