Re: Re: [DISCUSS] Introduce Hash Lookup Join

2022-01-20 Thread Jingsong Li
Thanks Jing, Looks good~ Best, Jingsong On Fri, Jan 21, 2022 at 2:00 PM Lincoln Lee wrote: > > Hi, Jing > Glad to hear the agreement on the hint syntax, let's keep going! > > Best, > Lincoln Lee > > > Jing Zhang 于2022年1月20日周四 16:52写道: > > > Hi Jingsong, > > Thanks for the feedback. > > > > >

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2022-01-20 Thread Lincoln Lee
Hi, Jing Glad to hear the agreement on the hint syntax, let's keep going! Best, Lincoln Lee Jing Zhang 于2022年1月20日周四 16:52写道: > Hi Jingsong, > Thanks for the feedback. > > > Is there a conclusion about naming here? (Maybe I missed something?) > Use USE_HASH or some other names. Slightly conf

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2022-01-20 Thread Jing Zhang
Hi Jingsong, Thanks for the feedback. > Is there a conclusion about naming here? (Maybe I missed something?) Use USE_HASH or some other names. Slightly confusing in the FLIP. 'SHUFFLE_HASH' is final hint name, 'USE_HASH' is rejected. I've updated the FLIP. > And the problem of what to write insi

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2022-01-20 Thread Jing Zhang
Hi Lincoln, Thanks for the feedback. > The previous discussion for the hint syntax left a minor difference there: whether to use both sides of join table names or just one 'build' side table name only. I would prefer the later one. Users only need to pass the `build` side table(usually the smalle

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2022-01-19 Thread Jingsong Li
Hi Jing, About `SHUFFLE_HASH(left_table, right_table)`, one case can be shared: SELECT * FROM left_t JOIN right_1 ON ... JOIN right_2 ON ... JOIN right_3 ON ... What if we want to use shuffle_hash for all three joints? SELECT /*+ SHUFFLE_HASH('left_t', 'right_1', 'right_2', 'right_3') */

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2022-01-19 Thread Jingsong Li
Hi Jing, Sorry for the late reply! Is there a conclusion about naming here? (Maybe I missed something?) Use USE_HASH or some other names. Slightly confusing in the FLIP. And the problem of what to write inside the hint, as mentioned by lincoln. I think maybe we can list the grammars of other di

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2022-01-19 Thread Lincoln Lee
Hi, Jing, Sorry for the late reply! The previous discussion for the hint syntax left a minor difference there: whether to use both sides of join table names or just one 'build' side table name only. I would prefer the later one. Users only need to pass the `build` side table(usually the smalle

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2022-01-15 Thread Jing Zhang
Hi all, Thanks for all the feedback so far. If there is no more suggestions, I would like to drive a vote in Tuesday next week (18 Jan). Best, Jing Zhang Jing Zhang 于2022年1月5日周三 11:33写道: > Hi Francesco, > Thanks a lot for the feedback. > > > does it makes sense for a lookup join to use hash dis

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2022-01-04 Thread Jing Zhang
Hi Francesco, Thanks a lot for the feedback. > does it makes sense for a lookup join to use hash distribution whenever is possible by default? I prefer to enable the hash lookup join only find the hint in the query for the following reason: 1. Plan compatibility Add a hash shuffle by default w

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2022-01-03 Thread Francesco Guardiani
Hi Jing, Thanks for the FLIP. I'm not very knowledgeable about the topic, but going through both the FLIP and the discussion here, I wonder, does it makes sense for a lookup join to use hash distribution whenever is possible by default? The point you're explaining here: > Many Lookup table sourc

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-31 Thread Jing Zhang
Hi Lincoln, Thanks for the feedback. > 1. For the hint name, +1 for WenLong's proposal. I've added add 'SHUFFLE_HASH' to other alternatives in FLIP. Let's waiting for more voices here. > Regarding the `SKEW` hint, agree with you that it can be used widely, and I prefer to treat it as a metadata

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-31 Thread Jing Zhang
Hi Martijn, Thanks for the feedback. Glad to hear that we reached a consensus on the first and second point. About whether to use `use_hash` as a term, I think your concern makes sense. Although the hash lookup join is similar to Hash join in oracle that they all require hash distribution on inpu

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-29 Thread Lincoln Lee
Hi Jing, Thanks for your explanation! 1. For the hint name, +1 for WenLong's proposal. I think the `SHUFFLE` keyword is important in a classic distributed computing system, a hash-join usually means there's a shuffle stage(include shuffle hash-join, broadcast hash-join). Users only need to pas

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-29 Thread Martijn Visser
Hi Jing, Thanks for explaining this in more detail and also to others participating. > I think using query hints in this case is more natural for users, WDYT? Yes, I agree. As long as we properly explain in our documentation that we support both Query Hints and Table Hints, what's the difference

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-29 Thread Jing Zhang
Hi Jian gang, Thanks for the feedback. > When it comes to hive, how do you load partial data instead of the whole data? Any change related with hive? The question is same as Yuan mentioned before. I prefer to drive another FLIP on this topic to further discussion individually because this poin

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-29 Thread Jing Zhang
Hi Wenlong, Thanks for the feedback. I've checked similar syntax in other systems, they are all different from each other. It seems to be without consensus. As mentioned in FLIP-204, oracle uses a query hint, the hint name is 'use_hash' [1]. Spark also uses a query hint, its name is 'SHUFFLE_HASH'

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-29 Thread 刘建刚
Thank you for the proposal, Jing. I like the idea to partition data by some key to improve the cache hit. I have some questions: 1. When it comes to hive, how do you load partial data instead of the whole data? Any change related with hive? 2. How to define the cache configuration? For ex

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-29 Thread wenlong.lwl
Hi, Jing, thanks for driving the discussion. Have you made some investigation on the syntax of join hint? Why do you choose USE_HASH from oracle instead of the style of spark SHUFFLE_HASH, they are quite different. People in the big data world may be more familiar with spark/hive, if we need to ch