Re: Re: [DISCUSS] FLIP-376: Add DISTRIBUTED BY clause

2023-10-28 Thread Benchao Li
Thanks, Timo, for preparing the FLIP. Regarding "By default, DISTRIBUTED BY assumes a list of columns for an implicit hash partitioning": do you think it would be useful to make the hash strategy extensible? One scenario I can foresee is if we write bucketed data into Hive, and if Flink's hash s
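Why the hash strategy matters can be sketched briefly: a bucket id is `hash(key) % num_buckets`, so two systems that agree on the bucket count but use different hash functions place the same key in different buckets. In this minimal sketch, `crc32_strategy` and `modulo_strategy` are hypothetical stand-ins, not Flink's or Hive's actual hash functions, and the pluggable `strategy` parameter is an illustration, not an API proposed in FLIP-376.

```python
import zlib
from typing import Callable

# Hypothetical: a "hash strategy" maps a key to a non-negative integer.
HashStrategy = Callable[[int], int]


def crc32_strategy(key: int) -> int:
    """Stand-in strategy A (hypothetical): CRC-32 of the key's decimal string."""
    return zlib.crc32(str(key).encode("utf-8"))


def modulo_strategy(key: int) -> int:
    """Stand-in strategy B (hypothetical): use the integer key itself as its hash."""
    return key


def bucket_for(key: int, num_buckets: int, strategy: HashStrategy) -> int:
    """Assign a key to one of num_buckets buckets using the given hash strategy."""
    return strategy(key) % num_buckets


if __name__ == "__main__":
    # Same keys, same bucket count, different strategies: the assignments can differ,
    # so data bucketed by one engine may not line up with another engine's expectation.
    for k in range(4):
        print(k, bucket_for(k, 4, crc32_strategy), bucket_for(k, 4, modulo_strategy))
```

If a writer buckets with one strategy and a reader assumes another, bucket pruning or bucketed joins on the reader side would read the wrong files, which is the compatibility concern raised above.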

RE: Re: [DISCUSS] FLIP-376: Add DISTRIBUTED BY clause

2023-10-27 Thread yunfan zhang
DISTRIBUTE BY in DML is also supported by Hive, and it is also useful for Flink. Users can use it to increase the cache hit rate in lookup joins, and they can use "distribute by key, rand(1, 10)" to avoid data skew. I think it is also another way to address FLIP-204 [1]. There is alrea
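The skew-avoidance idea (distributing by the key plus a random salt) can be sketched as hashing the `(key, salt)` pair together, so one hot key is spread over several partitions instead of one. This is a minimal sketch under assumptions: `partition_for` and `distribute` are hypothetical helpers, CRC-32 is a stand-in hash, and the salt is modeled as a uniform integer in `[1, num_salts]` after the `rand(1, 10)` expression quoted in the thread, not as any engine's exact `rand` semantics.

```python
import random
import zlib
from collections import Counter


def partition_for(key: str, salt: int, num_partitions: int) -> int:
    """Hash the (key, salt) pair together, like distributing by two columns."""
    return zlib.crc32(f"{key}|{salt}".encode("utf-8")) % num_partitions


def distribute(keys, num_partitions: int, num_salts: int, rng: random.Random) -> Counter:
    """Return a Counter of partition -> record count.

    num_salts == 1 means no salting: every record of a key gets the same salt,
    so all records of a hot key land in a single partition.
    """
    counts = Counter()
    for key in keys:
        salt = rng.randint(1, num_salts)  # inclusive range, modeled on rand(1, 10)
        counts[partition_for(key, salt, num_partitions)] += 1
    return counts


if __name__ == "__main__":
    skewed = ["hot"] * 1000 + ["cold"] * 10
    print("unsalted:", distribute(skewed, 8, 1, random.Random(42)))
    print("salted:  ", distribute(skewed, 8, 10, random.Random(42)))
```

The trade-off is that salting gives up co-location: records of one key no longer meet in one partition, so a per-key aggregation downstream would need a second stage to merge the partial results.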