Per the SQL spec, I think your join query can only be run as a NestedLoopJoin
or CartesianProduct.  See page 241 of SQL-99 (
http://web.cecs.pdx.edu/~len/sql1999.pdf).
In other words, it might be a correctness bug in other systems if they run
your query as a hash join: a hash join evaluates the join key once per row
when building and probing the hash table, whereas the spec's semantics
require re-evaluating the nondeterministic expression for every row pair.

> Here the purpose of adding a random value to the join key is to resolve the
data skew problem.

For that, you can add a table subquery and compute the random expression in
its select list, so the join condition itself stays deterministic.
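
For example, something like this sketch (using the same t1/t2 tables as your
query below; the alias "s" and column name "salted_id" are just illustrative):

sql> select * from t1 join
sql>   (select id, floor(random() * 9) + id as salted_id from t2) s
sql>   on t1.id = s.salted_id;

The random expression now lives in a Project (the subquery's select list),
where Spark allows nondeterministic expressions, and the join compares two
deterministic columns.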

Best,
Yingyi


On Tue, Oct 19, 2021 at 12:46 AM Lantao Jin <jinlan...@gmail.com> wrote:

> In PostgreSQL and Presto, the below query works well
> sql> create table t1 (id int);
> sql> create table t2 (id int);
> sql> select * from t1 join t2 on t1.id = floor(random() * 9) + t2.id;
>
> But it throws "Error in query: nondeterministic expressions are only
> allowed in Project, Filter, Aggregate or Window". Why doesn't Spark support
> random expressions in join conditions?
> Here the purpose of adding a random value to the join key is to resolve the
> data skew problem.
>
> Thanks,
> Lantao
>
