On Thu, Nov 28, 2019 at 7:19 PM Jinbao Chen wrote:
> Hi Andy,
>
> I just test the query on 12.1. But pg use big_table as inner.
>
> demo=# explain (costs off) select * from t_small, t_big where a = b;
> QUERY PLAN
>
> Hash Join
>Hash Cond: (t
Hi Andy,
I just test the query on 12.1. But pg use big_table as inner.
demo=# explain (costs off) select * from t_small, t_big where a = b;
QUERY PLAN
Hash Join
Hash Cond: (t_small.a = t_big.b)
-> Seq Scan on t_small
-> Hash
-
On Fri, Nov 22, 2019 at 6:51 PM Jinbao Chen wrote:
> Hi hackers,
>
> I have made a patch to fix the problem.
>
> Added the selection rate of the inner table non-empty bucket
>
> The planner will use big table as inner table in hash join
> if small table have fewer unique values. But this plan is
Hi hackers,
I have made a patch to fix the problem.
Added the selection rate of the inner table non-empty bucket
The planner will use big table as inner table in hash join
if small table have fewer unique values. But this plan is
much slower than using small table as inner table.
In general, th
I think we have the same understanding of this issue.
Sometimes use smaller costs on scanning the chain in bucket like below would
be better.
run_cost += outer_path_rows * some_small_probe_cost;
run_cost += hash_qual_cost.per_tuple * approximate_tuple_count();
In some version of GreenPlum(a databa
On Mon, Nov 18, 2019 at 7:48 PM Jinbao Chen wrote:
> In the test case above, the small table has 3000 tuples and 100 distinct
> values on column ‘a’.
> If we use small table as inner table. The chan length of the bucket is 30.
> And we need to
> search the whole chain on probing the hash table.
Hi Hackers,
The planner will use big table as inner table in hash join if small table
have fewer unique values.
But this plan is much slower than using small table as inner table. This
problem occurs on master
branch without parallel scan.
For example
create table t_small(a int);
create table t_