Re: variation of row_number with parallel

Rajkumar Raghuwanshi Mon, 13 Apr 2020 21:19:50 -0700

On Tue, Apr 14, 2020 at 9:39 AM Pavel Stehule <pavel.steh...@gmail.com>
wrote:


>
>
> út 14. 4. 2020 v 5:59 odesílatel Rajkumar Raghuwanshi <
> rajkumar.raghuwan...@enterprisedb.com> napsal:
>
>> Hi,
>>
>> I have observed row_number() is giving different results when query
>> executed in parallel. is this expected w.r.t parallel execution.
>>
>> CREATE TABLE tbl1 (c1 INT) partition by list (c1);
>> CREATE TABLE tbl1_p1 partition of tbl1 FOR VALUES IN (10);
>> CREATE TABLE tbl1_p2 partition of tbl1 FOR VALUES IN (20);
>> CREATE TABLE tbl1_p3 partition of tbl1 FOR VALUES IN (30);
>>
>> CREATE TABLE tbl2 (c1 INT, c2 INT,c3 INT) partition by list (c1);
>> CREATE TABLE tbl2_p1 partition of tbl2 FOR VALUES IN (1);
>> CREATE TABLE tbl2_p2 partition of tbl2 FOR VALUES IN (2);
>> CREATE TABLE tbl2_p3 partition of tbl2 FOR VALUES IN (3);
>> CREATE TABLE tbl2_p4 partition of tbl2 FOR VALUES IN (4);
>> CREATE TABLE tbl2_p5 partition of tbl2 FOR VALUES IN (5);
>>
>> INSERT INTO tbl1 VALUES (10),(20),(30);
>>
>> INSERT INTO tbl2 VALUES
>> (1,100,20),(2,200,10),(3,100,20),(4,100,30),(5,100,10);
>>
>> postgres=# explain select e.c2, row_number() over () from tbl1 d, tbl2 e
>> where d.c1=e.c3;
>>                                       QUERY PLAN
>>
>>
>> ---------------------------------------------------------------------------------------
>>  WindowAgg  (cost=1520.35..12287.73 rows=390150 width=12)
>>    ->  Merge Join  (cost=1520.35..7410.85 rows=390150 width=4)
>>          Merge Cond: (d.c1 = e.c3)
>>          ->  Sort  (cost=638.22..657.35 rows=7650 width=4)
>>                Sort Key: d.c1
>>                ->  Append  (cost=0.00..144.75 rows=7650 width=4)
>>                      ->  Seq Scan on tbl1_p1 d_1  (cost=0.00..35.50
>> rows=2550 width=4)
>>                      ->  Seq Scan on tbl1_p2 d_2  (cost=0.00..35.50
>> rows=2550 width=4)
>>                      ->  Seq Scan on tbl1_p3 d_3  (cost=0.00..35.50
>> rows=2550 width=4)
>>          ->  Sort  (cost=882.13..907.63 rows=10200 width=8)
>>                Sort Key: e.c3
>>                ->  Append  (cost=0.00..203.00 rows=10200 width=8)
>>                      ->  Seq Scan on tbl2_p1 e_1  (cost=0.00..30.40
>> rows=2040 width=8)
>>                      ->  Seq Scan on tbl2_p2 e_2  (cost=0.00..30.40
>> rows=2040 width=8)
>>                      ->  Seq Scan on tbl2_p3 e_3  (cost=0.00..30.40
>> rows=2040 width=8)
>>                      ->  Seq Scan on tbl2_p4 e_4  (cost=0.00..30.40
>> rows=2040 width=8)
>>                      ->  Seq Scan on tbl2_p5 e_5  (cost=0.00..30.40
>> rows=2040 width=8)
>> (17 rows)
>>
>> postgres=# select e.c2, row_number() over () from tbl1 d, tbl2 e where
>> d.c1=e.c3;
>>  c2  | row_number
>> -----+------------
>>  *200 |          1*
>>  100 |          2
>>  100 |          3
>>  100 |          4
>>  100 |          5
>> (5 rows)
>>
>> postgres=#
>> postgres=# set parallel_setup_cost = 0;
>> SET
>> postgres=# set parallel_tuple_cost = 0;
>> SET
>> postgres=#
>> postgres=# explain select e.c2, row_number() over () from tbl1 d, tbl2 e
>> where d.c1=e.c3;
>>                                               QUERY PLAN
>>
>>
>> ------------------------------------------------------------------------------------------------------
>>  WindowAgg  (cost=130.75..7521.21 rows=390150 width=12)
>>    ->  Gather  (cost=130.75..2644.34 rows=390150 width=4)
>>          Workers Planned: 2
>>          ->  Parallel Hash Join  (cost=130.75..2644.34 rows=162562
>> width=4)
>>                Hash Cond: (e.c3 = d.c1)
>>                ->  Parallel Append  (cost=0.00..131.25 rows=4250 width=8)
>>                      ->  Parallel Seq Scan on tbl2_p1 e_1
>>  (cost=0.00..22.00 rows=1200 width=8)
>>                      ->  Parallel Seq Scan on tbl2_p2 e_2
>>  (cost=0.00..22.00 rows=1200 width=8)
>>                      ->  Parallel Seq Scan on tbl2_p3 e_3
>>  (cost=0.00..22.00 rows=1200 width=8)
>>                      ->  Parallel Seq Scan on tbl2_p4 e_4
>>  (cost=0.00..22.00 rows=1200 width=8)
>>                      ->  Parallel Seq Scan on tbl2_p5 e_5
>>  (cost=0.00..22.00 rows=1200 width=8)
>>                ->  Parallel Hash  (cost=90.93..90.93 rows=3186 width=4)
>>                      ->  Parallel Append  (cost=0.00..90.93 rows=3186
>> width=4)
>>                            ->  Parallel Seq Scan on tbl1_p1 d_1
>>  (cost=0.00..25.00 rows=1500 width=4)
>>                            ->  Parallel Seq Scan on tbl1_p2 d_2
>>  (cost=0.00..25.00 rows=1500 width=4)
>>                            ->  Parallel Seq Scan on tbl1_p3 d_3
>>  (cost=0.00..25.00 rows=1500 width=4)
>> (16 rows)
>>
>> postgres=# select e.c2, row_number() over () from tbl1 d, tbl2 e where
>> d.c1=e.c3;
>>  c2  | row_number
>> -----+------------
>>  100 |          1
>>  100 |          2
>>  100 |          3
>>  *200 |          4*
>>  100 |          5
>> (5 rows)
>>
>
> there are not ORDER BY clause, so order is not defined - paralel hash join
> surely doesn't ensure a order.
> I think so this behave is expected.
>
 thanks.

>
> Regards
>
> Pavel
>
>
>> Thanks & Regards,
>> Rajkumar Raghuwanshi
>>
>

Re: variation of row_number with parallel

Reply via email to