On Fri, Jan 22, 2021 at 5:16 PM Tang, Haiying <tanghy.f...@cn.fujitsu.com> wrote:
>
> Hi Bharath,
>
> I'm trying to take some performance measurements on your patch v23.
> But when I started, I found an issue with unbalanced tuple distribution among the workers (99% of the tuples are read by one worker) in a specific case, which makes the "parallel select" part yield no performance gain.
> I then found that it's not introduced by your patch, because it also happens on master (HEAD). But I don't know how to deal with it, so I'm putting it here to see if anybody knows what's going wrong or has good ideas to deal with this issue.
>
> Here are the conditions to reproduce the issue:
> 1. A high-spec CPU environment (say, above 20 processors). On smaller CPUs it also happens, but less obviously (40% of the tuples on one worker in my tests).
> 2. The query plan is "serial insert + parallel select"; I have reproduced this behavior with CTAS, SELECT INTO, and INSERT INTO ... SELECT.
> 3. The select part needs to read a large data set (e.g. 100 million rows out of 200 million).
>
> Given the above, IMHO, I guess it may be caused by the leader's write rate not keeping up with the workers' read rate, so the tuples of one worker get stuck in its queue and pile up more and more.
>
> Below is my test info:
> 1. Test environment
> CentOS 8.2, 128G RAM, 40 processors, SAS disk
>
> 2. Test data preparation
> create table x(a int, b int, c int);
> create index on x(a);
> insert into x select generate_series(1,200000000),floor(random()*(10001-1)+1),floor(random()*(10001-1)+1);
>
> 3. Test results
> *Patched CTAS*: please look at worker 2; 99% of the tuples are read by it.
> explain analyze verbose create table test(a,b,c) as select a,floor(random()*(10001-1)+1),c from x where b%2=0;
>                                                            QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------
>  Gather  (cost=1000.00..1942082.77 rows=1000001 width=16) (actual time=0.203..24023.686 rows=100006268 loops=1)
>    Output: a, floor(((random() * '10000'::double precision) + '1'::double precision)), c
>    Workers Planned: 4
>    Workers Launched: 4
>    ->  Parallel Seq Scan on public.x  (cost=0.00..1831082.66 rows=250000 width=8) (actual time=0.016..4367.035 rows=20001254 loops=5)
>          Output: a, c
>          Filter: ((x.b % 2) = 0)
>          Rows Removed by Filter: 19998746
>          Worker 0:  actual time=0.016..19.265 rows=94592 loops=1
>          Worker 1:  actual time=0.027..31.422 rows=94574 loops=1
>          Worker 2:  actual time=0.014..21744.549 rows=99627749 loops=1
>          Worker 3:  actual time=0.015..19.347 rows=94586 loops=1
>  Planning Time: 0.098 ms
>  Execution Time: 91054.828 ms
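(If the queue-backpressure theory above is right, a plan in which the workers send only a handful of tuples back to the leader should not show this skew. A minimal probe on the same table x, as a sketch only; I haven't measured this myself:

-- sketch: with a parallel aggregate, workers return only partial counts,
-- so the tuple queues stay near-empty and the leader is never the bottleneck
explain (analyze, verbose) select count(*) from x where b % 2 = 0;

Under the hypothesis, the per-worker "rows" of the Parallel Seq Scan here should come out nearly even, rather than 99% on one worker.)
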
> *Non-patched CTAS*: please look at worker 0; again, 99% of the tuples are read by it.
> explain analyze verbose create table test(a,b,c) as select a,floor(random()*(10001-1)+1),c from x where b%2=0;
>                                                            QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------
>  Gather  (cost=1000.00..1942082.77 rows=1000001 width=16) (actual time=0.283..19216.157 rows=100003148 loops=1)
>    Output: a, floor(((random() * '10000'::double precision) + '1'::double precision)), c
>    Workers Planned: 4
>    Workers Launched: 4
>    ->  Parallel Seq Scan on public.x  (cost=0.00..1831082.66 rows=250000 width=8) (actual time=0.020..4380.360 rows=20000630 loops=5)
>          Output: a, c
>          Filter: ((x.b % 2) = 0)
>          Rows Removed by Filter: 19999370
>          Worker 0:  actual time=0.013..21805.647 rows=99624833 loops=1
>          Worker 1:  actual time=0.016..19.790 rows=94398 loops=1
>          Worker 2:  actual time=0.013..35.340 rows=94423 loops=1
>          Worker 3:  actual time=0.035..19.849 rows=94679 loops=1
>  Planning Time: 0.083 ms
>  Execution Time: 91151.097 ms
>
> I'm still working on the performance tests on your patch; if I make some progress, I will post my results here.

Thanks a lot for the tests. In your test case, parallel insertions are not being picked because the Gather node has some projections (floor(((random() * '10000'::double precision) + '1'::double precision))) to perform. That's expected. Whenever parallel insertions are chosen for CTAS, we should see a "Create target_table" line under the Gather node [1], and the actual row count for the Gather node should be 0 (but in your test it is rows=100006268) in the explain analyze output. Coming to your test case, if it's modified to something like [1], where the Gather node has no projections, then parallel insertions will be chosen.
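As far as I know, that projection has to stay above the Gather node because random() is parallel-restricted, so it may only be evaluated in the leader. A quick catalog check shows this (a sketch; the query is mine, not part of the test above):

select proname, proparallel from pg_proc where proname = 'random';
-- proparallel = 'r' (parallel restricted): allowed in a parallel plan,
-- but only in the leader, i.e. above the Gather node
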
[1] I did this test on my development system; I will run it on some performance system and post my observations here.

postgres=# explain (analyze, verbose) create table test(a,b,c) as select a,b,c from x where b%2=0;
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..3846.71 rows=1000 width=12) (actual time=5581.308..5581.379 rows=0 loops=1)
   Output: a, b, c
   Workers Planned: 1
   Workers Launched: 1
   ->  Create test
   ->  Parallel Seq Scan on public.x  (cost=0.00..2846.71 rows=588 width=12) (actual time=0.014..29.512 rows=50023 loops=2)
         Output: a, b, c
         Filter: ((x.b % 2) = 0)
         Rows Removed by Filter: 49977
         Worker 0:  actual time=0.015..29.751 rows=49419 loops=1
 Planning Time: 1574.584 ms
 Execution Time: 6437.562 ms
(12 rows)

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com