Re: [HACKERS] CustomScan under the Gather node?

Kouhei Kaigai Thu, 28 Jan 2016 17:15:06 -0800

>              total         ForeignScan        diff
> 0 workers: 17584.319 ms   17555.904 ms      28.415 ms
> 1 workers: 18464.476 ms   18110.968 ms     353.508 ms
> 2 workers: 19042.755 ms   14580.335 ms    4462.420 ms
> 3 workers: 19318.254 ms   12668.912 ms    6649.342 ms
> 4 workers: 21732.910 ms   13596.788 ms    8136.122 ms
> 5 workers: 23486.846 ms   14533.409 ms    8953.437 ms
> 
> This workstation has 4 CPU cores, so it is natural nworkers=3 records the
> peak performance on ForeignScan portion. On the other hands, nworkers>1 also
> recorded unignorable time consumption (probably, by Gather node?)
  :
> Further investigation will need....
>
It was a bug of my file_fdw patch. ForeignScan node in the master process was
also kicked by the Gather node, however, it didn't have coordinate information
due to oversight of the initialization at InitializeDSMForeignScan callback.
In the result, local ForeignScan node is still executed after the completion
of coordinated background worker processes, and returned twice amount of rows.


In the revised patch, results seems to me reasonable.
             total         ForeignScan      diff
0 workers: 17592.498 ms   17564.457 ms     28.041ms
1 workers: 12152.998 ms   11983.485 ms    169.513 ms
2 workers: 10647.858 ms   10502.100 ms    145.758 ms
3 workers:  9635.445 ms    9509.899 ms    125.546 ms
4 workers: 11175.456 ms   10863.293 ms    312.163 ms
5 workers: 12586.457 ms   12279.323 ms    307.134 ms

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kai...@ak.jp.nec.com>


> -----Original Message-----
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
> Sent: Friday, January 29, 2016 8:51 AM
> To: Robert Haas
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] CustomScan under the Gather node?
> 
> > On Thu, Jan 28, 2016 at 10:50 AM, Kouhei Kaigai <kai...@ak.jp.nec.com> 
> > wrote:
> > >> If I would make a proof-of-concept patch with interface itself, it
> > >> seems to me file_fdw may be a good candidate for this enhancement.
> > >> It is not a field for postgres_fdw.
> > >>
> > > The attached patch is enhancement of FDW/CSP interface and PoC feature
> > > of file_fdw to scan source file partially. It was smaller enhancement
> > > than my expectations.
> > >
> > > It works as follows. This query tried to read 20M rows from a CSV file,
> > > using 3 background worker processes.
> > >
> > > postgres=# set max_parallel_degree = 3;
> > > SET
> > > postgres=# explain analyze select * from test_csv where id % 20 = 6;
> > >                                   QUERY PLAN
> > >
> >
> ----------------------------------------------------------------------------
> > ----
> > >  Gather  (cost=1000.00..194108.60 rows=94056 width=52)
> > >          (actual time=0.570..19268.010 rows=2000000 loops=1)
> > >    Number of Workers: 3
> > >    ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056
> > width=52)
> > >                                   (actual time=0.180..12744.655
> rows=500000
> > loops=4)
> > >          Filter: ((id % 20) = 6)
> > >          Rows Removed by Filter: 9500000
> > >          Foreign File: /tmp/testdata.csv
> > >          Foreign File Size: 1504892535
> > >  Planning time: 0.147 ms
> > >  Execution time: 19330.201 ms
> > > (9 rows)
> >
> > Could you try it not in parallel and then with 1, 2, 3, and 4 workers
> > and post the times for all?
> >
> The above query has 5% selectivity on the entire CSV file.
> Its execution time (total, only ForeignScan) are below
> 
>              total         ForeignScan        diff
> 0 workers: 17584.319 ms   17555.904 ms      28.415 ms
> 1 workers: 18464.476 ms   18110.968 ms     353.508 ms
> 2 workers: 19042.755 ms   14580.335 ms    4462.420 ms
> 3 workers: 19318.254 ms   12668.912 ms    6649.342 ms
> 4 workers: 21732.910 ms   13596.788 ms    8136.122 ms
> 5 workers: 23486.846 ms   14533.409 ms    8953.437 ms
> 
> This workstation has 4 CPU cores, so it is natural nworkers=3 records the
> peak performance on ForeignScan portion. On the other hands, nworkers>1 also
> recorded unignorable time consumption (probably, by Gather node?)
> 
> An interesting observation was, less selectivity (1% and 0%) didn't change the
> result so much. Something consumes CPU time other than file_fdw.
> 
> * selectivity 1%
>                total       ForeignScan       diff
> 0 workers: 17573.572 ms   17566.875 ms      6.697 ms
> 1 workers: 18098.070 ms   18020.790 ms     77.280 ms
> 2 workers: 18676.078 ms   14600.749 ms   4075.329 ms
> 3 workers: 18830.597 ms   12731.459 ms   6099.138 ms
> 4 workers: 21015.842 ms   13590.657 ms   7425.185 ms
> 5 workers: 22865.496 ms   14634.342 ms   8231.154 ms
> 
> * selectivity 0% (...so Gather didn't work hard actually)
>               total        ForeignScan       diff
> 0 workers: 17551.011 ms   17550.811 ms      0.200 ms
> 1 workers: 18055.185 ms   18048.975 ms      6.210 ms
> 2 workers: 18567.660 ms   14593.974 ms   3973.686 ms
> 3 workers: 18649.819 ms   12671.429 ms   5978.390 ms
> 4 workers: 20619.184 ms   13606.715 ms   7012.469 ms
> 5 workers: 22557.575 ms   14594.420 ms   7963.155 ms
> 
> Further investigation will need....
> 
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei <kai...@ak.jp.nec.com>

pgsql-v9.6-parallel-cspfdw.v2.patch
Description: pgsql-v9.6-parallel-cspfdw.v2.patch

postgres=# explain analyze select * from test_csv where id % 20 = 6;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Foreign Scan on test_csv  (cost=0.00..2158874.49 rows=94056 width=52) (actual 
time=0.079..17564.457 rows=1000000 loops=1)
   Filter: ((id % 20) = 6)
   Rows Removed by Filter: 19000000
   Foreign File: /tmp/testdata.csv
   Foreign File Size: 1504892535
 Planning time: 0.598 ms
 Execution time: 17592.498 ms
(7 rows)

postgres=# SET max_parallel_degree = 1;
SET
postgres=# explain analyze select * from test_csv where id % 20 = 6;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=0.237..12123.367 rows=1000000 loops=1)
   Number of Workers: 1
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=0.469..11983.485 rows=500000 loops=2)
         Filter: ((id % 20) = 6)
         Rows Removed by Filter: 9500000
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.053 ms
 Execution time: 12152.998 ms
(9 rows)

postgres=# SET max_parallel_degree = 2;
SET
postgres=# explain analyze select * from test_csv where id % 20 = 6;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=0.548..10616.097 rows=1000000 loops=1)
   Number of Workers: 2
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=1.803..10502.100 rows=333333 loops=3)
         Filter: ((id % 20) = 6)
         Rows Removed by Filter: 6333333
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.120 ms
 Execution time: 10647.858 ms
(9 rows)

postgres=# SET max_parallel_degree = 3;
SET
postgres=# explain analyze select * from test_csv where id % 20 = 6;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=0.184..9603.485 rows=1000000 loops=1)
   Number of Workers: 3
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=0.731..9509.899 rows=250000 loops=4)
         Filter: ((id % 20) = 6)
         Rows Removed by Filter: 4750000
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.049 ms
 Execution time: 9635.445 ms
(9 rows)

postgres=# SET max_parallel_degree = 4;
SET
postgres=# explain analyze select * from test_csv where id % 20 = 6;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=0.619..11143.383 rows=1000000 loops=1)
   Number of Workers: 4
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=15.336..10863.293 rows=200000 loops=5)
         Filter: ((id % 20) = 6)
         Rows Removed by Filter: 3800000
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.136 ms
 Execution time: 11175.456 ms
(9 rows)

postgres=# SET max_parallel_degree = 5;
SET
postgres=# explain analyze select * from test_csv where id % 20 = 6;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=0.674..12544.940 rows=1000000 loops=1)
   Number of Workers: 5
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=23.652..12279.323 rows=166667 loops=6)
         Filter: ((id % 20) = 6)
         Rows Removed by Filter: 3166667
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.139 ms
 Execution time: 12586.457 ms
(9 rows)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CustomScan under the Gather node?

Reply via email to