> total ForeignScan diff > 0 workers: 17584.319 ms 17555.904 ms 28.415 ms > 1 workers: 18464.476 ms 18110.968 ms 353.508 ms > 2 workers: 19042.755 ms 14580.335 ms 4462.420 ms > 3 workers: 19318.254 ms 12668.912 ms 6649.342 ms > 4 workers: 21732.910 ms 13596.788 ms 8136.122 ms > 5 workers: 23486.846 ms 14533.409 ms 8953.437 ms > > This workstation has 4 CPU cores, so it is natural nworkers=3 records the > peak performance on ForeignScan portion. On the other hands, nworkers>1 also > recorded unignorable time consumption (probably, by Gather node?) : > Further investigation will need.... > It was a bug of my file_fdw patch. ForeignScan node in the master process was also kicked by the Gather node, however, it didn't have coordinate information due to oversight of the initialization at InitializeDSMForeignScan callback. In the result, local ForeignScan node is still executed after the completion of coordinated background worker processes, and returned twice amount of rows.
In the revised patch, results seems to me reasonable. total ForeignScan diff 0 workers: 17592.498 ms 17564.457 ms 28.041ms 1 workers: 12152.998 ms 11983.485 ms 169.513 ms 2 workers: 10647.858 ms 10502.100 ms 145.758 ms 3 workers: 9635.445 ms 9509.899 ms 125.546 ms 4 workers: 11175.456 ms 10863.293 ms 312.163 ms 5 workers: 12586.457 ms 12279.323 ms 307.134 ms Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kai...@ak.jp.nec.com> > -----Original Message----- > From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai > Sent: Friday, January 29, 2016 8:51 AM > To: Robert Haas > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] CustomScan under the Gather node? > > > On Thu, Jan 28, 2016 at 10:50 AM, Kouhei Kaigai <kai...@ak.jp.nec.com> > > wrote: > > >> If I would make a proof-of-concept patch with interface itself, it > > >> seems to me file_fdw may be a good candidate for this enhancement. > > >> It is not a field for postgres_fdw. > > >> > > > The attached patch is enhancement of FDW/CSP interface and PoC feature > > > of file_fdw to scan source file partially. It was smaller enhancement > > > than my expectations. > > > > > > It works as follows. This query tried to read 20M rows from a CSV file, > > > using 3 background worker processes. > > > > > > postgres=# set max_parallel_degree = 3; > > > SET > > > postgres=# explain analyze select * from test_csv where id % 20 = 6; > > > QUERY PLAN > > > > > > ---------------------------------------------------------------------------- > > ---- > > > Gather (cost=1000.00..194108.60 rows=94056 width=52) > > > (actual time=0.570..19268.010 rows=2000000 loops=1) > > > Number of Workers: 3 > > > -> Parallel Foreign Scan on test_csv (cost=0.00..183703.00 rows=94056 > > width=52) > > > (actual time=0.180..12744.655 > rows=500000 > > loops=4) > > > Filter: ((id % 20) = 6) > > > Rows Removed by Filter: 9500000 > > > Foreign File: /tmp/testdata.csv > > > Foreign File Size: 1504892535 > > > Planning time: 0.147 ms > > > Execution time: 19330.201 ms > > > (9 rows) > > > > Could you try it not in parallel and then with 1, 2, 3, and 4 workers > > and post the times for all? > > > The above query has 5% selectivity on the entire CSV file. > Its execution time (total, only ForeignScan) are below > > total ForeignScan diff > 0 workers: 17584.319 ms 17555.904 ms 28.415 ms > 1 workers: 18464.476 ms 18110.968 ms 353.508 ms > 2 workers: 19042.755 ms 14580.335 ms 4462.420 ms > 3 workers: 19318.254 ms 12668.912 ms 6649.342 ms > 4 workers: 21732.910 ms 13596.788 ms 8136.122 ms > 5 workers: 23486.846 ms 14533.409 ms 8953.437 ms > > This workstation has 4 CPU cores, so it is natural nworkers=3 records the > peak performance on ForeignScan portion. On the other hands, nworkers>1 also > recorded unignorable time consumption (probably, by Gather node?) > > An interesting observation was, less selectivity (1% and 0%) didn't change the > result so much. Something consumes CPU time other than file_fdw. > > * selectivity 1% > total ForeignScan diff > 0 workers: 17573.572 ms 17566.875 ms 6.697 ms > 1 workers: 18098.070 ms 18020.790 ms 77.280 ms > 2 workers: 18676.078 ms 14600.749 ms 4075.329 ms > 3 workers: 18830.597 ms 12731.459 ms 6099.138 ms > 4 workers: 21015.842 ms 13590.657 ms 7425.185 ms > 5 workers: 22865.496 ms 14634.342 ms 8231.154 ms > > * selectivity 0% (...so Gather didn't work hard actually) > total ForeignScan diff > 0 workers: 17551.011 ms 17550.811 ms 0.200 ms > 1 workers: 18055.185 ms 18048.975 ms 6.210 ms > 2 workers: 18567.660 ms 14593.974 ms 3973.686 ms > 3 workers: 18649.819 ms 12671.429 ms 5978.390 ms > 4 workers: 20619.184 ms 13606.715 ms 7012.469 ms > 5 workers: 22557.575 ms 14594.420 ms 7963.155 ms > > Further investigation will need.... > > Thanks, > -- > NEC Business Creation Division / PG-Strom Project > KaiGai Kohei <kai...@ak.jp.nec.com>
pgsql-v9.6-parallel-cspfdw.v2.patch
Description: pgsql-v9.6-parallel-cspfdw.v2.patch
postgres=# explain analyze select * from test_csv where id % 20 = 6; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Foreign Scan on test_csv (cost=0.00..2158874.49 rows=94056 width=52) (actual time=0.079..17564.457 rows=1000000 loops=1) Filter: ((id % 20) = 6) Rows Removed by Filter: 19000000 Foreign File: /tmp/testdata.csv Foreign File Size: 1504892535 Planning time: 0.598 ms Execution time: 17592.498 ms (7 rows) postgres=# SET max_parallel_degree = 1; SET postgres=# explain analyze select * from test_csv where id % 20 = 6; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.00..194108.60 rows=94056 width=52) (actual time=0.237..12123.367 rows=1000000 loops=1) Number of Workers: 1 -> Parallel Foreign Scan on test_csv (cost=0.00..183703.00 rows=94056 width=52) (actual time=0.469..11983.485 rows=500000 loops=2) Filter: ((id % 20) = 6) Rows Removed by Filter: 9500000 Foreign File: /tmp/testdata.csv Foreign File Size: 1504892535 Planning time: 0.053 ms Execution time: 12152.998 ms (9 rows) postgres=# SET max_parallel_degree = 2; SET postgres=# explain analyze select * from test_csv where id % 20 = 6; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.00..194108.60 rows=94056 width=52) (actual time=0.548..10616.097 rows=1000000 loops=1) Number of Workers: 2 -> Parallel Foreign Scan on test_csv (cost=0.00..183703.00 rows=94056 width=52) (actual time=1.803..10502.100 rows=333333 loops=3) Filter: ((id % 20) = 6) Rows Removed by Filter: 6333333 Foreign File: /tmp/testdata.csv Foreign File Size: 1504892535 Planning time: 0.120 ms Execution time: 10647.858 ms (9 rows) postgres=# SET max_parallel_degree = 3; SET postgres=# explain analyze select * from test_csv where id % 20 = 6; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.00..194108.60 rows=94056 width=52) (actual time=0.184..9603.485 rows=1000000 loops=1) Number of Workers: 3 -> Parallel Foreign Scan on test_csv (cost=0.00..183703.00 rows=94056 width=52) (actual time=0.731..9509.899 rows=250000 loops=4) Filter: ((id % 20) = 6) Rows Removed by Filter: 4750000 Foreign File: /tmp/testdata.csv Foreign File Size: 1504892535 Planning time: 0.049 ms Execution time: 9635.445 ms (9 rows) postgres=# SET max_parallel_degree = 4; SET postgres=# explain analyze select * from test_csv where id % 20 = 6; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.00..194108.60 rows=94056 width=52) (actual time=0.619..11143.383 rows=1000000 loops=1) Number of Workers: 4 -> Parallel Foreign Scan on test_csv (cost=0.00..183703.00 rows=94056 width=52) (actual time=15.336..10863.293 rows=200000 loops=5) Filter: ((id % 20) = 6) Rows Removed by Filter: 3800000 Foreign File: /tmp/testdata.csv Foreign File Size: 1504892535 Planning time: 0.136 ms Execution time: 11175.456 ms (9 rows) postgres=# SET max_parallel_degree = 5; SET postgres=# explain analyze select * from test_csv where id % 20 = 6; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.00..194108.60 rows=94056 width=52) (actual time=0.674..12544.940 rows=1000000 loops=1) Number of Workers: 5 -> Parallel Foreign Scan on test_csv (cost=0.00..183703.00 rows=94056 width=52) (actual time=23.652..12279.323 rows=166667 loops=6) Filter: ((id % 20) = 6) Rows Removed by Filter: 3166667 Foreign File: /tmp/testdata.csv Foreign File Size: 1504892535 Planning time: 0.139 ms Execution time: 12586.457 ms (9 rows)
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers