On Wednesday, July 15, 2020 12:52 PM (GMT+9), David Rowley wrote:

>On Wed, 15 Jul 2020 at 14:51, Amit Kapila <amit.kapil...@gmail.com> wrote:
>>
>> On Wed, Jul 15, 2020 at 5:55 AM David Rowley <dgrowle...@gmail.com> wrote:
>>> If we've not seen any performance regressions within 1 week, then I 
>>> propose that we (pending final review) push this to allow wider 
>>> testing.
>>
>> I think Soumyadeep has reported a regression case [1] with the earlier 
>> version of the patch.  I am not sure if we have verified that the 
>> situation improves with the latest version of the patch.  I request 
>> Soumyadeep to please try once with the latest patch.
>...
>Yeah, it would be good to see some more data points on that test.
>Jumping from 2 up to 6 workers just leaves us to guess where the performance
>started to become bad. It would be good to know if the regression is
>repeatable or if it was affected by some other process.
>...
>It would be good to see EXPLAIN (ANALYZE, BUFFERS) with SET track_io_timing = on;
>for each value of max_parallel_workers.

Hi,

If I'm following the thread correctly, Thomas and David's patch may give us
gains, but we still need to test its effects on different filesystems. David's
benchmark tests have also clarified that synchronize_seqscans is not affected
as long as the parallel scan's chunk size is capped at 8192.

I also agree that exposing a GUC to control this could be beneficial for
users; however, that can be discussed in another thread or left for future
development (a small sketch of the knobs involved follows).
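
For reference, a minimal psql sketch of those knobs. Only synchronize_seqscans
exists today; the chunk-size GUC named below is purely hypothetical and is not
something the patch as posted adds:

-- synchronize_seqscans is an existing GUC and can be toggled per session.
SHOW synchronize_seqscans;        -- on by default
SET synchronize_seqscans = off;   -- e.g. to rule it out while benchmarking

-- Hypothetical name only: no such setting exists yet; this is just what the
-- "control through a GUC" idea above might look like.
-- SET parallel_seqscan_chunk_size = 8192;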

David Rowley wrote:
>I'd like to propose that if anyone wants to do further testing on
>other operating systems with SSDs or HDDs then it would be good if
>that could be done within 1 week from this email. There are various
>benchmarking ideas on this thread for inspiration.

I'd like to join in on the testing, in this case using an HDD; the results
are at the bottom. Due to my machine's limitations I only tested 0 to 6
workers: even if I increase max_parallel_workers_per_gather beyond that, the
query planner still caps the workers at 6 (a sketch of how to lift that cap
follows this paragraph). I also set track_io_timing to on, as per David's
recommendation.
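
For anyone who wants to push past 6 workers, here is a sketch (not used in my
runs). The planner scales the worker count with table size, starting at one
worker at min_parallel_table_scan_size and adding one each time the size
roughly triples, which is why a ~3.4GB table stops at 6. The per-table
parallel_workers storage parameter can override that, still subject to
max_parallel_workers_per_gather and max_parallel_workers:

SHOW min_parallel_table_scan_size;              -- 8MB by default
ALTER TABLE t_heap SET (parallel_workers = 8);  -- override the size-based choice
SET max_parallel_workers_per_gather = 8;
SET max_parallel_workers = 8;
EXPLAIN SELECT count(*) from t_heap;            -- should now plan 8 workers
ALTER TABLE t_heap RESET (parallel_workers);    -- back to the planner's default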

Tested on:
XFS filesystem, HDD virtual machine
RHEL4, 64-bit,
4 CPUs, Intel Core Processor (Haswell, IBRS)
PostgreSQL 14devel on x86_64-pc-linux-gnu


----Test Case (Soumyadeep's) [1]

shared_buffers = 32MB (to use OS page cache)

create table t_heap as select generate_series(1, 100000000) i;   --about 3.4GB in size

SET track_io_timing = on;

\timing

set max_parallel_workers_per_gather = 0;      --0 to 6

SELECT count(*) from t_heap;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) from t_heap;

[Summary]
I used the same query from the thread. The plain query execution time and the
EXPLAIN (ANALYZE, BUFFERS) execution time do not vary much between master and
patched. OTOH, in terms of I/O read timings, I observed a similar regression
in both master and patched as max_parallel_workers_per_gather increases.

It is also possible that each max_parallel_workers_per_gather setting is
affected by the previous run; IOW, later benchmark runs benefit from data
cached at the OS level by earlier runs (one way to even that out is sketched
below). Any advice? Please refer to the tables below for the results.
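
Regarding that caching question, one possible way to even things out (an idea
only, not something I did in the runs below) would be to deliberately warm the
OS page cache with contrib's pg_prewarm before every run, so each
max_parallel_workers_per_gather setting starts from the same warm state
instead of whatever the previous run left behind:

-- pg_prewarm ships in contrib; 'read' mode reads every block of the relation,
-- which warms the OS page cache even though shared_buffers (32MB here) is far
-- too small to hold the ~3.4GB table.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('t_heap', 'read');       -- returns the number of blocks read
SET max_parallel_workers_per_gather = 0;   -- then repeat for 1..6
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) from t_heap;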

(MASTER/UNPATCHED)
| Parallel Workers | SQL Exec Time | EXPLAIN ANALYZE Time | Buffers                     |
|------------------|---------------|----------------------|-----------------------------|
| 0                | 12942.606 ms  | 37031.786 ms         | shared hit=32 read=442446   |
| 1                |  4959.567 ms  | 17601.813 ms         | shared hit=128 read=442350  |
| 2                |  3273.610 ms  | 11766.441 ms         | shared hit=288 read=442190  |
| 3                |  2449.342 ms  |  9057.236 ms         | shared hit=512 read=441966  |
| 4                |  2482.404 ms  |  8853.702 ms         | shared hit=800 read=441678  |
| 5                |  2430.944 ms  |  8777.630 ms         | shared hit=1152 read=441326 |
| 6                |  2493.416 ms  |  8798.200 ms         | shared hit=1568 read=440910 |

(PATCHED V2)
| Parallel Workers | SQL Exec Time | EXPLAIN ANALYZE Time | Buffers                     |
|------------------|---------------|----------------------|-----------------------------|
| 0                | 9283.193 ms   | 34471.050 ms         | shared hit=2624 read=439854 |
| 1                | 4872.728 ms   | 17449.725 ms         | shared hit=2528 read=439950 |
| 2                | 3240.301 ms   | 11556.243 ms         | shared hit=2368 read=440110 |
| 3                | 2419.512 ms   |  8709.572 ms         | shared hit=2144 read=440334 |
| 4                | 2746.820 ms   |  8768.812 ms         | shared hit=1856 read=440622 |
| 5                | 2424.687 ms   |  8699.762 ms         | shared hit=1504 read=440974 |
| 6                | 2581.999 ms   |  8627.627 ms         | shared hit=1440 read=441038 |

(I/O Read Timings from track_io_timing, in ms)
| Parallel Workers | I/O Read (Master) | I/O Read (Patched) |
|------------------|-------------------|--------------------|
| 0                | read=1850.233     | read=1071.209      |
| 1                | read=1246.939     | read=1115.361      |
| 2                | read=1079.837     | read=1090.425      |
| 3                | read=1342.133     | read=1094.115      |
| 4                | read=1478.821     | read=1355.966      |
| 5                | read=1691.244     | read=1679.446      |
| 6                | read=1952.384     | read=1881.733      |

I hope this helps in a way.

Regards,
Kirk Jamison

[1] 
https://www.postgresql.org/message-id/CADwEdoqirzK3H8bB=xyj192ezcnwgfcca_wj5ghvm7sv8oe...@mail.gmail.com
