On Mon, Jan 19, 2015 at 6:50 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Jan 19, 2015 at 2:24 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> > Another thing is that I think prefetching is not supported on all
> > platforms (Windows), and for such systems, as per the above algorithm,
> > we need to rely on the block-by-block method.
>
> Well, I think we should try to set up a test to see if this is hurting
> us.  First, do a sequential scan of a relation at least twice as large
> as RAM.  Then, do a parallel sequential scan of the same relation with
> 2 workers.  Repeat these in alternation several times.  If the
> operating system is accomplishing meaningful readahead, and the
> parallel sequential scan is breaking it, then since the test is
> I/O-bound I would expect to see the parallel scan actually being
> slower than the normal way.
I have taken some performance data as per the above discussion. Basically, I have used the parallel_count module, which is part of the parallel-mode patch, as that seems closer to verifying the I/O pattern (it doesn't have any tuple-communication overhead). The script used for the test is attached (parallel_count.sh).

Performance Data
----------------------------
Configuration and DB Details
IBM POWER-7
16 cores, 64 hardware threads
RAM = 64GB
Table Size - 120GB

Used the below statements to create the table:
create table tbl_perf(c1 int, c2 char(1000));
insert into tbl_perf values(generate_series(1,10000000),'aaaaa');
insert into tbl_perf values(generate_series(10000001,30000000),'aaaaa');
insert into tbl_perf values(generate_series(30000001,110000000),'aaaaa');

Block-By-Block
No. of workers / Time (ms)
             0        2
Run-1   267798   295051
Run-2   276646   296665
Run-3   281364   314952
Run-4   290231   326243
Run-5   288890   295684

Then I modified the parallel_count module so that it scans in fixed chunks rather than block-by-block; the patch for the same is attached (parallel_count_fixed_chunk_v1.patch, a patch based on the parallel_count module in the parallel-mode patch [1]).

Fixed-Chunks
No. of workers / Time (ms)
             0        2
Run-1   286346   234037
Run-2   250051   215111
Run-3   255915   254934
Run-4   263754   242228
Run-5   251399   202581

Observations
------------------------
1. Scanning block-by-block has a negative impact on performance, and I
   think it will degrade further if we increase the parallel count, as
   that can lead to more randomness in the I/O pattern.
2. Scanning in fixed chunks improves performance. Increasing the
   parallel count to a very large number might impact performance, but I
   think we can have a lower bound below which we will not allow
   multiple processes to scan the relation.
Now I can go ahead and try the prefetching approach as suggested by you, but I have a feeling that overall it might not be beneficial, for the reasons mentioned up-thread; mainly, it is not supported on all platforms, and while we could say we don't care for such platforms, there is still no mitigation strategy for them.

Thoughts?

[1] http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
parallel_count.sh
Description: Bourne shell script
parallel_count_fixed_chunk_v1.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers