On Sat, May 23, 2020 at 12:00 AM Robert Haas <robertmh...@gmail.com> wrote:
>
> On Tue, May 19, 2020 at 10:23 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > Good experiment.  IIRC, we have discussed a similar idea during the
> > development of this feature but we haven't seen any better results by
> > allocating in ranges on the systems we have tried.  So, we went with
> > the current approach which is more granular and seems to allow better
> > parallelism.  I feel we need to ensure that we don't regress
> > parallelism in existing cases, otherwise, the idea sounds promising to
> > me.
>
> I think there's a significant difference. The idea I remember being
> discussed at the time was to divide the relation into equal parts at
> the very start and give one part to each worker.
>

I have checked the archives and found that we did some testing with two
approaches: letting each worker proceed on a block-by-block basis, and
giving each worker a fixed number of chunks.  See the results [1] (the
program used is attached in another email [2]).  The conclusion was that
we didn't find much difference between those approaches.  One possible
reason is that we tested on a machine (I think it was hydra (Power-7))
where the chunk size doesn't matter, but I think it could show some
difference on the machines on which Thomas and David are testing.

At that time there was also a discussion about chunking on the basis of
"each worker processes one 1GB-sized segment", which Tom and Stephen
seem to support [3].  I think the idea of dividing the relation into
segments based on workers for a parallel scan has been used by another
database (DynamoDB) as well [4], so it is not completely without merit.

I understand that larger chunks can lead to unequal work distribution,
but they have their own advantages, so we might want to get the best of
both worlds: start with larger chunks and then gradually reduce the
chunk size towards the end of the scan.  I am not sure what the best
thing to do here is, but maybe some experiments can shed light on this.

[1] - https://www.postgresql.org/message-id/CAA4eK1JHCmN2X1LjQ4bOmLApt%2BbtOuid5Vqqk5G6dDFV69iyHg%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAA4eK1JyVNEBE8KuxKd3bJhkG6tSbpBYX_%2BZtP34ZSTCSucA1A%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/30549.1422459647%40sss.pgh.pa.us
[4] - https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html#Scan.ParallelScan

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
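
To make the "larger chunks first, smaller chunks towards the end" idea
from the message above concrete, here is a minimal Python sketch.  It is
not PostgreSQL code and not a proposal from the thread: the function
name ramp_down_chunks, the rule "hand out roughly 1/(2 * nworkers) of
the remaining blocks", and the max_chunk/min_chunk defaults are all
assumptions chosen only for illustration.

```python
def ramp_down_chunks(nblocks, nworkers, max_chunk=8192, min_chunk=1):
    """Yield (start_block, length) pairs that cover blocks [0, nblocks).

    Hypothetical policy (an assumption, not the PostgreSQL allocator):
    each chunk is about 1/(2 * nworkers) of the blocks not yet handed
    out, clamped to [min_chunk, max_chunk].  Chunks are therefore large
    at the start of the scan and shrink as the scan nears its end.
    """
    if nworkers < 1 or min_chunk < 1 or max_chunk < min_chunk:
        raise ValueError("invalid chunking parameters")

    next_block = 0
    while next_block < nblocks:
        remaining = nblocks - next_block
        size = remaining // (2 * nworkers)
        size = max(min_chunk, min(max_chunk, size))
        size = min(size, remaining)  # never run past the end of the relation
        yield next_block, size
        next_block += size


if __name__ == "__main__":
    # Show the first few and last few chunks for a 100000-block relation.
    chunks = list(ramp_down_chunks(100000, nworkers=4))
    for start, length in chunks[:3] + chunks[-3:]:
        print(f"start={start} length={length}")
```

In a real parallel scan, each worker would claim the next chunk from
shared state as it finishes its current one; the generator above only
models the sequence of chunk sizes.  Whether such a ramp-down helps in
practice is exactly the kind of question the experiments mentioned
above would need to answer.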