On Fri, May 28, 2021 at 8:53 AM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Thu, May 27, 2021 at 7:37 PM Bharath Rupireddy > > > > I captured below information with the attached patch > > 0001-test-times-and-block-counts.patch applied on top of CTAS v23 > > patch set. Testing details are attached in the file named "test". > > Total time spent in LockRelationForExtension > > Total time spent in GetPageWithFreeSpace > > Total time spent in RelationAddExtraBlocks > > Total number of times extended the relation in bulk > > Total number of times extended the relation by one block > > Total number of blocks added in bulk extension > > Total number of times getting the page from FSM > > > > In your results, the number of pages each process is getting from FSM > is not matching with the number of blocks added. I think we need to > increment 'fsm_hit_count' in RecordAndGetPageWithFreeSpace as well > because that is also called and the process can get a free page via > the same. The other thing to check via debugger is when one worker > adds the blocks in bulk does another parallel worker gets all those > blocks. You can achieve that by allowing one worker (say worker-1) to > extend the relation in bulk and then let it wait and allow another > worker (say worker-2) to proceed and see if it gets all the pages > added by worker-1 from FSM. You need to keep the leader also waiting > or not perform any operation. >
While looking at results, I have observed one more thing that we are trying to parallelize I/O due to which we might not be seeing benefit in such cases. I think even for non-write queries there won't be any (much) benefit if we can't parallelize CPU usage. Basically, the test you are doing is for statement: explain analyze verbose create table test as select * from tenk1;. Now, in this statement, there is no qualification and still, the Gather node is generated for it, this won't be the case if we check "select * from tenk1". Is it due to the reason that the patch completely ignores the parallel_tuple_cost? But still, it should prefer a serial plan due parallel_setup_cost, why is that not happening? Anyway, I think we should not parallelize such queries where we can't parallelize CPU usage. Have you tried the cases without changing any of the costings for parallelism? -- With Regards, Amit Kapila.