On Fri, May 21, 2021 at 3:46 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
> <bharath.rupireddyforpostg...@gmail.com> wrote:
> >
> > On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
> > <bharath.rupireddyforpostg...@gmail.com> wrote:
> > >
> >
> > I analyzed the performance of parallel inserts in CTAS for different
> > cases with tuple sizes of 32 bytes, 59 bytes, 241 bytes and 1064 bytes.
> > We gain if the tuple sizes are smaller. But if the tuple size is larger,
> > i.e. 1064 bytes, there's a regression with parallel inserts. Upon
> > further analysis, it turned out that the parallel workers require
> > frequent extra block additions while concurrently extending the
> > relation (in RelationAddExtraBlocks) and the majority of the time is
> > spent flushing those new empty pages/blocks to disk.
> >
> How have you ensured that the cost is due to the flushing of pages?
I think I was wrong to simply say the problem is the flushing of empty
pages when bulk-extending the relation. I should have said the problem is
with the "relation extension lock", but I will hold off on that claim for
the moment until I capture the relation extension lock wait events for the
regression-causing cases. I will share that information soon.

> AFAICS, we don't flush the pages rather just write them and then
> register those to be flushed by checkpointer, now it is possible that
> the checkpointer sync queue gets full and the backend has to write by
> itself but have we checked that? I think we can check via wait events,
> if it is due to flush then we should see a lot of file sync
> (WAIT_EVENT_DATA_FILE_SYNC) wait events.

I will also capture the data file sync wait events along with the relation
extension lock wait events.

> The other possibility could
> be that the free pages added to FSM by one worker are not being used
> by another worker due to some reason. Can we debug and check if the
> pages added by one worker are being used by another worker?

I tried to explain this at [1]. Please have a look. It looks like the
burden lies more with the "relation extension lock" and the way the extra
new blocks are added.

[1] https://www.postgresql.org/message-id/CALj2ACVdcrjwHXwvJqT-Fa32vnJEOjteep_3L24X8MK50E7M8w%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
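
PS: For capturing the wait events discussed above, a rough sampling query
along these lines could be run repeatedly (e.g. via \watch in psql) from a
separate session while the CTAS under test is executing. This is only a
sketch, assuming a server version where pg_stat_activity exposes the
wait_event columns (9.6+); 'extend' (type Lock) is the relation extension
lock, and 'DataFileSync'/'DataFileExtend' (type IO) indicate backends doing
their own file syncs/extensions:

```sql
-- Sample current wait events across all backends; run repeatedly to get
-- a rough distribution over the life of the parallel CTAS.
SELECT wait_event_type, wait_event, count(*) AS backends
FROM pg_stat_activity
WHERE wait_event IN ('extend',          -- relation extension lock
                     'DataFileExtend',  -- extending a data file
                     'DataFileSync')    -- backend syncing a data file itself
GROUP BY wait_event_type, wait_event
ORDER BY backends DESC;
```

If the regression is really due to backends flushing pages themselves, the
DataFileSync counts should dominate; if it is lock contention, 'extend'
should.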