On Tuesday, December 22, 2020 9:11 PM, Amit Kapila wrote: > On Tue, Dec 22, 2020 at 2:55 PM Amit Kapila <amit.kapil...@gmail.com> > wrote: > > Next, I'll look into DropRelFileNodesAllBuffers() > > optimization patch. > > > > Review of v35-0004-Optimize-DropRelFileNodesAllBuffers-in-recovery [1] > ================================================= > ======= > 1. > DropRelFileNodesAllBuffers() > { > .. > +buffer_full_scan: > + pfree(block); > + nodes = palloc(sizeof(RelFileNode) * n); /* non-local relations */ > +for (i = 0; i < n; i++) nodes[i] = smgr_reln[i]->smgr_rnode.node; > + > .. > } > > How is it correct to assign nodes array directly from smgr_reln? There is no > one-to-one correspondence. If you see the code before patch, the passed > array can have mixed of temp and non-temp relation information.
You are right. I mistakenly removed the array node that should have been allocated for non-local relations. So I fixed that by doing: SMgrRelation *rels; rels = palloc(sizeof(SMgrRelation) * nnodes); /* non-local relations */ /* If it's a local relation, it's localbuf.c's problem. */ for (i = 0; i < nnodes; i++) { if (RelFileNodeBackendIsTemp(smgr_reln[i]->smgr_rnode)) { if (smgr_reln[i]->smgr_rnode.backend == MyBackendId) DropRelFileNodeAllLocalBuffers(smgr_reln[i]->smgr_rnode.node); } else rels[n++] = smgr_reln[i]; } ... if (n == 0) { pfree(rels); return; } ... //traditional path: pfree(block); nodes = palloc(sizeof(RelFileNode) * n); /* non-local relations */ for (i = 0; i < n; i++) nodes[i] = rels[i]->smgr_rnode.node; > 2. > + for (i = 0; i < n; i++) > { > - pfree(nodes); > + for (j = 0; j <= MAX_FORKNUM; j++) > + { > + /* > + * Assign InvalidblockNumber to a block if a relation > + * fork does not exist, so that we can skip it later > + * when dropping the relation buffers. > + */ > + if (!smgrexists(smgr_reln[i], j)) > + { > + block[i][j] = InvalidBlockNumber; > + continue; > + } > + > + /* Get the number of blocks for a relation's fork */ block[i][j] = > + smgrnblocks(smgr_reln[i], j, &cached); > > Similar to above, how can we assume smgr_reln array has all non-local > relations? Have we tried the case with mix of temp and non-temp relations? Similar to above reply. > In this code, I am slightly worried about the additional cost of each time > checking smgrexists. Consider a case where there are many relations and only > one or few of them have not cached the information, in such a case we will > pay the cost of smgrexists for many relations without even going to the > optimized path. Can we avoid that in some way or at least reduce its usage to > only when it is required? One idea could be that we first check if the nblocks > information is cached and if so then we don't need to call smgrnblocks, > otherwise, check if it exists. For this, we need an API like > smgrnblocks_cahced, something we discussed earlier but preferred the > current API. Do you have any better ideas? Right. I understand the point that let's say there are 100 relations, and the first 99 non-local relations happen to enter the optimization path, but the last one does not, calling smgrexist() would be too costly and waste of time in that case. The only solution I could think of as you mentioned is to reintroduce the new API which we discussed before: smgrnblocks_cached(). It's possible that we call smgrexist() only if smgrnblocks_cached() returns InvalidBlockNumber. So if everyone agrees, we can add that API smgrnblocks_cached() which will Include the "cached" flag parameter, and remove the additional flag modifications from smgrnblocks(). In this case, both DropRelFileNodeBuffers() and DropRelFileNodesAllBuffers() will use the new API. Thoughts? Regards, Kirk Jamison