On Wed, Dec 25, 2019 at 10:39:32AM +0900, Kyotaro Horiguchi wrote: > At Tue, 24 Dec 2019 11:57:39 -0800, Noah Misch <n...@leadboat.com> wrote in > > On Mon, Dec 23, 2019 at 07:41:49PM +0900, Kyotaro Horiguchi wrote: > > > If we regard repalloc as far faster than FileOpen/FileClose or we care > > > about only increase of segment number of mdopen'ed files and don't > > > care the frequent resize that happens during the functions above, then > > > the comment is right and we may resize the array in the > > > segment-by-segment manner. > > > > In most cases, the array will fit into a power-of-two chunk, so repalloc() > > already does the right thing. Once the table has more than ~1000 segments > > (~1 > > TiB table size), the allocation will get a single-chunk block, and every > > subsequent repalloc() will call realloc(). Even then, repalloc() probably > > is > > far faster than File operations. Likely, I should just accept the extra > > repalloc() calls and drop the "else if" change in _fdvec_resize(). > > I'm not sure which is better. If we say we know that > repalloc(AllocSetRealloc) doesn't free memory at all, there's no point > in calling repalloc for shrinking and we could omit that under the > name of optimization. If we say we want to free memory as much as > possible, we should call repalloc pretending to believe that that > happens.
As long as we free the memory by the end of mdclose(), I think it doesn't matter whether we freed memory in the middle of mdclose(). I ran a crude benchmark that found PathNameOpenFile()+FileClose() costing at least two hundred times as much as the repalloc() pair. Hence, I now plan not to avoid repalloc(), as attached. Crude benchmark code: #define NSEG 9000 for (i = 0; i < count1; i++) { int j; for (j = 0; j < NSEG; ++j) { File f = PathNameOpenFile("/etc/services", O_RDONLY); if (f < 0) elog(ERROR, "fail open: %m"); FileClose(f); } } for (i = 0; i < count2; i++) { int j; void *buf = palloc(1); for (j = 2; j < NSEG; ++j) buf = repalloc(buf, j * 8); while (--j > 0) buf = repalloc(buf, j * 8); }
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c index 82442db..e768f2c 100644 --- a/src/backend/storage/smgr/md.c +++ b/src/backend/storage/smgr/md.c @@ -516,18 +516,10 @@ mdclose(SMgrRelation reln, ForkNumber forknum) { MdfdVec *v = &reln->md_seg_fds[forknum][nopensegs - 1]; - /* if not closed already */ - if (v->mdfd_vfd >= 0) - { - FileClose(v->mdfd_vfd); - v->mdfd_vfd = -1; - } - + FileClose(v->mdfd_vfd); + _fdvec_resize(reln, forknum, nopensegs - 1); nopensegs--; } - - /* resize just once, avoids pointless reallocations */ - _fdvec_resize(reln, forknum, 0); } /* @@ -1050,10 +1042,10 @@ _fdvec_resize(SMgrRelation reln, else { /* - * It doesn't seem worthwhile complicating the code by having a more - * aggressive growth strategy here; the number of segments doesn't - * grow that fast, and the memory context internally will sometimes - * avoid doing an actual reallocation. + * It doesn't seem worthwhile complicating the code to amortize + * repalloc() calls. Those are far faster than PathNameOpenFile() or + * FileClose(), and the memory context internally will sometimes avoid + * doing an actual reallocation. */ reln->md_seg_fds[forknum] = repalloc(reln->md_seg_fds[forknum],