On Wed, Dec 25, 2019 at 10:39:32AM +0900, Kyotaro Horiguchi wrote:
> At Tue, 24 Dec 2019 11:57:39 -0800, Noah Misch <n...@leadboat.com> wrote in 
> > On Mon, Dec 23, 2019 at 07:41:49PM +0900, Kyotaro Horiguchi wrote:
> > > If we regard repalloc as far faster than FileOpen/FileClose or we care
> > > about only increase of segment number of mdopen'ed files and don't
> > > care the frequent resize that happens during the functions above, then
> > > the comment is right and we may resize the array in the
> > > segment-by-segment manner.
> > 
> > In most cases, the array will fit into a power-of-two chunk, so repalloc()
> > already does the right thing.  Once the table has more than ~1000 segments 
> > (~1
> > TiB table size), the allocation will get a single-chunk block, and every
> > subsequent repalloc() will call realloc().  Even then, repalloc() probably 
> > is
> > far faster than File operations.  Likely, I should just accept the extra
> > repalloc() calls and drop the "else if" change in _fdvec_resize().
> 
> I'm not sure which is better. If we say we know that
> repalloc(AllocSetRealloc) doesn't free memory at all, there's no point
> in calling repalloc for shrinking and we could omit that under the
> name of optimization.  If we say we want to free memory as much as
> possible, we should call repalloc pretending to believe that that
> happens.

As long as we free the memory by the end of mdclose(), I think it doesn't
matter whether we freed memory in the middle of mdclose().

I ran a crude benchmark that found PathNameOpenFile()+FileClose() costing at
least two hundred times as much as the repalloc() pair.  Hence, I now plan not
to avoid repalloc(), as attached.  Crude benchmark code:

        #define NSEG 9000
        for (i = 0; i < count1; i++)
        {
                int j;

                for (j = 0; j < NSEG; ++j)
                {
                        File f = PathNameOpenFile("/etc/services", O_RDONLY);
                        if (f < 0)
                                elog(ERROR, "fail open: %m");
                        FileClose(f);
                }
        }

        for (i = 0; i < count2; i++)
        {
                int j;
                void *buf = palloc(1);

                for (j = 2; j < NSEG; ++j)
                        buf = repalloc(buf, j * 8);
                while (--j > 0)
                        buf = repalloc(buf, j * 8);
        }
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 82442db..e768f2c 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -516,18 +516,10 @@ mdclose(SMgrRelation reln, ForkNumber forknum)
        {
                MdfdVec    *v = &reln->md_seg_fds[forknum][nopensegs - 1];
 
-               /* if not closed already */
-               if (v->mdfd_vfd >= 0)
-               {
-                       FileClose(v->mdfd_vfd);
-                       v->mdfd_vfd = -1;
-               }
-
+               FileClose(v->mdfd_vfd);
+               _fdvec_resize(reln, forknum, nopensegs - 1);
                nopensegs--;
        }
-
-       /* resize just once, avoids pointless reallocations */
-       _fdvec_resize(reln, forknum, 0);
 }
 
 /*
@@ -1050,10 +1042,10 @@ _fdvec_resize(SMgrRelation reln,
        else
        {
                /*
-                * It doesn't seem worthwhile complicating the code by having a 
more
-                * aggressive growth strategy here; the number of segments 
doesn't
-                * grow that fast, and the memory context internally will 
sometimes
-                * avoid doing an actual reallocation.
+                * It doesn't seem worthwhile complicating the code to amortize
+                * repalloc() calls.  Those are far faster than 
PathNameOpenFile() or
+                * FileClose(), and the memory context internally will 
sometimes avoid
+                * doing an actual reallocation.
                 */
                reln->md_seg_fds[forknum] =
                        repalloc(reln->md_seg_fds[forknum],

Reply via email to