On Mon, 7 Sep 2020 at 19:47, David Rowley <dgrowle...@gmail.com> wrote:
> I wondered if we could get around that just by having another buffer
> somewhere and memcpy the tuples into that first then copy the tuples
> out that buffer back into the page. No need to worry about the order
> we do that in as there's no chance to overwrite memory belonging to
> other tuples.
>
> Doing that gives me 79.166 seconds in the same recovery test. Or about
> 46% faster, instead of 22% (I mistakenly wrote 28% yesterday)
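In code form, the quoted approach boils down to roughly the following. This is only a minimal sketch to make the two-pass idea concrete; the TupRef struct, compact_sketch() and the variable names are made up for illustration and are not the attached patch's actual data structures:

#include <string.h>

#define BLCKSZ 8192

typedef struct { int off, len; } TupRef;	/* hypothetical */

static void
compact_sketch(char *page, TupRef *tups, int ntups)
{
	char	scratch[BLCKSZ];
	int		upper = BLCKSZ;
	int		i;

	/*
	 * 1st pass: copy each tuple out to the scratch buffer, keeping it at
	 * its old page offset so we can find it again.
	 */
	for (i = 0; i < ntups; i++)
		memcpy(scratch + tups[i].off, page + tups[i].off, tups[i].len);

	/*
	 * 2nd pass: copy the tuples back, packed against the end of the page.
	 * Order doesn't matter: we read from the scratch copy, so we can never
	 * clobber a tuple we haven't moved yet.
	 */
	for (i = 0; i < ntups; i++)
	{
		upper -= tups[i].len;
		memcpy(page + upper, scratch + tups[i].off, tups[i].len);
	}
}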
I did some more thinking about this and considered if there's a way to just get rid of the sorting version of compactify_tuples() completely. In the version from yesterday, I fell back on the sort version for when more than half the tuples from the page were being pruned. I'd thought that in this case copying out *all* of the page from pd_upper up to pd_special (the tuple portion of the page) would maybe be more costly, since that would include (needlessly) copying all the pruned tuples too. The sort also becomes cheaper in that case since the number of items to sort is smaller, hence I thought it was a good idea to keep the old version for some cases.

However, I now think we can just fix this by conditionally copying all tuples in one big memcpy when not many tuples have been pruned, and when more tuples are pruned we can just do individual memcpys into the separate buffer.

I wrote a little .c program to try to figure out if there's a good cut-off point where one method becomes better than the other. I found that, generally, if we're pruning away more than about 75% of the tuples, then doing a memcpy() per non-pruned tuple is faster; otherwise, it seems better just to copy the entire tuple area of the page. See attached compact_test.c

I ran this and charted the cut-off at (nitems < totaltups / 4), (nitems < totaltups / 2) and (nitems < 16). (nitems here is the number of tuples remaining after pruning, so nitems < totaltups / 4 means more than 75% of the page's tuples were pruned.)

./compact_test 32 192
./compact_test 64 96
./compact_test 128 48
./compact_test 256 24
./compact_test 512 12

The / 4 one gives me the graphs with the smallest step when the method switches. See attached 48_tuples_per_page.png for comparison.

I've so far come up with the attached compactify_tuples_dgr_v2.patch.txt. Thomas pointed out to me off-list that using PGAlignedBlock is the general way to allocate page-sized data on the stack. I'm still classing this patch as PoC grade. I'll need to look a bit harder at the correctness of it all.

I did spend quite a bit of time trying to find a case where this is slower than master's version. I can't find a case where there's any noticeable slowdown. Using the same script from [1], I tried a few variations of the t1 table by adding an additional column to pad out the tuple to make it wider. Obviously, a wider tuple means fewer tuples on the page, so fewer tuples for master's qsort to sort during compactify_tuples().

I did manage to squeeze a bit more performance out of the test cases. Yesterday I got 79.166 seconds. This version gets me 76.623 seconds.

Here are the results for the various tuple widths:

narrow width row test: insert into t1 select x,0 from generate_series(1,10000000) x; (32 byte tuples)

patched: 76.623
master: 137.593

medium width row test: insert into t1 select x,0,md5(x::text) || md5((x+1)::Text) from generate_series(1,10000000) x; (about 64 byte tuples)

patched: 64.411
master: 95.576

wide row test: insert into t1 select x,0,(select string_agg(md5(y::text),'') from generate_Series(x,x+30) y) from generate_series(1,1000000) x; (1024 byte tuples)

patched: 88.653
master: 90.077

I also changed the test so that instead of having 10 million rows in the table and updating a random row 12 million times, I put just 10 rows in the table and updated them 12 million times. This results in compactify_tuples() pruning all but 1 row (since autovacuum can't keep up with this, each row does end up on a page by itself). I wanted to ensure I didn't regress a workload that master's qsort() version would have done very well at. qsorting 1 element is pretty fast.
10-row narrow test:

patched: 10.633 <--- small regression
master: 10.366

I could special case this and do a memmove without copying the tuple to another buffer, but I don't think the slowdown is enough to warrant having such a special case.

Another thing I tried: instead of compacting the page in compactify_tuples(), I got rid of that function entirely and did the compacting in the existing loop in PageRepairFragmentation(). This does require changing the ERROR check to a PANIC, since we may have already started shuffling tuples around by the time we find a corrupted line pointer. However, I was just unable to make this faster than the attached version. I'm still surprised at that, as it lets me get rid of the itemidbase array completely. The best run-time I got with this method on the original test was 86 seconds, so 10 seconds slower than what the attached can do. So I threw that idea away.

David

[1] https://www.postgresql.org/message-id/caaphdvokwqazhiuxet8jsqupjkdph8dnuzdfusx9p7dxrjd...@mail.gmail.com
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

#define BLCKSZ 8192
#define MaxHeapTuplesPerPage 291

typedef unsigned long long int64;
typedef unsigned short uint16;
typedef short int16;

typedef union PGAlignedBlock
{
	char		data[BLCKSZ];
	double		force_align_d;
	int64		force_align_i64;
} PGAlignedBlock;

typedef struct PageHeader
{
	int			pd_lower;
	int			pd_upper;
	int			pd_special;
} PageHeader;

typedef struct itemIdCompactData
{
	uint16		offsetindex;	/* linp array index */
	int16		itemoff;		/* page offset of item data */
	uint16		alignedlen;		/* MAXALIGN(item data len) */
} itemIdCompactData;
typedef itemIdCompactData *itemIdCompact;

static int
itemoffcompare(const void *itemidp1, const void *itemidp2)
{
	/* Return a random result so that qsort() shuffles the array */
	return rand() > RAND_MAX / 2 ? 1 : -1;
}

void
make_itemIdCompactData_array(itemIdCompactData *array, int size, int tupwidth)
{
	int			i;
	int			pd_upper = BLCKSZ;

	if (size * tupwidth > BLCKSZ)
	{
		fprintf(stderr, "tuple data bigger than BLCKSZ\n");
		exit(-1);
	}

	for (i = 0; i < size; i++)
	{
		pd_upper -= tupwidth;
		array[i].offsetindex = i + 1;
		array[i].itemoff = pd_upper;
		array[i].alignedlen = tupwidth;
	}

	/* shuffle the array into a random order */
	qsort((char *) array, size, sizeof(itemIdCompactData), itemoffcompare);
}

void
print_itemIdCompactData_array(itemIdCompactData *array, int size)
{
	int			i;

	for (i = 0; i < size; i++)
	{
		itemIdCompact itemidptr = &array[i];

		printf("item[%d] offset: %d, len: %u\n", itemidptr->offsetindex,
			   itemidptr->itemoff, itemidptr->alignedlen);
	}
}

static void
compactify_tuples(itemIdCompact itemidbase, int nitems, int totaltups,
				  PGAlignedBlock *page, PageHeader *phdr)
{
	int			upper;
	PGAlignedBlock scratch;
	char	   *scratchptr = scratch.data;
	int			i;

	/*
	 * The tuples in the itemidbase array may be in any order so in order to
	 * move these to the end of the page we must make a temp copy of each
	 * tuple before we copy them back into the page at the new position.
	 *
	 * If a large number of the tuples have been pruned (>75%) then we'll copy
	 * these into the temp buffer tuple-by-tuple, otherwise we'll just copy
	 * the entire tuple section of the page in a single memcpy().  Doing this
	 * saves us doing large copies when we're pruning most of the tuples.
	 */
	if (nitems < totaltups / 4)
	{
		for (i = 0; i < nitems; i++)
		{
			itemIdCompact itemidptr = &itemidbase[i];

			memcpy(scratchptr + itemidptr->itemoff,
				   page->data + itemidptr->itemoff,
				   itemidptr->alignedlen);
		}
	}
	else
	{
		/* Copy the entire tuple space */
		memcpy(scratchptr + phdr->pd_upper,
			   page->data + phdr->pd_upper,
			   phdr->pd_special - phdr->pd_upper);
	}

	upper = BLCKSZ;
	for (i = 0; i < nitems; i++)
	{
		itemIdCompact itemidptr = &itemidbase[i];

		upper -= itemidptr->alignedlen;
		memcpy((char *) page + upper,
			   scratchptr + itemidptr->itemoff,
			   itemidptr->alignedlen);
	}
}

int
main(int argc, char **argv)
{
	int			tupwidth;
	int			ntuples;
	int			unprunedtups;
	PGAlignedBlock page;
	PageHeader	header;
	itemIdCompactData itemidbase[MaxHeapTuplesPerPage];
	clock_t		start,
				end;
	int			i;

	if (argc < 3)
	{
		printf("Usage: %s <tupwidth> <ntuples>\n", argv[0]);
		return -1;
	}

	tupwidth = atoi(argv[1]);
	ntuples = atoi(argv[2]);

	if (ntuples > MaxHeapTuplesPerPage)
	{
		printf("ntuples cannot be above %d\n", MaxHeapTuplesPerPage);
		return -1;
	}

	header.pd_lower = 0;
	header.pd_upper = BLCKSZ - tupwidth * ntuples;
	header.pd_special = BLCKSZ;

	for (unprunedtups = 1; unprunedtups < ntuples; unprunedtups++)
	{
		make_itemIdCompactData_array(itemidbase, unprunedtups, tupwidth);

		start = clock();
		for (i = 0; i < 1000000; i++)
			compactify_tuples(itemidbase, unprunedtups, ntuples, &page, &header);
		end = clock();

		printf("%d : %g msec\n", unprunedtups,
			   (double) (end - start) / CLOCKS_PER_SEC * 1000);
	}

	return 0;
}
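For reference, compact_test.c is standalone, so it can be built and run with something like the following (the compiler invocation here is just an assumption, not necessarily what produced the numbers above):

cc -O2 -o compact_test compact_test.c
./compact_test 32 192

It times one million compactify_tuples() calls for each possible number of unpruned tuples at the given tuple width, printing one line per count, which is what the attached charts were made from.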
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index d708117a40..b20559230c 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -411,50 +411,67 @@ PageRestoreTempPage(Page tempPage, Page oldPage)
 }
 
 /*
- * sorting support for PageRepairFragmentation and PageIndexMultiDelete
+ * Item pruning support for PageRepairFragmentation and PageIndexMultiDelete
  */
-typedef struct itemIdSortData
+typedef struct itemIdCompactData
 {
 	uint16		offsetindex;	/* linp array index */
 	int16		itemoff;		/* page offset of item data */
 	uint16		alignedlen;		/* MAXALIGN(item data len) */
-} itemIdSortData;
-typedef itemIdSortData *itemIdSort;
-
-static int
-itemoffcompare(const void *itemidp1, const void *itemidp2)
-{
-	/* Sort in decreasing itemoff order */
-	return ((itemIdSort) itemidp2)->itemoff -
-		((itemIdSort) itemidp1)->itemoff;
-}
+} itemIdCompactData;
+typedef itemIdCompactData *itemIdCompact;
 
 /*
  * After removing or marking some line pointers unused, move the tuples to
  * remove the gaps caused by the removed items.
  */
 static void
-compactify_tuples(itemIdSort itemidbase, int nitems, Page page)
+compactify_tuples(itemIdCompact itemidbase, int nitems, Page page)
 {
 	PageHeader	phdr = (PageHeader) page;
 	Offset		upper;
+	PGAlignedBlock scratch;
+	char	   *scratchptr = scratch.data;
 	int			i;
 
-	/* sort itemIdSortData array into decreasing itemoff order */
-	qsort((char *) itemidbase, nitems, sizeof(itemIdSortData),
-		  itemoffcompare);
+	/*
+	 * The tuples in the itemidbase array may be in any order so in order to
+	 * move these to the end of the page we must make a temp copy of each
+	 * tuple before we copy them back into the page at the new position.
+	 *
+	 * If a large number of the tuples have been pruned (>75%) then we'll copy
+	 * these into the temp buffer tuple-by-tuple, otherwise we'll just copy
+	 * the entire tuple section of the page in a single memcpy().  Doing this
+	 * saves us doing large copies when we're pruning most of the tuples.
+	 */
+	if (nitems < PageGetMaxOffsetNumber(page) / 4)
+	{
+		for (i = 0; i < nitems; i++)
+		{
+			itemIdCompact itemidptr = &itemidbase[i];
+
+			memcpy(scratchptr + itemidptr->itemoff, page + itemidptr->itemoff,
+				   itemidptr->alignedlen);
+		}
+	}
+	else
+	{
+		/* Copy the entire tuple space */
+		memcpy(scratchptr + phdr->pd_upper,
+			   page + phdr->pd_upper,
+			   phdr->pd_special - phdr->pd_upper);
+	}
 
 	upper = phdr->pd_special;
 	for (i = 0; i < nitems; i++)
 	{
-		itemIdSort	itemidptr = &itemidbase[i];
+		itemIdCompact itemidptr = &itemidbase[i];
 		ItemId		lp;
 
 		lp = PageGetItemId(page, itemidptr->offsetindex + 1);
 		upper -= itemidptr->alignedlen;
-		memmove((char *) page + upper,
-				(char *) page + itemidptr->itemoff,
-				itemidptr->alignedlen);
+		memcpy((char *) page + upper,
+			   scratchptr + itemidptr->itemoff,
+			   itemidptr->alignedlen);
 		lp->lp_off = upper;
 	}
 
@@ -477,8 +494,8 @@ PageRepairFragmentation(Page page)
 	Offset		pd_lower = ((PageHeader) page)->pd_lower;
 	Offset		pd_upper = ((PageHeader) page)->pd_upper;
 	Offset		pd_special = ((PageHeader) page)->pd_special;
-	itemIdSortData itemidbase[MaxHeapTuplesPerPage];
-	itemIdSort	itemidptr;
+	itemIdCompactData itemidbase[MaxHeapTuplesPerPage];
+	itemIdCompact itemidptr;
 	ItemId		lp;
 	int			nline,
 				nstorage,
@@ -831,9 +848,9 @@ PageIndexMultiDelete(Page page, OffsetNumber *itemnos, int nitems)
 	Offset		pd_lower = phdr->pd_lower;
 	Offset		pd_upper = phdr->pd_upper;
 	Offset		pd_special = phdr->pd_special;
-	itemIdSortData itemidbase[MaxIndexTuplesPerPage];
+	itemIdCompactData itemidbase[MaxIndexTuplesPerPage];
 	ItemIdData	newitemids[MaxIndexTuplesPerPage];
-	itemIdSort	itemidptr;
+	itemIdCompact itemidptr;
 	ItemId		lp;
 	int			nline,
 				nused;