Hi!

On Wed, Jul 8, 2015 at 10:27 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
> In the dataPlaceToPageLeaf function:
>
>> if (append)
>> {
>>     /*
>>      * Even when appending, trying to append more items than will fit is
>>      * not completely free, because we will merge the new items and old
>>      * items into an array below. In the best case, every new item fits in
>>      * a single byte, and we can use all the free space on the old page as
>>      * well as the new page. For simplicity, ignore segment overhead etc.
>>      */
>>     maxitems = Min(maxitems, freespace + GinDataPageMaxDataSize);
>> }
>
> Hmm. So after splitting the page, there is freespace +
> GinDataPageMaxDataSize bytes available on both pages together. freespace
> has been adjusted with the fillfactor, but GinDataPageMaxDataSize is not.
> So this overshoots, because when leafRepackItems actually distributes the
> segments on the pages, it will fill both pages only up to the fillfactor.
> This is an upper bound, so that's harmless, it only leads to some
> unnecessary work in dealing with the item lists. But I think that should be:
>
> maxitems = Min(maxitems, freespace + leaf->maxdatasize);

Good catch! Fixed.

>> else
>> {
>>     /*
>>      * Calculate a conservative estimate of how many new items we can fit
>>      * on the two pages after splitting.
>>      *
>>      * We can use any remaining free space on the old page to store full
>>      * segments, as well as the new page. Each full-sized segment can hold
>>      * at least MinTuplesPerSegment items
>>      */
>>     int nnewsegments;
>>
>>     nnewsegments = freespace / GinPostingListSegmentMaxSize;
>>     nnewsegments += GinDataPageMaxDataSize / GinPostingListSegmentMaxSize;
>>     maxitems = Min(maxitems, nnewsegments * MinTuplesPerSegment);
>> }
>
> This branch makes the same mistake, but this is calculating a lower bound.
> It's important that maxitems is not set to a higher value than what actually
> fits on the page, otherwise you can get an ERROR later in the function,
> when it turns out that not all the items actually fit on the page. The
> saving grace here is that this branch is never taken when building a new
> index, because index build inserts all the TIDs in order, but that seems
> pretty fragile. Should use leaf->maxdatasize instead of
> GinDataPageMaxDataSize here too.

Fixed.

> But that can lead to funny things if fillfactor is small and BLCKSZ is
> small too. With the minimums, BLCKSZ=1024 and fillfactor=0.2, the above
> formula will set nnewsegments to zero. That's not going to end up well.
> The problem is that maxdatasize becomes smaller than
> GinPostingListSegmentMaxSize. I think GinGetMaxDataSize() needs to make
> sure that the returned value is always >= GinPostingListSegmentMaxSize.

Fixed.

> Now that we have a fillfactor, shouldn't we replace the 75% heuristic
> later in that function, when inserting to the rightmost page rather than
> building it from scratch? In B-tree, the fillfactor is applied to that
> case too.

Sounds reasonable. Now it works like btree.

(Illustrative sketches of the corrected bounds and of the clamped
GinGetMaxDataSize() follow below.)

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
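For readers without the patch at hand, here is a rough sketch of what the
corrected bounds in dataPlaceToPageLeaf look like once leaf->maxdatasize
(the fillfactor-adjusted per-page data size) is used for the new page in
both branches. It is reconstructed from the excerpt quoted above, not
copied from gin_fillfactor_6.patch, so the surrounding context and exact
formatting are approximate.

    if (append)
    {
        /*
         * Upper bound: in the best case every new item compresses to a
         * single byte, so we can use the remaining free space on the old
         * page plus one whole (fillfactor-adjusted) new page.
         */
        maxitems = Min(maxitems, freespace + leaf->maxdatasize);
    }
    else
    {
        /*
         * Lower bound: count only full-sized segments that are guaranteed
         * to fit in the old page's free space and in a fillfactor-adjusted
         * new page; each full-sized segment holds at least
         * MinTuplesPerSegment items.
         */
        int         nnewsegments;

        nnewsegments = freespace / GinPostingListSegmentMaxSize;
        nnewsegments += leaf->maxdatasize / GinPostingListSegmentMaxSize;
        maxitems = Min(maxitems, nnewsegments * MinTuplesPerSegment);
    }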
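And a minimal sketch of the clamp suggested for GinGetMaxDataSize(). The
real function's signature and the way the fillfactor is obtained are not
shown in this thread, so both are assumptions here (a fractional fillfactor
is used only because the example above says fillfactor=0.2); the point is
the final clamp, which keeps nnewsegments from coming out as zero on small
block sizes.

    static Size
    GinGetMaxDataSize(double fillfactor)   /* signature is hypothetical */
    {
        Size        maxdatasize;

        /* Apply the fillfactor to the physical per-page data size. */
        maxdatasize = (Size) (GinDataPageMaxDataSize * fillfactor);

        /*
         * Never return less than one full posting-list segment; otherwise,
         * with BLCKSZ=1024 and a low fillfactor, the nnewsegments
         * calculation in dataPlaceToPageLeaf can become zero.
         */
        if (maxdatasize < GinPostingListSegmentMaxSize)
            maxdatasize = GinPostingListSegmentMaxSize;

        return maxdatasize;
    }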
gin_fillfactor_6.patch
Description: Binary data