Hi!

On Wed, Jul 8, 2015 at 10:27 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
> In the dataPlaceToPageLeaf function:
>
>> if (append)
>> {
>>     /*
>>      * Even when appending, trying to append more items than will fit is
>>      * not completely free, because we will merge the new items and old
>>      * items into an array below. In the best case, every new item fits in
>>      * a single byte, and we can use all the free space on the old page as
>>      * well as the new page. For simplicity, ignore segment overhead etc.
>>      */
>>     maxitems = Min(maxitems, freespace + GinDataPageMaxDataSize);
>> }
>
> Hmm. So after splitting the page, there is freespace +
> GinDataPageMaxDataSize bytes available on both pages together. freespace
> has been adjusted with the fillfactor, but GinDataPageMaxDataSize is not.
> So this overshoots, because when leafRepackItems actually distributes the
> segments on the pages, it will fill both pages only up to the fillfactor.
> This is an upper bound, so that's harmless, it only leads to some
> unnecessary work in dealing with the item lists. But I think that should be:
>
> maxitems = Min(maxitems, freespace + leaf->maxdatasize);

Good catch! Fixed.

>> else
>> {
>>     /*
>>      * Calculate a conservative estimate of how many new items we can fit
>>      * on the two pages after splitting.
>>      *
>>      * We can use any remaining free space on the old page to store full
>>      * segments, as well as the new page. Each full-sized segment can hold
>>      * at least MinTuplesPerSegment items
>>      */
>>     int nnewsegments;
>>
>>     nnewsegments = freespace / GinPostingListSegmentMaxSize;
>>     nnewsegments += GinDataPageMaxDataSize / GinPostingListSegmentMaxSize;
>>     maxitems = Min(maxitems, nnewsegments * MinTuplesPerSegment);
>> }
>
> This branch makes the same mistake, but this is calculating a lower bound.
> It's important that maxitems is not set to a higher value than what actually
> fits on the page, otherwise you can get an ERROR later in the function,
> when it turns out that not all the items actually fit on the page. The
> saving grace here is that this branch is never taken when building a new
> index, because index build inserts all the TIDs in order, but that seems
> pretty fragile. Should use leaf->maxdatasize instead of
> GinDataPageMaxDataSize here too.

Fixed.

> But that can lead to funny things if fillfactor is small and BLCKSZ is
> small too. With the minimums, BLCKSZ=1024 and fillfactor=0.2, the above
> formula will set nnewsegments to zero. That's not going to end up well.
> The problem is that maxdatasize becomes smaller than
> GinPostingListSegmentMaxSize. I think GinGetMaxDataSize() needs to make
> sure that the returned value is always >= GinPostingListSegmentMaxSize.

Fixed.

> Now that we have a fillfactor, shouldn't we replace the 75% heuristic
> later in that function, when inserting to the rightmost page rather than
> building it from scratch? In B-tree, the fillfactor is applied to that
> case too.

Sounds reasonable. Now it works like btree.

(Illustrative sketches of the corrected bounds and of the clamped
GinGetMaxDataSize() follow below.)

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
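For readers without the patch at hand, here is a rough sketch of what the
corrected bounds in dataPlaceToPageLeaf look like once leaf->maxdatasize
(the fillfactor-adjusted per-page data size) is used for the new page in
both branches. It is reconstructed from the excerpt quoted above, not
copied from gin_fillfactor_6.patch, so the surrounding context and exact
formatting are approximate.

    if (append)
    {
        /*
         * Upper bound: in the best case every new item compresses to a
         * single byte, so we can use the remaining free space on the old
         * page plus one whole (fillfactor-adjusted) new page.
         */
        maxitems = Min(maxitems, freespace + leaf->maxdatasize);
    }
    else
    {
        /*
         * Lower bound: count only full-sized segments that are guaranteed
         * to fit in the old page's free space and in a fillfactor-adjusted
         * new page; each full-sized segment holds at least
         * MinTuplesPerSegment items.
         */
        int         nnewsegments;

        nnewsegments = freespace / GinPostingListSegmentMaxSize;
        nnewsegments += leaf->maxdatasize / GinPostingListSegmentMaxSize;
        maxitems = Min(maxitems, nnewsegments * MinTuplesPerSegment);
    }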
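And a minimal sketch of the clamp suggested for GinGetMaxDataSize(). The
real function's signature and the way the fillfactor is obtained are not
shown in this thread, so both are assumptions here (a fractional fillfactor
is used only because the example above says fillfactor=0.2); the point is
the final clamp, which keeps nnewsegments from coming out as zero on small
block sizes.

    static Size
    GinGetMaxDataSize(double fillfactor)   /* signature is hypothetical */
    {
        Size        maxdatasize;

        /* Apply the fillfactor to the physical per-page data size. */
        maxdatasize = (Size) (GinDataPageMaxDataSize * fillfactor);

        /*
         * Never return less than one full posting-list segment; otherwise,
         * with BLCKSZ=1024 and a low fillfactor, the nnewsegments
         * calculation in dataPlaceToPageLeaf can become zero.
         */
        if (maxdatasize < GinPostingListSegmentMaxSize)
            maxdatasize = GinPostingListSegmentMaxSize;

        return maxdatasize;
    }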
gin_fillfactor_6.patch
Description: Binary data