On Fri, Mar 31, 2017 at 4:31 PM, Peter Geoghegan <p...@bowt.ie> wrote: > That's all I have for now. Maybe I can look again later, or tomorrow.
I took another look, this time at code used during CREATE INDEX. More feedback: * I see no reason to expose _bt_pgaddtup() (to modify it to not be static, so it can be called during CREATE INDEX for truncated high key). You could call PageAddItem() directly, just as _bt_pgaddtup() itself does, and lose nothing. This is the case because the special steps within _bt_pgaddtup() are only when inserting the first real item (and only on an internal page). You're only ever using _bt_pgaddtup() for the high key offset. Would a raw PageAddItem() call lose anything? I think I see why you've done this -- the existing CREATE INDEX _bt_sortaddtup() routine (which is very similar to _bt_pgaddtup(), a routine used for *insertion*) doesn't do the correct thing were you to use it, because it assumes that the page is always right most (i.e., always has no high key yet). The reason _bt_sortaddtup() exists is explained here: * This is almost like nbtinsert.c's _bt_pgaddtup(), but we can't use * that because it assumes that P_RIGHTMOST() will return the correct * answer for the page. Here, we don't know yet if the page will be * rightmost. Offset P_FIRSTKEY is always the first data key. */ static void _bt_sortaddtup(Page page, Size itemsize, IndexTuple itup, OffsetNumber itup_off) { ... } (...thinks some more...) So, this difference only matters when you have a non-leaf item, which is never subject to truncation in your patch. So, in fact, it doesn't matter at all. I guess you should just use _bt_pgaddtup() after all, rather than bothering with a raw PageAddItem(), even. But, don't forget to note why this is okay above _bt_sortaddtup(). * Calling PageIndexTupleDelete() within _bt_buildadd(), which memmove()s all other items on the leaf page, seems wasteful in the context of CREATE INDEX. Can we do better? * I also think that calling PageIndexTupleDelete() has a page space accounting bug, because the following thing happens a second time for highkey ItemId when new code does this call: phdr->pd_lower -= sizeof(ItemIdData); (The first time this happens is within _bt_buildadd() itself, just before your patch calls PageIndexTupleDelete().) * I don't think it's okay to let index_truncate_tuple() leak memory within _bt_buildadd(). It's probably okay for nbtinsert.c callers to index_truncate_tuple() to not be too careful, though, since those calls occur in a per-tuple memory context. The same cannot be said for _bt_buildadd()/CREATE INDEX calls. * Speaking of memory management: is this really needed? > @@ -554,7 +580,11 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, > IndexTuple itup) > * Save a copy of the minimum key for the new page. We have to copy > * it off the old page, not the new one, in case we are not at leaf > * level. > + * Despite oitup is already initialized, it's important to get high > + * key from the page, since we could have replaced it with truncated > + * copy. See comment above. > */ > + oitup = (IndexTuple) PageGetItem(opage,PageGetItemId(opage, P_HIKEY)); > state->btps_minkey = CopyIndexTuple(oitup); You didn't modify/truncate oitup in-place -- you effectively made a (truncated) copy by calling index_truncate_tuple(). Maybe you can manage the memory by assigning keytup to state->btps_minkey, in place of a CopyIndexTuple(), just for the truncation case? I haven't studied this in enough detail to be sure that that would be correct, but it seems clear that a better strategy is needed for managing memory within _bt_buildadd(). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers