On Wed, Dec 3, 2025 at 7:32 AM Chao Li <[email protected]> wrote: > The old code only sets so->numKilled to 0 and reuse memory of > so->killedItems[], now the new code always bms_free(so->killedItems) and > re-alloc memory when next time adding a member to bms. > > I am think that, if there were bms_clear(), then we could have just cleared > the bitmap and reused the memory next time. But unfortunately, there is no > such a bms_clear() now. What do you think to add bms_clear() and use it here? > I don’t want to do that, I can try that once you push this patch.
I don't think that it's worth the complexity. We can rely on palloc() to recycle the memory that was freed by the most recent bms_clear(). I say this in part because the complexity of reusing the same Bitmapset would be rather high. The idea that the only valid representation of an empty set is a NULL pointer is baked into the Bitmapset API. Note also that we'll use much less memory for killedItems by representing it as a Bitmapset. We'll use at most one bit per so->currPos.items[] item, whereas before we used 4 bytes per item. > I wonder if the comment “must be at lease one” should apply to the assignment > of tupleOffset? The “for” loop starts from nitems-2, will it still must be at > lease one item? By definition, a posting list tuple has at least 2 TIDs -- that's a posting list invariant. If there was only 1 TID, then it wouldn't be a posting list in the first place. (Unlike within GIN, where single TID posting lists are possible.) > /* > - * Don't bother advancing the outermost > loop's int iterator to > - * avoid processing killed items that relate > to the same > - * offnum/posting list tuple. This > micro-optimization hardly > - * seems worth it. (Further iterations of > the outermost loop > - * will fail to match on this same posting > list's first heap > - * TID instead, so we'll advance to the next > offnum/index > - * tuple pretty quickly.) > + * Don't advance itemIndex for outermost > loop, no matter how > + * nextIndex was advanced. It's possible > that items whose > + * TIDs weren't matched in posting list can > still be killed > + * (there might be a later tuple whose TID is > a match). > */ > if (j == nposting) > killtuple = true; > ``` > > I really don’t get what "Don't bother advancing the outermost loop's int > iterator” means? Here we only set killtuple to true, then if (killtuple && > !ItemIdIsDead(iid)), it breaks the inner while loop, in that case, outer > while loop "while ((itemIndex = bms_next_member(so->killedItems, itemIndex)) > >= 0)” will advance itemIndex. The old comment should probably have been written more like the new one (that I propose to replace it with). It's saying "don't try to be clever by remembering that we already determined that all the TIDs that we tried to compare to this posting list aren't matches for *any* TID". But I don't think that that's accurate; we *haven't* determined that those TIDs aren't matches for *any and all* TIDs on the page (only for those now in the posting list specifically). We might still be able to match those TIDs to later tuples, immediately after the posting list. Note that this is all a bit academic and theoretical; in practice it rarely comes up. During so->dropPin scans (which is the common case), we'll give up/get scared at the start of _bt_killitems if the page has changed at all since it was initially examined within _bt_readpage anyway -- no matter how the page was modified. It doesn't matter that the page probably *won't* have been modified by VACUUM when _bt_kilitems gets scared of modifying the page like this (VACUUM is the only thing that truly makes it unsafe for _bt_killitems to run, but _bt_killitems is conservative/easily scared). So while it is true that "We might still be able to match those TIDs to later tuples, immediately after the posting list" (as I said in the paragraph before the previous paragraph), we can only do so during !so->dropPin scans. In other words, only during index-only scans, or scans of an unlogged relation, or when we don't use an MVCC snapshot. All of which are rare. Which makes all this fairly academic/theoretical (mostly it's historical, 10+ years ago *all* scans were !so->dropPin scans). -- Peter Geoghegan
