On Thu, Jan 18, 2024 at 8:31 AM Masahiko Sawada <sawada.m...@gmail.com> wrote: > It seems we cannot make this work nicely. IIUC VacDeadItems is > allocated in DSM and TidStore is embedded there. However, > dead_items->items.area is a local pointer to dsa_area. So we cannot > include dsa_area in neither TidStore nor RT_RADIX_TREE. Instead we > would need to pass dsa_area to each interface by callers.
Thanks again for exploring this line of thinking! Okay, it seems even if there's a way to make this work, it would be too invasive to justify when compared with the advantage I was hoping for. > > If we can't make this work nicely, I'd be okay with keeping the tid > > store control object. My biggest concern is unnecessary > > double-locking. > > If we don't do any locking stuff in radix tree APIs and it's the > user's responsibility at all, probably we don't need a lock for > tidstore? That is, we expose lock functions as you mentioned and the > user (like tidstore) acquires/releases the lock before/after accessing > the radix tree and num_items. I'm not quite sure what the point of "num_items" is anymore, because it was really tied to the array in VacDeadItems. dead_items->num_items is essential to reading/writing the array correctly. If this number is wrong, the array is corrupt. There is no such requirement for the radix tree. We don't need to know the number of tids to add to it or do a lookup, or anything. There are a number of places where we assert "the running count of the dead items" is the same as "the length of the dead items array", like here: @@ -2214,7 +2205,7 @@ lazy_vacuum(LVRelState *vacrel) BlockNumber threshold; Assert(vacrel->num_index_scans == 0); - Assert(vacrel->lpdead_items == vacrel->dead_items->num_items); + Assert(vacrel->lpdead_items == TidStoreNumTids(vacrel->dead_items)); As such, in HEAD I'm guessing it's arbitrary which one is used for control flow. Correct me if I'm mistaken. If I am wrong for some part of the code, it'd be good to understand when that invariant can't be maintained. @@ -1258,7 +1265,7 @@ lazy_scan_heap(LVRelState *vacrel) * Do index vacuuming (call each index's ambulkdelete routine), then do * related heap vacuuming */ - if (dead_items->num_items > 0) + if (TidStoreNumTids(dead_items) > 0) lazy_vacuum(vacrel); Like here. In HEAD, could this have used vacrel->dead_items? @@ -2479,14 +2473,14 @@ lazy_vacuum_heap_rel(LVRelState *vacrel) * We set all LP_DEAD items from the first heap pass to LP_UNUSED during * the second heap pass. No more, no less. */ - Assert(index > 0); Assert(vacrel->num_index_scans > 1 || - (index == vacrel->lpdead_items && + (TidStoreNumTids(vacrel->dead_items) == vacrel->lpdead_items && vacuumed_pages == vacrel->lpdead_item_pages)); ereport(DEBUG2, - (errmsg("table \"%s\": removed %lld dead item identifiers in %u pages", - vacrel->relname, (long long) index, vacuumed_pages))); + (errmsg("table \"%s\": removed " INT64_FORMAT "dead item identifiers in %u pages", + vacrel->relname, TidStoreNumTids(vacrel->dead_items), + vacuumed_pages))); We assert that vacrel->lpdead_items has the expected value, and then the ereport repeats the function call (with a lock) to read the value we just consulted to pass the assert. If we *really* want to compare counts, maybe we could invent a debugging-only function that iterates over the tree and popcounts the bitmaps. That seems too expensive for regular assert builds, though. On the subject of debugging builds, I think it no longer makes sense to have the array for debug checking in tid store, even during development. A few months ago, we had an encoding scheme that looked simple on paper, but its code was fiendishly difficult to follow (at least for me). That's gone. In addition to the debugging count above, we could also put a copy of the key in the BlockTableEntry's header, in debug builds. We don't yet need to care about the key size, since we don't (yet) have runtime-embeddable values. > Currently (as of v52 patch) RT_FIND is > doing so, [meaning, there is no internal "automatic" locking here since after we switched to variable-length types, an outstanding TODO] Maybe it's okay to expose global locking for v17. I have one possible alternative: This week I tried an idea to use a callback there so that after internal unlocking, the caller received the value (or whatever else needs to happen, such as lookup an offset in the tid bitmap). I've attached a draft for that that passes radix tree tests. It's a bit awkward, but I'm guessing this would more closely match future internal atomic locking. Let me know what you think of the concept, and then do whichever way you think is best. (using v53 as the basis) I believe this is the only open question remaining. The rest is just polish and testing. > During trying this idea, I realized that there is a visibility problem > in the radix tree template If it's broken even without the embedding I'll look into this (I don't know if this configuration has ever been tested). I think a good test is putting the shared tid tree in it's own translation unit, to see if anything needs to be fixed. I'll go try that.
rt-find-callback-interface.patch.nocfbot
Description: Binary data