Thanks for doing all this. (Do I understand correctly that this patch is not in the commitfest?)
I think my mental model for this was that "allnulls" meant that either there are no values for the column in question or that the values were all nulls (For minmax without NULL handling, which is where this all started, these two things are essentially the same: the range is not to be returned. So this became a bug the instant I added handling for NULL values.) I failed to realize that these were two different things, and this is likely the origin of all these troubles. What do you think of using the unused bit in BrinTuple->bt_info to denote a range that contains no heap tuples? This also means we need it in BrinMemTuple, I think we can do this: @@ -44,6 +44,7 @@ typedef struct BrinValues typedef struct BrinMemTuple { bool bt_placeholder; /* this is a placeholder tuple */ + bool bt_empty_range; /* range has no tuples */ BlockNumber bt_blkno; /* heap blkno that the tuple is for */ MemoryContext bt_context; /* memcxt holding the bt_columns values */ /* output arrays for brin_deform_tuple: */ @@ -69,7 +70,7 @@ typedef struct BrinTuple * * 7th (high) bit: has nulls * 6th bit: is placeholder tuple - * 5th bit: unused + * 5th bit: range has no tuples * 4-0 bit: offset of data * --------------- */ @@ -82,7 +83,7 @@ typedef struct BrinTuple * bt_info manipulation macros */ #define BRIN_OFFSET_MASK 0x1F -/* bit 0x20 is not used at present */ +#define BRIN_EMPTY_RANGE 0x20 #define BRIN_PLACEHOLDER_MASK 0x40 #define BRIN_NULLS_MASK 0x80 (Note that bt_empty_range uses a hole in the struct, so there's no ABI change.) This is BRIN-tuple-level, not column-level, so conceptually it seems more appropriate. (In the case where both are empty in union_tuples, we can return without entering the per-attribute loop at all, though I admit it's not a very interesting case.) This approach avoids having to invent the strange combination of all+has to mean empty. On 2023-Feb-24, Tomas Vondra wrote: > I wonder what's the best > way to test this in an automated way - it's very dependent on timing of > the concurrent updated. For example we need to do something like this: > > T1: run pg_summarize_range() until it inserts the placeholder tuple > T2: do an insert into the page range (updates placeholder) > T1: continue pg_summarize_range() to merge into the placeholder > > But there are no convenient ways to do this, I think. I had to check the > various cases using breakpoints in gdb etc. Yeah, I struggled with this during initial development but in the end did nothing. I think we would need to introduce some new framework, perhaps Korotkov stop-events stuff at https://postgr.es/m/CAPpHfdsTeb+hBT5=qxghjng_chcjldanq9sdy9vnwbf2e2p...@mail.gmail.com which seemed to me a good fit -- we would add a stop point after the placeholder tuple is inserted. > I'm not very happy with the union_tuples() changes - it's quite verbose, > perhaps a bit too verbose. We have to check for empty ranges first, and > then various combinations of allnulls/hasnulls flags for both BRIN > tuples. There are 9 combinations, and the current code just checks them > one by one - I was getting repeatedly confused by the original code, but > maybe it's too much. I think it's okay. I tried to make it more compact (by saying "these two combinations here are case 2, and these two other are case 4", and keeping each of the other combinations a separate case; so there are really 7 cases). But that doesn't make it any easier to follow, on the contrary it was more convoluted. I think a dozen extra lines of source is not a problem. > The alternative is to apply the same fix to every BRIN_PROCNUM_UNION > opclass procedure out there. I guess doing that for minmax+inclusion is > not a huge deal, but what about external opclasses? And without the fix > the indexes are effectively broken. Fixing this outside in brin.c (in > the union procedure) fixes this for every opclass procedure, without any > actual limitation of functinality (14+ does that anyway). About the hypothetical question, you could as well ask what about unicorns. I have never seen any hint that any external opclass exist. I am all for maintaining compatibility, but I think this concern is overblown for BRIN. Anyway, I think your proposed fix is better than changing individual 'union' support procs, so it doesn't matter. As far as I understood, you're now worried that there will be an incompatibility because we will fail to call the 'union' procedure in cases where we previously called it? In other words, you fear that some hypothetical opclass was handling the NULL values in some way that's incompatible with this? I haven't thought terribly hard about this, but I can't see a way for this to cause incompatibilities. > But maybe someone thinks this is a bad idea and we should do something > else in the backbranches? I think the new handling of NULLs in commit 72ccf55cb99c ("Move IS [NOT] NULL handling from BRIN support functions") is better than what was there before, so I don't object to backpatching it now that we know it's necessary to fix a bug, and also we have field experience that the approach is solid. The attached patch is just a pointer to comments that I think need light edition. There's also a typo "bot" (for "both") in a comment that I think would go away if you accept my suggestion to store 'empty' at the tuple level. Note that I worked with the REL_14_STABLE sources, because for some reason I thought that that was the newest that needed backpatching of 72ccf55cb99c, but now that I'm finishing this email I realize that I should have used 13 instead /facepalm -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "La persona que no quería pecar / estaba obligada a sentarse en duras y empinadas sillas / desprovistas, por cierto de blandos atenuantes" (Patricio Vogel)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c index 95ed4ef362..0dddc6fa9c 100644 --- a/src/backend/access/brin/brin.c +++ b/src/backend/access/brin/brin.c @@ -594,9 +594,9 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm) bval = &dtup->bt_columns[attno - 1]; /* - * If the range has both allnulls and hasnulls set, it means - * there are no rows in the range, so we can skip it (we know - * there's nothing to match). + * If the BRIN tuple indicates that this range is empty, + * we can skip it: there's nothing to match. We don't + * need to examine the next columns. */ if (BRIN_RANGE_IS_EMPTY(bval)) { diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c index 7355e330f9..b3ba5ac365 100644 --- a/src/backend/access/brin/brin_tuple.c +++ b/src/backend/access/brin/brin_tuple.c @@ -607,8 +607,8 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, BrinMemTuple *dMemtuple) /* * Make sure to overwrite the hasnulls flag, because it was initialized - * to true by brin_memtuple_initialize and we don't want to skip it if - * allnulls=true. + * to true by brin_memtuple_initialize and we don't want to skip [it] if + * allnulls=true. (XXX "it" what?) */ dtup->bt_columns[keyno].bv_hasnulls = hasnulls[keyno];