On Tue, Dec 24, 2019 at 4:29 AM Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: > I updated the patchset. > The first patch now contains only infrastructure changes > and the second one sets opcisbitwise for btree opclasses in pg_opclass.dat.
We should try to formally define what we're trying to represent about B-Tree opclasses here -- the definition of "opcisbitwise"/preciseness/whatever should be tightened up. In particular, it should be clear how the "image" binary row comparators [1] (i.e. "operator *= equality" stuff) fit in. This new concept should be defined in terms of that existing concept -- we're talking about exactly the same variety of "internal binary equality" here, I think. I propose that we adopt the following definition: For an operator class to be safe, its equality operator has to always agree with datum_image_eq() (i.e. two datums must be bitwise equal after detoasting). (Maybe we should say something about "operator *= equality" as well (or instead), since that is already documented in [1].) We may also want to say something about foreign keys in this formal definition of "opcisbitwise"/preciseness. Discussion around the bug fixed by commit 1ffa59a85cb [1] showed that there was plenty of confusion in this area. Commit 1ffa59a85cb simply solved the problem that existed with foreign keys, without "joining the dots". It reused the rowtypes.c "operator *= equality" stuff to fix the problem, but only in an ad-hoc and undoumented way. Let's not do that again now. Note: In theory this definition is stricter than truly necessary to make deduplication safe, because we can imagine a contrived case in which an operator class exists where datum_image_eq() does not always agree with the equality operator, even though the equality operator will reliably consider two datums to be equal only when they have identical outputs from the underlying type's output function. This could happen when an operator class author wasn't very careful about zeroing padding -- this may not have mattered to the opclass author because nobody relied on that padding anyway. I think that stuff like this is not worth worrying about -- it can only happen because the datatype/operator class author was very sloppy. Note also: We seem to make this assumption already. Maybe this uninitialized bytes side issue doesn't even need to be pointed out or discussed. The comment just above VALGRIND_CHECK_MEM_IS_DEFINED() within PageAddItemExtended() seems to suggest this. The comment specifically mentions datumIsEqual() (not datum_image_eq()), but it's exactly the same issue. [1] https://www.postgresql.org/docs/devel/functions-comparisons.html#COMPOSITE-TYPE-COMPARISON [2] https://www.postgresql.org/message-id/flat/3326fc2e-bc02-d4c5-e3e5-e54da466e89a%402ndquadrant.com -- Peter Geoghegan