On Wed, Jan 23, 2019 at 10:59 AM Peter Geoghegan <p...@bowt.ie> wrote:
> > The fix here must be to normalize index tuples that are compressed
> > within amcheck, both during initial fingerprinting, and during
> > subsequent probes of the Bloom filter in bt_tuple_present_callback().
>
> I happened to talk to Andres about this in person yesterday. He
> thought that there was reason to be concerned about the need for
> logical normalization beyond TOAST issues. Expression indexes were a
> particular concern, because they could in principle have a change in
> the on-disk representation without a change of logical values -- false
> positives could result. He suggested that the long term solution was
> to bring hash operator class hash functions into Bloom filter hashing,
> at least where available.
I think that the best way forward is to normalize to compensate for inconsistent input datum TOAST state, and leave it at that. ISTM that logical normalization beyond that (based on hashing, or anything else) creates more problems than it solves. I am concerned about cases like INCLUDE indexes (which may have datums that lack even a B-Tree opclass), and about facets of some datatypes that are ignored by equality yet still semantically relevant, such as numeric's display scale.

If I can get an example from Andres of a case where further logical normalization is necessary to avoid false positives with expression indexes, that may change things. (BTW, I implemented another amcheck enhancement that searches indexes from the root to find matches. The code is a trivial addition to the new patch series I'm working on, and seems like a better way to do enhanced logical normalization if that proves to be truly necessary.)

The attached draft patch fixes the bug with fairly simple normalization. I think that TOAST compression of datums in indexes is fairly rare in practice, so I'm not very worried that this won't perform as well as it could on indexes with many compressed datums. The interface I've added might need to be expanded for other things in the future (e.g., to make amcheck work with nbtree-native duplicate compression), and not worrying too much about performance helps with that goal.

I'll pick this up next week, and will likely commit a fix by Wednesday or Thursday if there are no objections. I'm not sure whether the test case is worth including.

--
Peter Geoghegan
0001-Avoid-amcheck-TOAST-compression-inconsistencies.patch