Hello Tomas, On Mon, May 26, 2025 at 1:27 PM Tomas Vondra <to...@vondra.me> wrote:
> > Hello Arseniy, > > I finally got time to look at this more closely, and do some testing. Thank you for looking into this. > Are there any cases when the current code incorrectly reports corruption > for a valid index? So far I've been unable to find such case. Or am I wrong? I think you are right, I'm not aware of such cases either. > It seems to me all the proposed changes are "tightening" the checks, in > the sense that we might have missed certain types of issues before. This > is supported by the fact that the new TAP test fails on master, i.e. > master does not report the corruption the TAP introduces. I would say points 4, 5, 7 - yes, they are about tightening checks. I think point 1 is more about fixing the existing code. In the current code, parent_key is always NULL for the entry tree, so a bunch of code (related to checking consistency between parents and children) is unreachable. Then if you apply changes of the 1st point and parent_key comparison code starts working, you will need changes of the 2nd point. The current code ignores attribute numbers in parent_key check, which can lead to comparing keys of different columns. I see one scenario where it can happen: let's say we have a 2 column index. The first attribute type is "int", the second attribute type is "text". In the multicolumn gin index tuple has two parts: attno and key value. Let's write it as (attno, key). While traversing the entry tree the current code caches parent keys with child blkno. Let's say it cached (2, "a") parent key. It means that there was a time when the child page's high key was (2, "a"). But when the child page check actually starts, it's possible that as a result of parallel splits, the child page now contains keys of the first attribute only, for example (1, 1), (1, 5), (1, 10). So if we ignore the attribute number here, we will end up comparing 10 with "a". Hope the example is not too confusing. The 3rd point is about the code that never runs. As I understood it is supposed that the check detects splits so we can check more index pages, but If I'm not wrong it doesn't work now. The 6th point is about comparison with invalid pointer. I thought that it's probably not right to compare it with invalid pointer, but now I'm not sure. > (The TAP test is great, it would have been great to add something like > this in the original commit.) Great, thank you for the feedback. > Also, I've noticed that the TAP test passes even with some (most) of the > verify_gin.c changes reverted. See the 0002 patch - this does not break > the TAP test. Of course, that does not prove the changes are wrong and > I'm not claiming that. But can we improve the TAP test to trigger this > too? To show the current code (in master) misses this? Yes, changes in the undo patch is about posting tree check part (6, 7 points) and I haven't written tests for it, because to break posting tree you need to manipulate with tids which is not as easy as replace "aaaa" with "cccc" as tests for entry tree do. Probably it would be much easier to use page api to corrupt some posting tree pages, but I don't know, is it impossible in TAP tests?