On Mon, 8 Feb 2021, 14:46 Pavel Borisov <pashkin.e...@gmail.com wrote: > Hi, hackers! > > It seems that if btree index with a unique constraint is corrupted by > duplicates, amcheck now can not catch this. Reindex becomes impossible as > it throws an error but otherwise the index will let the user know that it > is corrupted, and amcheck will tell that the index is clean. So I'd like to > propose a short patch to improve amcheck for checking the unique > constraint. It will output tid's of tuples that are duplicated in the index > (i.e. more than one tid for the same index key is visible) and the user can > easily investigate and delete corresponding table entries. > > 0001 - is the actual patch, and > 0002 - is a temporary hack for testing. It will allow inserting duplicates > in a table even if an index with the exact name "idx" has a unique > constraint (generally it is prohibited to insert). Then a new amcheck will > tell us about these duplicates. It's pity but testing can not be done > automatically, as it needs a core recompile. For testing I'd recommend a > protocol similar to the following: > > - Apply patch 0002 > - Set autovaccum = off in postgresql.conf > > > > *create table tbl2 (a varchar(50), b varchar(150), c bytea, d > varchar(50));create unique index idx on tbl2(a,b);insert into tbl2 select > i::text::varchar, i::text::varchar, i::text::bytea, i::text::varchar from > generate_series(0,3) as i, generate_series(0,10000) as x;* > > So we'll have a generous amount of duplicates in the table and index. Then: > > *create extension amcheck;* > *select bt_index_check('idx', true);* > > There will be output about each pair of duplicates: index tid's, position > in a posting list (if the index item is deduplicated) and table tid's. > > WARNING: index uniqueness is violated for index "idx": Index tid=(26,6) > posting 218 and posting 219 (point to heap tid=(126,93) and tid=(126,94)) > page lsn=0/1B3D420. > WARNING: index uniqueness is violated for index "idx": Index tid=(26,6) > posting 219 and posting 220 (point to heap tid=(126,94) and tid=(126,95)) > page lsn=0/1B3D420. > WARNING: index uniqueness is violated for index "idx": Index tid=(26,6) > posting 220 and posting 221 (point to heap tid=(126,95) and tid=(126,96)) > page lsn=0/1B3D420. > WARNING: index uniqueness is violated for index "idx": Index tid=(26,6) > posting 221 and tid=(26,7) posting 0 (point to heap tid=(126,96) and > tid=(126,97)) page lsn=0/1B3D420. > WARNING: index uniqueness is violated for index "idx": Index tid=(26,7) > posting 0 and posting 1 (point to heap tid=(126,97) and tid=(126,98)) page > lsn=0/1B3D420. > WARNING: index uniqueness is violated for index "idx": Index tid=(26,7) > posting 1 and posting 2 (point to heap tid=(126,98) and tid=(126,99)) page > lsn=0/1B3D420. > > Things to notice in the test output: > - cross-page duplicates (when some of them on the one index page and the > other are on the next). (Under patch 0002 they are marked by an additional > message "*INFO: cross page equal keys"* to catch them among the other) > > - If we delete table entries corresponding to some duplicates in between > the other duplicates (vacuum should be off), the message for this entry > should disappear but the other duplicates should be detected as previous. > > *delete from tbl2 where ctid::text='(126,94)';* > *select bt_index_check('idx', true);* > > WARNING: index uniqueness is violated for index "idx": Index tid=(26,6) > posting 218 and posting 220 (point to heap tid=(126,93) and tid=(126,95)) > page lsn=0/1B3D420. > WARNING: index uniqueness is violated for index "idx": Index tid=(26,6) > posting 220 and posting 221 (point to heap tid=(126,95) and tid=(126,96)) > page lsn=0/1B3D420. > WARNING: index uniqueness is violated for index "idx": Index tid=(26,6) > posting 221 and tid=(26,7) posting 0 (point to heap tid=(126,96) and > tid=(126,97)) page lsn=0/1B3D420. > WARNING: index uniqueness is violated for index "idx": Index tid=(26,7) > posting 0 and posting 1 (point to heap tid=(126,97) and tid=(126,98)) page > lsn=0/1B3D420. > WARNING: index uniqueness is violated for index "idx": Index tid=(26,7) > posting 1 and posting 2 (point to heap tid=(126,98) and tid=(126,99)) page > lsn=0/1B3D420. > > Caveat: if the first entry on the next index page has a key equal to the > key on a previous page AND all heap tid's corresponding to this entry are > invisible, currently cross-page check can not detect unique constraint > violation between previous index page entry and 2nd, 3d and next current > index page entries. In this case, there would be a message that recommends > doing VACUUM to remove the invisible entries from the index and repeat the > check. (Generally, it is recommended to do vacuum before the check, but for > the testing purpose I'd recommend turning it off to check the detection of > visible-invisible-visible duplicates scenarios) > > Your feedback is very much welcome, as usual. > > -- > Best regards, > Pavel Borisov > > Postgres Professional: http://postgrespro.com <http://www.postgrespro.com> >
There was typo, I mean, initially"...index will NOT let the user know that it is corrupted..."