> On 29 Jun 2022, at 17:43, Robins Tharakan <thara...@gmail.com> wrote:
Sorry to bump an ancient thread, but I have some observations that might or
might not be relevant.
Recently we noticed corruption on one of our clusters. The corruption at hand is
not in the system catalog, but in user indexes.
The cluster was correctly configured: checksums, fsync, FPI etc.
The cluster was never restored from a backup. It’s a single-node cluster, so it
was never promoted, pg_rewind-ed, etc. The VM had never been rebooted.
But the cluster had been experiencing 10 OOMs a day. There were no torn pages
and no checksum errors in the log at all. Yet B-tree indexes became corrupted.
Sorry for this wall of text; I’m posting everything as-is in case there is
some useful information.
$ /etc/cron.yandex/pg_corruption_check.py --index
2024-03-01 11:54:05,075 ERROR : Corrupted index: 96009
table1_table1message_table1_team_identity_06a95642 XX002 ERROR: posting list
contains misplaced TID in index
"table1_table1message_table1_team_identity_06a95642" DETAIL: Index tid=(267,34)
posting list offset=137 page lsn=31B/62159608.
2024-03-01 11:54:05,100 ERROR : Corrupted index: 96008
table1_table1message_organization_id_66c18ed2 XX002 ERROR: posting list
contains misplaced TID in index "table1_table1message_organization_id_66c18ed2"
DETAIL: Index tid=(267,34) posting list offset=137 page lsn=31B/62158BC8.
2024-03-01 11:54:05,355 ERROR : Corrupted index: 95804
table2_aler_channel_81aeec_idx XX002 ERROR: posting list contains misplaced TID
in index "table2_aler_channel_81aeec_idx" DETAIL: Index tid=(336,7) posting
list offset=182 page lsn=314/9B794248.
2024-03-01 11:54:05,716 ERROR : Corrupted index: 95816
table2_table3_channel_id_91a1912f XX002 ERROR: posting list contains misplaced
TID in index "table2_table3_channel_id_91a1912f" DETAIL: Index tid=(384,2)
posting list offset=72 page lsn=317/3F14F390.
2024-03-01 11:54:06,068 ERROR : Corrupted index: 95815
table2_table3_channel_filter_id_6706c8b6 XX002 ERROR: posting list contains
misplaced TID in index "table2_table3_channel_filter_id_6706c8b6" DETAIL: Index
tid=(380,2) posting list offset=72 page lsn=317/3F0D8E30.
2024-03-01 11:54:06,302 ERROR : Corrupted index: 95824
table2_table3_root_alert_group_id_f327f122 XX002 ERROR: item order invariant
violated for index "table2_table3_root_alert_group_id_f327f122" DETAIL: Lower
index tid=(368,204) (points to heap tid=(48901,2)) higher index tid=(368,205)
(points to heap tid=(48901,2)) page lsn=319/3C234588.
2024-03-01 11:54:06,538 ERROR : Corrupted index: 95810
table2_table3_acknowledged_by_user_id_dd6723dc XX002 ERROR: posting list
contains misplaced TID in index
"table2_table3_acknowledged_by_user_id_dd6723dc" DETAIL: Index tid=(380,69)
posting list offset=35 page lsn=317/C14E2D50.
2024-03-01 11:54:06,775 ERROR : Corrupted index: 95825
table2_table3_silenced_by_user_id_40a833a1 XX002 ERROR: posting list contains
misplaced TID in index "table2_table3_silenced_by_user_id_40a833a1" DETAIL:
Index tid=(371,11) posting list offset=144 page lsn=318/61171918.
2024-03-01 11:54:07,009 ERROR : Corrupted index: 95829
table2_table3_wiped_by_id_4326ff61 XX002 ERROR: item order invariant violated
for index "table2_table3_wiped_by_id_4326ff61" DETAIL: Lower index tid=(373,97)
(points to heap tid=(48901,2)) higher index tid=(373,98) (points to heap
tid=(48901,2)) page lsn=318/61172788.
2024-03-01 11:54:07,245 ERROR : Corrupted index: 95823
table2_table3_resolved_by_user_id_463cdf3d XX002 ERROR: posting list contains
misplaced TID in index "table2_table3_resolved_by_user_id_463cdf3d" DETAIL:
Index tid=(375,89) posting list offset=144 page lsn=319/3C1DCFC8.
2024-03-01 11:54:07,479 ERROR : Corrupted index: 95819
table2_table3_maintenance_uuid_9a7b8529_like XX002 ERROR: item order invariant
violated for index "table2_table3_maintenance_uuid_9a7b8529_like" DETAIL: Lower
index tid=(372,4) (points to heap tid=(48901,2)) higher index tid=(372,5)
(points to heap tid=(48901,2)) page lsn=317/C1A210A8.
2024-03-01 11:54:07,717 ERROR : Corrupted index: 95827
table2_table3_table1_message_id_58a31784_like XX002 ERROR: posting list
contains misplaced TID in index "table2_table3_table1_message_id_58a31784_like"
DETAIL: Index tid=(373,89) posting list offset=144 page lsn=319/3C3EE660.
2024-03-01 11:54:08,162 ERROR : Corrupted index: 96066
webhooks_webhookresponse_webhook_id_db49ebcd XX002 ERROR: item order invariant
violated for index "webhooks_webhookresponse_webhook_id_db49ebcd" DETAIL: Lower
index tid=(522,24) (points to heap tid=(73981,1)) higher index tid=(522,25)
(points to heap tid=(73981,1)) page lsn=31B/E522B640.
2024-03-01 11:54:08,646 ERROR : Corrupted index: 95822
table2_table3_resolved_by_alert_id_bbdf0a83 XX002 ERROR: posting list contains
misplaced TID in index "table2_table3_resolved_by_alert_id_bbdf0a83" DETAIL:
Index tid=(618,2) posting list offset=150 page lsn=317/C1DE74B8.
2024-03-01 11:54:08,873 ERROR : Corrupted index: 95427
table2_table3_table1_message_id_key XX002 ERROR: item order invariant violated
for index "table2_table3_table1_message_id_key" DETAIL: Lower index
tid=(369,134) (points to heap tid=(48901,2)) higher index tid=(369,135) (points
to heap tid=(48901,2)) page lsn=319/3B629E58.
2024-03-01 11:54:09,108 ERROR : Corrupted index: 95417
table2_table3_maintenance_uuid_key XX002 ERROR: posting list contains misplaced
TID in index "table2_table3_maintenance_uuid_key" DETAIL: Index tid=(371,42)
posting list offset=47 page lsn=318/6116FC50.
2024-03-01 11:54:10,180 ERROR : Corrupted index: 95826
table2_table3_table1_log_message_id_587aaa8d_like XX002 ERROR: posting list
contains misplaced TID in index
"table2_table3_table1_log_message_id_587aaa8d_like" DETAIL: Index tid=(849,19)
posting list offset=79 page lsn=319/3C389B60.
2024-03-01 11:54:10,689 ERROR : Corrupted index: 95820
table2_table3_mattermost_log_message_id_69bc2ae4_like XX002 ERROR: item order
invariant violated for index
"table2_table3_mattermost_log_message_id_69bc2ae4_like" DETAIL: Lower index
tid=(559,4) (points to heap tid=(48901,2)) higher index tid=(559,5) (points to
heap tid=(48901,2)) page lsn=317/C1A7BA50.
2024-03-01 11:54:11,760 ERROR : Corrupted index: 95425
table2_table3_table1_log_message_id_key XX002 ERROR: item order invariant
violated for index "table2_table3_table1_log_message_id_key" DETAIL: Lower
index tid=(849,22) (points to heap tid=(48901,2)) higher index tid=(849,23)
(points to heap tid=(48901,2)) page lsn=317/3E7EC1F0.
2024-03-01 11:54:12,282 ERROR : Corrupted index: 95419
table2_table3_mattermost_log_message_id_key XX002 ERROR: posting list contains
misplaced TID in index "table2_table3_mattermost_log_message_id_key" DETAIL:
Index tid=(566,84) posting list offset=65 page lsn=319/3B1901F8.
2024-03-01 11:54:17,990 ERROR : Corrupted index: 95423
table2_table3_public_primary_key_key XX002 ERROR: cross page item order
invariant violated for index "table2_table3_public_primary_key_key" DETAIL:
Last item on page tid=(727,146) page lsn=31B/E104D660.
Most of these messages look similar, except the last one: “cross page item order
invariant violated for index”. Indeed, index scans were hanging in a cycle.
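
To make the similarity easier to see, the report can be tallied by error kind
and by the heap TID named in DETAIL. This is only a minimal sketch, not the
actual pg_corruption_check.py: the name tally_errors and both regular
expressions are assumptions derived from the output format above.

```python
import re
from collections import Counter

# Both patterns are guesses based on the report format shown above.
KIND_RE = re.compile(r"XX002 ERROR:\s+(.*?)\s+(?:in|for) index")
HEAP_TID_RE = re.compile(r"points to heap tid=\((\d+),(\d+)\)")


def tally_errors(report: str):
    """Count error kinds and the heap TIDs mentioned in DETAIL."""
    # Undo the mail client's line wrapping before matching.
    text = " ".join(report.split())
    kinds = Counter(KIND_RE.findall(text))
    heap_tids = Counter(
        (int(block), int(offset))
        for block, offset in HEAP_TID_RE.findall(text)
    )
    return kinds, heap_tids


# Two entries copied verbatim from the report above.
sample = """
2024-03-01 11:54:07,009 ERROR : Corrupted index: 95829
table2_table3_wiped_by_id_4326ff61 XX002 ERROR: item order invariant violated
for index "table2_table3_wiped_by_id_4326ff61" DETAIL: Lower index tid=(373,97)
(points to heap tid=(48901,2)) higher index tid=(373,98) (points to heap
tid=(48901,2)) page lsn=318/61172788.
2024-03-01 11:54:09,108 ERROR : Corrupted index: 95417
table2_table3_maintenance_uuid_key XX002 ERROR: posting list contains misplaced
TID in index "table2_table3_maintenance_uuid_key" DETAIL: Index tid=(371,42)
posting list offset=47 page lsn=318/6116FC50.
"""

kinds, heap_tids = tally_errors(sample)
print(kinds)
print(heap_tids)
```

In the report above, every “item order invariant violated” entry on a
table2_table3 index names the same heap tid=(48901,2), which a tally like this
makes easy to spot.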
I have not been able to locate the problem in WAL yet, because a lot of other
stuff is going on. But I have no better idea than to suspect that posting list
redo is corrupting the index in case of a crash.
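
One way to narrow the WAL search might be to sort the reported page LSNs and
look only at Btree records around them. A minimal sketch, assuming the usual
hi/lo hexadecimal LSN text format; parse_lsn, format_lsn and lsn_window are
hypothetical helper names. As far as I understand, a page LSN only identifies
the last record that touched the page, so the record that actually broke it may
be earlier.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN such as '31B/62159608' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(value: int) -> str:
    """Render a 64-bit LSN back into the hi/lo hexadecimal text form."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def lsn_window(page_lsns):
    """Return the lowest and highest LSN among the reported pages."""
    values = sorted(parse_lsn(lsn) for lsn in page_lsns)
    return format_lsn(values[0]), format_lsn(values[-1])


# A few page LSNs copied from the report above.
reported = ["31B/62159608", "314/9B794248", "317/3F14F390", "31B/E104D660"]
start, end = lsn_window(reported)
print(start, end)
```

The resulting bounds could then be given to pg_waldump via --start and --end,
with --rmgr=Btree to filter out the unrelated traffic; the start would likely
need to be moved earlier to catch the record that introduced the damage.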
Thanks!
Best regards, Andrey Borodin.