Re: Corrupt Index

2022-03-15 Thread Tom Lane
Jorel Casal writes: > During a Patroni leader switchover, we had a situation where a unique index > stopped "working", the index did not have records > that were in fact in the table and the table had duplicate records which > should have been prevented by said unique index. If the index was on a

Re: Corrupt index stopping autovacuum system wide

2019-07-18 Thread shauncutts
On 17.07.2019 18:14, Andres Freund wrote: To me that means that we need prioritization across databases, and between tables, and probably by multiple criteria. I suspect there need to be multiple criteria how urgent vacuuming is, and autovacuum ought to try to make progress on all of them.

Re: Corrupt index stopping autovacuum system wide

2019-07-18 Thread Peter Geoghegan
On Thu, Jul 18, 2019 at 9:06 AM Aaron Pelz wrote: > It's a simple btree expression on a geometry(Point,4326) , no expression no > partial no composite. The cause of the corruption may be a bug in a Postgis B-Tree operator class. I reported a bug in the Geography type that could lead to corrupt B

Re: Corrupt index stopping autovacuum system wide

2019-07-18 Thread Aaron Pelz
> Can you tell us more about this index? Can you share its definition > (i.e. what does \d show in psql)? > Is it an expression index, or a partial index? A composite? What > datatypes are indexed? It's a simple btree expression on a geometry(Point,4326) , no expression no partial no composite.

Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Andres Freund
Hi, On 2019-07-17 13:27:23 -0400, Tom Lane wrote: > My concern here is that if we have blinders on to the extent of only > processing that one table or DB, we're unnecessarily allowing bloat to > occur in other tables, and causing that missed vacuuming work to pile > up so that there's more of it

Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Peter Geoghegan
On Wed, Jul 17, 2019 at 10:27 AM Peter Geoghegan wrote: > > It's possible that amcheck would have given you an accurate diagnosis > > of the problem -- especially if you used bt_index_parent_check(): > > > > https://www.postgresql.org/docs/current/amcheck.html > > BTW, be sure to use the 'heapalli

Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Peter Geoghegan
On Wed, Jul 17, 2019 at 11:43 AM Alvaro Herrera wrote: > This might make things worse operationally, though. If searches aren't > failing but vacuum is, we'd break a production system that currently > works. If searches aren't failing and VACUUM works, then that's probably down to dumb luck. The

Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Alvaro Herrera
On 2019-Jul-17, Tom Lane wrote: > Alvaro Herrera writes: > > On 2019-Jul-17, Peter Geoghegan wrote: > >> Maybe nbtree VACUUM should do something more aggressive than give up > >> when there is a "failed to re-find parent key" or similar condition. > >> Perhaps it would make more sense to make the

Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Tom Lane
Alvaro Herrera writes: > On 2019-Jul-17, Peter Geoghegan wrote: >> Maybe nbtree VACUUM should do something more aggressive than give up >> when there is a "failed to re-find parent key" or similar condition. >> Perhaps it would make more sense to make the index inactive (for some >> value of "inac

Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Alvaro Herrera
On 2019-Jul-17, Peter Geoghegan wrote: > Maybe nbtree VACUUM should do something more aggressive than give up > when there is a "failed to re-find parent key" or similar condition. > Perhaps it would make more sense to make the index inactive (for some > value of "inactive") instead of just compla

Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Peter Geoghegan
On Wed, Jul 17, 2019 at 10:27 AM Tom Lane wrote: > Right, you're eventually going to get to a forced shutdown if vacuum never > succeeds on one table; no question that that's bad. It occurs to me that we use operator class/insertion scankey comparisons within page deletion, to relocate a leaf pag

Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Tom Lane
Peter Geoghegan writes: > On Wed, Jul 17, 2019 at 9:57 AM Tom Lane wrote: >> It looks that way, but how would a broken non-shared index have held up >> autovacuuming in other databases? Maybe, as this one's xmin horizon >> got further and further behind, the launcher eventually stopped >> consid

Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Peter Geoghegan
On Wed, Jul 17, 2019 at 9:21 AM Peter Geoghegan wrote: > It's possible that amcheck would have given you an accurate diagnosis > of the problem -- especially if you used bt_index_parent_check(): > > https://www.postgresql.org/docs/current/amcheck.html BTW, be sure to use the 'heapallindexed' opti

Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Peter Geoghegan
On Wed, Jul 17, 2019 at 9:57 AM Tom Lane wrote: > It looks that way, but how would a broken non-shared index have held up > autovacuuming in other databases? Maybe, as this one's xmin horizon > got further and further behind, the launcher eventually stopped > considering launching workers into an

Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Tom Lane
Peter Geoghegan writes: > On Wed, Jul 17, 2019 at 8:54 AM Aaron Pelz wrote: >> To me it looks like a *single* corrupt index held up autovacuums across our >> entire server, even other in other databases on the same server. Am I >> interpreting this correctly? > Yes -- that is correct. It look

Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Aaron Pelz
> What PostgreSQL version are you on? Was this an INCLUDE index on PostgreSQL 11? On 11, and no it was just a normal btree. > It's possible that amcheck would have given you an accurate diagnosis > of the problem -- especially if you used bt_index_parent_check(): I'll look into this, seems helpf

Re: Corrupt index stopping autovacuum system wide

2019-07-17 Thread Peter Geoghegan
On Wed, Jul 17, 2019 at 8:54 AM Aaron Pelz wrote: > To me it looks like a *single* corrupt index held up autovacuums across our > entire server, even other in other databases on the same server. Am I > interpreting this correctly? Yes -- that is correct. What PostgreSQL version are you on? Was