On 2022-Jan-10, Andrey Borodin wrote: > 3. The tuple seems to be updated in a high-contention concurrency > trigger function, autovacuum keeks in ~20-30 seconds after the message > in logs
Hmm, I bet this is related. > [ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 ]:ERROR: > t_xmin 696079792 is uncommitted in tuple (1419011,109) to be updated in > table "????s_statistics" > [ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 > ]:CONTEXT: SQL statement "UPDATE ????_????s.????s_statistics os > SET ????_????_found_ts = COALESCE(os.????_????_found_ts, NOW()), > last_????_found_ts = NOW(), > num_????s = os.num_????s + 1 > WHERE ????_id = NEW.????_id" > PL/pgSQL function statistics__update_from_new_????() line 3 at SQL > statement > [ 2022-01-10 09:07:17.671 MSK [unknown],????,????_????s,310759,XX001 > ]:STATEMENT: > INSERT INTO ????_????s.????s_????s AS ????s > > 4. t_xmin is relatevely new, not ancient > 5. pageinspect shows dead tuple after some time > 6. no suspicious activity in logs nearby > 7. vacuum (disable_page_skipping) and repack of indexes did not change > anything > > > I suspect this can be relatively new concurrency stuff. At least I never saw > this before on clusters with clean amcheck and heapcheck results. Ah. I've been thinking that it'd be some very old tuple that is in trouble, but that seems to be proven false. I think we should examine the affected tuples more closely while they're in the state that causes the problem. Can you set up things so that pageinspect's heap_page_items() is run on the broken page, before the problem disappears? Maybe extract the page number from the log line, have get_raw_page() store the page in a separate table, so that we can run heap_page_items() at leisure later. I would also grep through pg_waldump output to see what changes have been done to that tuple. Maybe the reason the problem fixes itself is that something else deletes the tuple. > Alvaro, did you observe this on binaries from August 13 minor release or > older? Well, the only reports I have of this problem are with the original error message that didn't give any clues at to what the problem was or where to look for it, so I don't know if the xmin was recent or not. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/