Justin Pryzby <pry...@telsasoft.com> writes:
> This commit seems to trigger elog(), not reproducible in the
> parent commit.

> 6e086fa2e77 Allow parallel workers to cope with a newly-created session user 
> ID.

> postgres=# SET min_parallel_table_scan_size=0; CLUSTER pg_attribute USING 
> pg_attribute_relid_attnum_index;
> ERROR:  pg_attribute catalog is missing 26 attribute(s) for relation OID 70321

I've been poking at this all day, and I still have little idea what's
going on.  I've added a bunch of throwaway instrumentation, and have
managed to convince myself that the problem is that parallel heap
scan is broken.  The scans done to rebuild pg_attribute's indexes
seem to sometimes miss heap pages or visit pages twice (in different
workers).  I have no idea why this is, and even less idea how
6e086fa2e is provoking it.  As you say, the behavior isn't entirely
reproducible, but I couldn't make it happen at all after reverting
6e086fa2e's changes in transam/parallel.c, so apparently there is
some connection.

Another possibly useful data point is that for me it reproduces
fairly well (more than one time in two) on x86_64 Linux, but
I could not make it happen on macOS ARM64.  If it's a race
condition, which smells plausible, that's perhaps not hugely
surprising.

                        regards, tom lane


Reply via email to