Hi, On Mon, Feb 11, 2019 at 11:11:32AM +1100, Thomas Munro wrote: > I haven't ever managed to reproduce that one yet. It's great you have > a reliable repro... Let's discuss it on the #15585 thread.
I realized that I gave bad information (at least to Thomas). On the server where I've been reproducing this, it wasn't in an empty DB cluster, but one where I'd restored our DB schema. I think that's totally irrelevant, except that pg_attribute needs to be big enough to get parallel scan. Here's confirmed steps to reproduce initdb -D /var/lib/pgsql/test pg_ctl -c start -D /var/lib/pgsql/test -o '-c operator_precedence_warning=on -c maintenance_work_mem=1GB -c max_wal_size=16GB -c full_page_writes=off -c autovacuum=off -c fsync=off -c port=5678 -c unix_socket_directories=/tmp' PGPORT=5678 PGHOST=/tmp psql postgres -c 'CREATE TABLE queued_alters(child text,parent text); CREATE TABLE queued_alters_child()INHERITS(queued_alters); ANALYZE queued_alters, pg_attribute' # Inflate pg_attribute to nontrivial size: echo "CREATE TABLE t(`for c in $(seq 1 222); do echo "c$c int,"; done |xargs |sed 's/,$//'`)" |PGHOST=/tmp PGPORT=5678 psql postgres for a in `seq 1 999`; do echo "CREATE TABLE t$a() INHERITS(t);"; done |PGHOST=/tmp PGPORT=5678 psql -q postgres while PGOPTIONS='-cmin_parallel_table_scan_size=0' PGPORT=5678 PGHOST=/tmp psql postgres -c "explain analyze SELECT colcld.child c, parent p, array_agg(colpar.attname::text ORDER BY colpar.attnum) cols, array_agg(format_type(colpar.atttypid, colpar.atttypmod) ORDER BY colpar.attnum) AS types FROM queued_alters qa JOIN pg_attribute colpar ON to_regclass(qa.parent)=colpar.attrelid AND colpar.attnum>0 AND NOT colpar.attisdropped JOIN (SELECT *, attrelid::regclass::text AS child FROM pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND colcld.attnum>0 AND NOT colcld.attisdropped WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid GROUP BY 1,2 ORDER BY parent LIKE 'unused%', regexp_replace(colcld.child, '.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$', '\3\5') DESC, regexp_replace(colcld.child, '.*_', '') DESC LIMIT 1"; do :; done >/dev/null & # Verify this is planning parallel workers, then repeat 10-20x. Typically fails on this server in under 10min. Sorry for the error. Justin On Wed, Feb 06, 2019 at 07:47:19PM -0600, Justin Pryzby wrote: > FYI, I wasn't yet able to make this work yet. > (gdb) print *segment_map->header > Cannot access memory at address 0x7f347e554000 > > However I *did* reproduce the error in an isolated, non-production postgres > instance. It's a total empty, untuned v11.1 initdb just for this, running > ONLY > a few simultaneous loops around just one query It looks like the simultaneous > loops sometimes (but not always) fail together. This has happened a couple > times. > > It looks like one query failed due to "could not attach" in leader, one failed > due to same in worker, and one failed with "not pinned", which I hadn't seen > before and appears to be related to DSM, not DSA... > > |ERROR: dsa_area could not attach to segment > |ERROR: cannot unpin a segment that is not pinned > |ERROR: dsa_area could not attach to segment > |CONTEXT: parallel worker > | > |[2] Done while PGHOST=/tmp PGPORT=5678 psql postgres -c > "SELECT colcld.child c, parent p, array_agg(colpar.attname::text ORDER BY > colpar.attnum) cols, array_agg(format_type(colpar.atttypid, colpar.atttypmod) > ORDER BY colpar.attnum) AS types FROM queued_alters qa JOIN pg_attribute > colpar ON to_regclass(qa.parent)=colpar.attrelid AND colpar.attnum>0 AND NOT > colpar.attisdropped JOIN (SELECT *, attrelid::regclass::text AS child FROM > pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND > colcld.attnum>0 AND NOT colcld.attisdropped WHERE > colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid GROUP BY > 1,2 ORDER BY parent LIKE 'unused%', regexp_replace(colcld.child, > '.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$', '\\3\\5') > DESC, regexp_replace(colcld.child, '.*_', '') DESC LIMIT 1"; do > | :; > |done > /dev/null > |[5]- Done while PGHOST=/tmp PGPORT=5678 psql postgres -c > "SELECT colcld.child c, parent p, array_agg(colpar.attname::text ORDER BY > colpar.attnum) cols, array_agg(format_type(colpar.atttypid, colpar.atttypmod) > ORDER BY colpar.attnum) AS types FROM queued_alters qa JOIN pg_attribute > colpar ON to_regclass(qa.parent)=colpar.attrelid AND colpar.attnum>0 AND NOT > colpar.attisdropped JOIN (SELECT *, attrelid::regclass::text AS child FROM > pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND > colcld.attnum>0 AND NOT colcld.attisdropped WHERE > colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid GROUP BY > 1,2 ORDER BY parent LIKE 'unused%', regexp_replace(colcld.child, > '.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$', '\\3\\5') > DESC, regexp_replace(colcld.child, '.*_', '') DESC LIMIT 1"; do > | :; > |done > /dev/null > |[6]+ Done while PGHOST=/tmp PGPORT=5678 psql postgres -c > "SELECT colcld.child c, parent p, array_agg(colpar.attname::text ORDER BY > colpar.attnum) cols, array_agg(format_type(colpar.atttypid, colpar.atttypmod) > ORDER BY colpar.attnum) AS types FROM queued_alters qa JOIN pg_attribute > colpar ON to_regclass(qa.parent)=colpar.attrelid AND colpar.attnum>0 AND NOT > colpar.attisdropped JOIN (SELECT *, attrelid::regclass::text AS child FROM > pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND > colcld.attnum>0 AND NOT colcld.attisdropped WHERE > colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid GROUP BY > 1,2 ORDER BY parent LIKE 'unused%', regexp_replace(colcld.child, > '.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$', '\\3\\5') > DESC, regexp_replace(colcld.child, '.*_', '') DESC LIMIT 1"; do > > I'm also trying to reproduce on other production servers. But so far nothing > else has shown the bug, including the other server which hit our original > (other) DSA error with the queued_alters query. So I tentatively think there > really may be something specific to the server (not the hypervisor so maybe > the > OS, libraries, kernel, scheduler, ??). > > Find the schema for that table here: > https://www.postgresql.org/message-id/20181231221734.GB25379%40telsasoft.com > > Note, for unrelated reasons, that query was also previously discussed here: > https://www.postgresql.org/message-id/20171110204043.GS8563%40telsasoft.com