In our query logs I saw:
postgres=# SELECT log_time, session_id, session_line, left(message,99),
left(query,99) FROM postgres_log WHERE error_severity='ERROR' AND message NOT
LIKE 'cancel%';
-[ RECORD 1
]+----------------------------------------------------------------------------------------------------
log_time | 2018-12-31 15:39:11.917-05
session_id | 5c2a7e6f.1fa4
session_line | 1
left | dsa_area could not attach to segment
left | SELECT colcld.child c, parent p, array_agg(colpar.attname::text
ORDER BY colpar.attnum) cols, array
-[ RECORD 2
]+----------------------------------------------------------------------------------------------------
log_time | 2018-12-31 15:39:11.917-05
session_id | 5c2a7e6f.1fa3
session_line | 4
left | dsa_area could not attach to segment
left | SELECT colcld.child c, parent p, array_agg(colpar.attname::text
ORDER BY colpar.attnum) cols, array
The full query + plan is:
|ts=# explain SELECT colcld.child c, parent p, array_agg(colpar.attname::text
ORDER BY colpar.attnum) cols, array_agg(format_type(colpar.atttypid,
colpar.atttypmod) ORDER BY colpar.attnum) AS types
|FROM queued_alters qa
|JOIN pg_attribute colpar ON to_regclass(qa.parent)=colpar.attrelid AND
colpar.attnum>0 AND NOT colpar.attisdropped
|JOIN (SELECT *, attrelid::regclass::text AS child FROM pg_attribute) colcld ON
to_regclass(qa.child) =colcld.attrelid AND colcld.attnum>0 AND NOT
colcld.attisdropped
|WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid
|GROUP BY 1,2
|ORDER BY
|parent LIKE 'unused%', -- Do them last
|regexp_replace(colcld.child,
'.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$', '\3\5') DESC,
| regexp_replace(colcld.child, '.*_', '') DESC
|LIMIT 1;
|QUERY PLAN
|Limit (cost=67128.06..67128.06 rows=1 width=307)
| -> Sort (cost=67128.06..67137.84 rows=3912 width=307)
| Sort Key: (((qa_1.parent)::text ~~ 'unused%'::text)),
(regexp_replace((((pg_attribute.attrelid)::regclass)::text),
'.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$'::text,
'\3\5'::text)) DESC,
(regexp_replace((((pg_attribute.attrelid)::regclass)::text), '.*_'::text,
''::text)) DESC
| -> GroupAggregate (cost=66893.34..67108.50 rows=3912 width=307)
| Group Key: (((pg_attribute.attrelid)::regclass)::text),
qa_1.parent
| -> Sort (cost=66893.34..66903.12 rows=3912 width=256)
| Sort Key: (((pg_attribute.attrelid)::regclass)::text),
qa_1.parent
| -> Gather (cost=40582.28..66659.91 rows=3912 width=256)
| Workers Planned: 2
| -> Parallel Hash Join (cost=39582.28..65268.71
rows=1630 width=256)
| Hash Cond:
(((to_regclass((qa_1.child)::text))::oid = pg_attribute.attrelid) AND
(colpar.attname = pg_attribute.attname))
| Join Filter: (colpar.atttypid <>
pg_attribute.atttypid)
| -> Nested Loop (cost=0.43..25614.89
rows=11873 width=366)
| -> Parallel Append (cost=0.00..12.00
rows=105 width=292)
| -> Parallel Seq Scan on
queued_alters_child qa_1 (cost=0.00..11.47 rows=147 width=292)
| -> Parallel Seq Scan on
queued_alters qa (cost=0.00..0.00 rows=1 width=292)
| -> Index Scan using
pg_attribute_relid_attnum_index on pg_attribute colpar (cost=0.43..242.70
rows=114 width=78)
| Index Cond: ((attrelid =
(to_regclass((qa_1.parent)::text))::oid) AND (attnum > 0))
| Filter: (NOT attisdropped)
| -> Parallel Hash (cost=33651.38..33651.38
rows=395365 width=72)
| -> Parallel Seq Scan on pg_attribute
(cost=0.00..33651.38 rows=395365 width=72)
| Filter: ((NOT attisdropped) AND
(attnum > 0))
|
queued_alters is usually empty, and looks like it would've last been nonempty
on 2018-12-10.
ts=# \d queued_alters
Table "public.queued_alters"
Column | Type | Collation | Nullable | Default
--------+-----------------------+-----------+----------+---------
child | character varying(64) | | |
parent | character varying(64) | | |
Indexes:
"queued_alters_child_parent_key" UNIQUE CONSTRAINT, btree (child, parent)
Number of child tables: 1 (Use \d+ to list them.)
I found this other log at that time:
2018-12-31 15:39:11.918-05 | 30831 | 5bf38e71.786f | 5 | background
worker "parallel worker" (PID 8100) exited with exit code 1
Which is the postmaster, or its PID in any case..
[pryzbyj@telsasoft-db ~]$ ps -wwwf 30831
UID PID PPID C STIME TTY STAT TIME CMD
postgres 30831 1 0 Nov19 ? S 62:44 /usr/pgsql-11/bin/postmaster
-D /var/lib/pgsql/11/data
postgres=# SELECT log_time, pid, session_line, left(message,99) FROM
postgres_log WHERE session_id='5bf38e71.786f' ;
log_time | pid | session_line |
left
----------------------------+-------+--------------+-------------------------------------------------------------------------
2018-12-31 15:39:11.918-05 | 30831 | 5 | background worker
"parallel worker" (PID 8100) exited with exit code 1
2018-12-31 15:39:11.935-05 | 30831 | 6 | background worker
"parallel worker" (PID 8101) exited with exit code 1
2018-12-31 16:40:47.42-05 | 30831 | 7 | background worker
"parallel worker" (PID 7239) exited with exit code 1
2018-12-31 16:40:47.42-05 | 30831 | 8 | background worker
"parallel worker" (PID 7240) exited with exit code 1
2018-12-31 16:41:00.151-05 | 30831 | 9 | background worker
"parallel worker" (PID 7371) exited with exit code 1
2018-12-31 16:41:00.151-05 | 30831 | 10 | background worker
"parallel worker" (PID 7372) exited with exit code 1
2018-12-31 16:41:04.024-05 | 30831 | 11 | background worker
"parallel worker" (PID 7451) exited with exit code 1
2018-12-31 16:41:04.024-05 | 30831 | 12 | background worker
"parallel worker" (PID 7450) exited with exit code 1
2018-12-31 16:41:23.845-05 | 30831 | 13 | background worker
"parallel worker" (PID 7658) exited with exit code 1
2018-12-31 16:41:23.845-05 | 30831 | 14 | background worker
"parallel worker" (PID 7659) exited with exit code 1
2018-12-31 16:43:58.854-05 | 30831 | 15 | background worker
"parallel worker" (PID 10825) exited with exit code 1
2018-12-31 16:43:58.854-05 | 30831 | 16 | background worker
"parallel worker" (PID 10824) exited with exit code 1
I seem to be missing logs for session_lines 1-4 , which would've been rotated
to oblivion if older than 24h.
I found these:
https://www.postgresql.org/message-id/flat/CAPa4P2YzgRTBHHRx2KAPUO1_xkmqQgT2xP0tS_e%3DphWvNzWdkA%40mail.gmail.com#4a1a94bc71d4c99c01babc759fe5b6ec
https://www.postgresql.org/message-id/CAEepm=0mv9bigjppribgqhnhqvgyo2+kmzekguvjjc9y_zv...@mail.gmail.com
Which appears to have been commited so I think this error is not unexpected.
|commit fd7c0fa732d97a4b4ebb58730e6244ea30d0a618
|Author: Robert Haas <[email protected]>
|Date: Mon Dec 18 12:17:37 2017 -0500
|
|Fix crashes on plans with multiple Gather (Merge) nodes.
I will try to reproduce and provide bt...but all I can think to do is run a
tight loop around that query and hope something else comes by and tickles it a
few more times.
Happy Grecian newyear.
Justin