I want to give some conclusion to our occurrence of this, which I now think was neither an instance of nor indicative of any bug.  Summary: postgres was being killed with kill -9 by a deployment script after it "timed out", hence the lack of log messages.
I initially experienced this while testing migration of a customer's DB using an ansible script, which did "async: 2222, poll: 0", which I guess ansible takes to mean "let the process run for 2222 seconds, after which send it SIGKILL".  I probably made multiple attempts to migrate (for example to fix tablespaces or dependencies on postgis), and the process was started during one test and never stopped nor restarted during the following tests, until it finally hit 2222 seconds and was stopped ungracefully.  (A rough sketch of the task is below, just before the quoted mail.)

This just happened again, during a different migration, so I reproduced it like:

ansible --sudo --sudo-user pryzbyj 'database.customer.*' --background=1 --poll=1 -m command -a '/usr/pgsql-12/bin/postmaster -D /home/pryzbyj/pg12.dat -c port=5678 -c unix-socket-directories=/tmp -c client_min_messages=debug -c log_temp_files=0 -c log_lock_waits=1'

Connect to the customer's server and verify it was killed uncleanly:

[pryzbyj@database ~]$ /usr/pgsql-12/bin/postmaster -D ./pg12.dat -c port=5678 -c unix-socket-directories=/tmp -c client_min_messages=debug -c log_temp_files=0 -c log_lock_waits=1 -c logging_collector=off
2019-10-22 17:57:58.251 EDT [5895] FATAL: lock file "postmaster.pid" already exists
2019-10-22 17:57:58.251 EDT [5895] HINT: Is another postmaster (PID 5608) running in data directory "/home/pryzbyj/./pg12.dat"?

[pryzbyj@database ~]$ /usr/pgsql-12/bin/postmaster -D ./pg12.dat -c port=5678 -c unix-socket-directories=/tmp -c client_min_messages=debug -c log_temp_files=0 -c log_lock_waits=1 -c logging_collector=off
2019-10-22 17:58:01.312 EDT [5962] LOG: starting PostgreSQL 12.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2019-10-22 17:58:01.312 EDT [5962] LOG: listening on IPv6 address "::1", port 5678
2019-10-22 17:58:01.312 EDT [5962] LOG: listening on IPv4 address "127.0.0.1", port 5678
2019-10-22 17:58:01.328 EDT [5962] LOG: listening on Unix socket "/tmp/.s.PGSQL.5678"
2019-10-22 17:58:01.353 EDT [5963] LOG: database system was interrupted; last known up at 2019-10-22 17:57:48 EDT
2019-10-22 17:58:01.460 EDT [5963] LOG: database system was not properly shut down; automatic recovery in progress
2019-10-22 17:58:01.478 EDT [5963] LOG: invalid record length at 6/E829D128: wanted 24, got 0
2019-10-22 17:58:01.478 EDT [5963] LOG: redo is not required
2019-10-22 17:58:01.526 EDT [5962] LOG: database system is ready to accept connections
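For completeness, the task in the deployment script looked something like the following sketch.  This is reconstructed from memory, so take the task name and the exact command line as illustrative (the command is the one from the repro above); only the async/poll values are what we actually had set:

- name: Start postgres for migration    # illustrative task name
  command: /usr/pgsql-12/bin/postmaster -D /home/pryzbyj/pg12.dat -c port=5678 -c unix-socket-directories=/tmp
  async: 2222    # apparently the maximum runtime in seconds, after which ansible kills the job
  poll: 0        # fire and forget; never poll for completion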
On Tue, Jul 23, 2019 at 11:27:03AM -0500, Justin Pryzby wrote:
> Does anyone have a stress test for parallel workers ?
>
> On a customer's new VM, I got this several times while (trying to) migrate their DB:
>
> < 2019-07-23 10:33:51.552 CDT postgres >FATAL: postmaster exited during a parallel transaction
> < 2019-07-23 10:33:51.552 CDT postgres >STATEMENT: CREATE UNIQUE INDEX unused0_huawei_umts_nodeb_locell_201907_unique_idx ON child.unused0_huawei_umts_nodeb_locell_201907 USING btree ...
>
> There's nothing in dmesg nor in postgres logs.
>
> At first I thought it's maybe because of a network disconnection, then I thought it's because we ran out of space (wal), then they had a faulty blade.  After that, I'd tried and failed to reproduce it a number of times, but it's just recurred during what was intended to be their final restore.  I've set max_parallel_workers_per_gather=0, but I'm planning to try to diagnose an issue in another instance.  Ideally a minimal test, since I'm apparently going to have to run under gdb to see how it's dying, or even what process is failing.
>
> DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
> CentOS release 6.9 (Final)
> Linux alextelsasrv01 2.6.32-754.17.1.el6.x86_64 #1 SMP Tue Jul 2 12:42:48 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
> version | PostgreSQL 11.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit