Dan Wilder ([EMAIL PROTECTED]) reports a bug with a severity of 4 The lower the number the more severe it is.
Short Description Data in table causes server to terminate with signal 11 Long Description I have a table which brings about the error message (from syslog) server process (pid 21503) was terminated by signal 11 on any attempt to select certain fields from a certain record in a certain table. The application has been running since June with no such error indication. Problem cropped up some time during the night, possibly following a run of vacuumdb. The dump_db program fails a few records before the apparent problem record. I upgraded to a local build of postgresql-7.2.3 and found the same problem. Any field in the problem record up to a certain field can be selected. Any attempt to select the problem field or any field following it in the record yields the error message. Any other record in the table can be selected. vacuumdb from 7.2.3 runs successfully and gives no indication of anything amiss. After shutting down the database and saving a snapshot of the data, I restarted the 7.2.3 version and under psql deleted the problem record then reinserted it using copies of its values from an earlier snapshot of the database. All is fine now with the running copy. I still have the snapshot of the database files taken prior to the deletion, have verified that I can start a copy of postmaster running against this data, and that the problem can be reproduced. In the mailing list archives I find several references to "signal 11" with inconclusive resolutions, including hints about problem hardware. I do not believe hardware is the cause of this difficulty, because the application ran successfully for so long prior to this incident, because the problem is so perfectly reproducible, because the machine in question runs so smoothly otherwise, and has done many builds of kernels and other software, because the ECC RAM being used in the machine has completed 30 passes of memtest86, and because the problem went away when I deleted and then reinserted the offending record. The table in question has records that are written once and never updated, and the problem record was written in June. dump_db runs every night and has not previously aborted. Since deletion of the offending record, dump_db has run successfully, twice. Running Postgresql 7.2 on Debian Linux 3.0 (woody), default Debian install, when the problem was observed. Immediate clue to the problem was email from a cron job: From: [EMAIL PROTECTED] (Cron Daemon) To: [EMAIL PROTECTED] Subject: Cron <subs@wheel> /usr/lib/orcss/admin/dump.sh X-Cron-Env: <SHELL=/bin/sh> X-Cron-Env: <HOME=/home/subs> X-Cron-Env: <PATH=/usr/bin:/bin> X-Cron-Env: <LOGNAME=subs> Date: Mon, 14 Oct 2002 22:34:18 -0700 (PDT) pg_dump: connection not open pg_dump: lost synchronization with server, resetting connection pg_dump: SQL command to dump the contents of table "tender" failed: PQendcopy() +failed. pg_dump: Error message from server: FATAL 1: The database system is starting up pg_dump: The command was: COPY "tender" TO stdout; Corresponding syslog entries, wrapped (I hope) for readability: Oct 14 22:34:17 wheel postgres[505]: [1] DEBUG: server process (pid 19015) was terminated by signal 11 Oct 14 22:34:17 wheel postgres[505]: [2] DEBUG: terminating any other active server processes Oct 14 22:34:17 wheel postgres[505]: [3] DEBUG: all server processes terminated; reinitializing shared memory and semaphores Oct 14 22:34:17 wheel postgres[19039]: [4] DEBUG: database system was interrupted at 2002-10-14 22:31:39 PDT Oct 14 22:34:17 wheel postgres[19040]: [4] FATAL 1: The database system is starting up Oct 14 22:34:17 wheel postgres[19039]: [5] DEBUG: checkpoint record is at 4/B5F438B4 Oct 14 22:34:17 wheel postgres[19039]: [6] DEBUG: redo recordundo record is at 4/B5F438B4; undo record is at 0/0; shutdown FALSE Oct 14 22:34:17 wheel postgres[19039]: [7] DEBUG: next transaction id: 195169; next oid: 57807563 Oct 14 22:34:17 wheel postgres[19039]: [8] DEBUG: database system was not properly shut down; automatic recovery in progress Oct 14 22:34:17 wheel postgres[19039]: [9] DEBUG: ReadRecord: record with zero length at 4/B5F438F4 Oct 14 22:34:17 wheel postgres[19039]: [10] DEBUG: redo is not required Oct 14 22:34:21 wheel postgres[19039]: [11] DEBUG: database system is ready Attempting to run dump_db by hand, twice, yielded the same result, with the same last record dumped in each case. Later attempts at queries revealed server crashes at any attempt to query record with primary key (tender_id) 12776 involving any field appearing later in the schema than (including) the problem field. Typical syslog entries with debug level 4, timestamps and "DEBUG:" omitted: connection: host=[local] user=scott database=subs InitPostgres StartTransactionCommand query: select getdatabaseencoding() ProcessQuery CommitTransactionCommand StartTransactionCommand query: SELECT usesuper FROM pg_user WHERE usename = 'scott' ProcessQuery CommitTransactionCommand StartTransactionCommand query: select is_refund from tender where tender_id = 12776; ProcessQuery CommitTransactionCommand StartTransactionCommand query: select card_number from tender where tender_id = 12776; ProcessQuery CommitTransactionCommand StartTransactionCommand query: select verf_state from tender where tender_id = 12776; ProcessQuery reaping dead processes child process (pid 25363) was terminated by signal 11 server process (pid 25363) was terminated by signal 11 terminating any other active server processes all server processes terminated; reinitializing shared memory and semaphores shmem_exit(0) BackendStartup: forked pid=25371 socket=9 database system was interrupted at 2002-10-15 15:09:58 PDT checkpoint record is at 4/D02C89D8 [ etc ] If there is any interest in this problem and more information is wanted, I am willing to start up a copy of postmaster against the saved data for the purpose of gathering more information. Unfortunately I am not at liberty to provide a complete copy of the saved data. If anybody knows of a way to obtain data from the offending record I am willing to do so, and provide the results. Sample Code No file was uploaded with this report ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html