Stop the autovacuum process and try again.
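
One way to try that for the duration of the load test is sketched below. It assumes the table involved is the tenant1.customer table named in your log and that you can run the statements as the table owner; adjust it to your setup, it is not a tested recipe.

```sql
-- Sketch only: switch autovacuum off for the single table from the log.
-- autovacuum_enabled is a per-table storage parameter.
ALTER TABLE tenant1.customer SET (autovacuum_enabled = false);

-- ... run the load test here ...

-- Afterwards, restore the default and vacuum the table by hand.
ALTER TABLE tenant1.customer RESET (autovacuum_enabled);
VACUUM ANALYZE tenant1.customer;
```

To switch it off for the whole server instead, set autovacuum = off in postgresql.conf and reload the configuration. Either way, PostgreSQL still launches autovacuum workers when needed to prevent transaction ID wraparound, so this is only reasonable as a temporary test setting.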

On Tue, Jun 18, 2013 at 1:31 PM, bhanu udaya <udayabhanu1...@hotmail.com> wrote:

> Hello,
> Greetings.
>
> My PostgreSQL (9.2) is crashing after certain load tests. Currently,
> PostgreSQL crashes when 800 to 1000 threads run simultaneously against
> a schema with 10 million records. I am not sure whether we have to tweak
> some more PostgreSQL parameters. PostgreSQL is currently configured as
> below on a machine with 7 GB RAM and an Intel Xeon CPU E5507 at 2.27 GHz.
> Is it a PostgreSQL limitation to support only 800 threads, or is some
> other configuration required? Please look at the log below with the
> errors. Please reply.
>
> max_connections               5000
> shared_buffers                2024 MB
> synchronous_commit            off
> wal_buffers                   100 MB
> wal_writer_delays             1000ms
> checkpoint_segments           512
> checkpoint_timeout            5 min
> checkpoint_completion_target  0.5
> checkpoint_warning            30s
> work_memory                   1G
> effective_cache_size          5 GB
>
> 2013-06-11 15:11:17 GMT [26201]: [1-1]ERROR: canceling autovacuum task
> 2013-06-11 15:11:17 GMT [26201]: [2-1]CONTEXT: automatic vacuum of table "newrelic.tenant1.customer"
> 2013-06-11 15:11:17 GMT [25242]: [1-1]LOG: sending cancel to blocking autovacuum PID 26201
> 2013-06-11 15:11:17 GMT [25242]: [2-1]DETAIL: Process 25242 waits for ExclusiveLock on extension of relation 679054 of database 666546.
> 2013-06-11 15:11:17 GMT [25242]: [3-1]STATEMENT: UPDATE tenant1.customer SET lastmodifieddate = $1 WHERE id IN ( select random_range((select min(id) from tenant1.customer ), (select max(id) from tenant1.customer )) as id ) AND softdeleteflag IS NOT TRUE
> 2013-06-11 15:11:17 GMT [25242]: [4-1]WARNING: could not send signal to process 26201: No such process
> 2013-06-11 15:22:29 GMT [22229]: [11-1]WARNING: worker took too long to start; canceled
> 2013-06-11 15:24:10 GMT [26511]: [1-1]WARNING: autovacuum worker started without a worker entry
> 2013-06-11 16:03:33 GMT [23092]: [1-1]LOG: could not receive data from client: Connection timed out
> 2013-06-11 16:06:05 GMT [23222]: [5-1]LOG: could not receive data from client: Connection timed out
> 2013-06-11 16:07:06 GMT [26869]: [1-1]FATAL: canceling authentication due to timeout
> 2013-06-11 16:23:16 GMT [25128]: [1-1]LOG: could not receive data from client: Connection timed out
> 2013-06-11 16:23:20 GMT [25128]: [2-1]LOG: unexpected EOF on client connection with an open transaction
> 2013-06-11 16:30:56 GMT [23695]: [1-1]LOG: could not receive data from client: Connection timed out
> 2013-06-11 16:43:55 GMT [24618]: [1-1]LOG: could not receive data from client: Connection timed out
> 2013-06-11 16:44:29 GMT [25204]: [1-1]LOG: could not receive data from client: Connection timed out
> 2013-06-11 16:54:14 GMT [22226]: [1-1]PANIC: stuck spinlock (0x2aaab54279d4) detected at bufmgr.c:1239
> 2013-06-11 16:54:14 GMT [32521]: [8-1]LOG: checkpointer process (PID 22226) was terminated by signal 6: Aborted
> 2013-06-11 16:54:14 GMT [32521]: [9-1]LOG: terminating any other active server processes
> 2013-06-11 16:54:14 GMT [26931]: [1-1]WARNING: terminating connection because of crash of another server process
> 2013-06-11 16:54:14 GMT [26931]: [2-1]DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> 2013-06-11 16:54:14 GMT [26931]: [3-1]HINT: In a moment you should be able to reconnect to the database and repeat your command.
> 2013-06-11 16:54:14 GMT [26401]: [1-1]WARNING: terminating connection because of crash of another server process
> 2013-06-11 16:54:14 GMT [26401]: [2-1]DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> 2013-06-11 16:55:08 GMT [27579]: [1-1]FATAL: the database system is in recovery mode
> 2013-06-11 16:55:08 GMT [24041]: [1-1]WARNING: terminating connection because of crash of another server process
> 2013-06-11 16:55:08 GMT [24041]: [2-1]DETAIL: The postmaster has commanded this server process to roll back the current