That would probably need a replica of the data, which is not possible (the tablespace is 4 TB).

*Spiros Ioannou
IT Manager, inAccess
www.inaccess.com <http://www.inaccess.com>
M: +30 6973-903808
T: +30 210-6802-358*

On 30 July 2015 at 21:47, Scott Marlowe <scott.marl...@gmail.com> wrote:

> You might want to try pgreplay: http://laurenz.github.io/pgreplay/
>
> On Thu, Jul 30, 2015 at 7:23 AM, Spiros Ioannou <siv...@inaccess.com>
> wrote:
>
>> I'm very sorry, but we don't have a synthetic load generator for our
>> testing setup, only production, and that is under an SLA. I would be
>> happy to test the next release, though.
>>
>> On 29 July 2015 at 13:42, Heikki Linnakangas <hlinn...@iki.fi> wrote:
>>
>>> On 07/28/2015 11:36 PM, Heikki Linnakangas wrote:
>>>
>>>> A-ha, I succeeded in reproducing this now on my laptop, with pgbench! It
>>>> seems to be important to have a very large number of connections:
>>>>
>>>> pgbench -n -c400 -j4 -T600 -P5
>>>>
>>>> That got stuck after a few minutes. I'm using commit_delay=100.
>>>>
>>>> Now that I have something to work with, I'll investigate this more
>>>> tomorrow.
>>>>
>>>
>>> Ok, it seems that this is caused by the same issue that I found with my
>>> synthetic test case, after all. It is possible to get a lockup because of
>>> it.
>>>
>>> For the archives, here's a hopefully easier-to-understand explanation of
>>> how the lockup happens. It involves three backends. A and C are inserting
>>> WAL records, while B is flushing the WAL with commit_delay. The byte
>>> positions 2000, 2100, 2200, and 2300 are offsets within a WAL page. 2000
>>> points to the beginning of the page, while the others are later positions
>>> on the same page. WaitToFinish() is an abbreviation for
>>> WaitXLogInsertionsToFinish(). "Update pos X" means a call to
>>> WALInsertLockUpdateInsertingAt(X). "Reserve A-B" means a call to
>>> ReserveXLogInsertLocation, which returned StartPos A and EndPos B.
>>>
>>> Backend A                 Backend B                 Backend C
>>> ---------                 ---------                 ---------
>>> Acquire InsertLock 2
>>> Reserve 2100-2200
>>>                           Calls WaitToFinish()
>>>                             reservedUpto is 2200
>>>                             sees that Lock 1 is
>>>                             free
>>>                                                     Acquire InsertLock 1
>>>                                                     Reserve 2200-2300
>>>                                                     GetXLogBuffer(2200)
>>>                                                      page not in cache
>>>                                                      Update pos 2000
>>>                                                      AdvanceXLInsertBuffer()
>>>                                                       run until about to
>>>                                                       acquire WALWriteLock
>>> GetXLogBuffer(2100)
>>>  page not in cache
>>>  Update pos 2000
>>>  AdvanceXLInsertBuffer()
>>>   Acquire WALWriteLock
>>>   write out old page
>>>   initialize new page
>>>   Release WALWriteLock
>>> finishes insertion
>>> release InsertLock 2
>>>                           WaitToFinish() continues
>>>                             sees that lock 2 is
>>>                             free. Returns 2200.
>>>
>>>                           Acquire WALWriteLock
>>>                           Call WaitToFinish(2200)
>>>                             blocks on Lock 1,
>>>                             whose insertingAt
>>>                             is 2000.
>>>
>>> At this point, there is a deadlock between B and C. B is waiting for C
>>> to release the lock or update its insertingAt value past 2200, while C is
>>> waiting for WALWriteLock, held by B.
>>>
>>> To fix that, let's change GetXLogBuffer() to always advertise the exact
>>> position, not the beginning of the page (except when inserting the first
>>> record on the page, just after the page header, see comments).
>>>
>>> This fixes the problem for me. I've been running pgbench for about 30
>>> minutes without lockups now, while without the patch it locked up within a
>>> couple of minutes. Spiros, can you easily test this patch in your
>>> environment? Would be nice to get a confirmation that this fixes the
>>> problem for you too.
>>>
>>> - Heikki
>>>
>
> --
> To understand recursion, one must first understand recursion.
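
The final state in Heikki's timeline can be checked with a small toy model. This is only a sketch in Python, not PostgreSQL code: the class InsertLock and the functions wait_would_block() and deadlocks() are hypothetical names made up for illustration, and the model assumes that the flusher keeps waiting on a held insertion lock as long as that lock's advertised position is below the position it wants to flush to.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InsertLock:
    """Toy stand-in for a WAL insertion lock (hypothetical, not the real struct)."""
    held: bool = False
    inserting_at: Optional[int] = None  # advertised position; None = nothing advertised


def wait_would_block(locks: List[InsertLock], upto: int) -> bool:
    """Model of the flusher's wait: it blocks if any held lock has not
    advertised a position at or beyond `upto`."""
    for lock in locks:
        if lock.held and (lock.inserting_at is None or lock.inserting_at < upto):
            return True
    return False


def deadlocks(advertised_by_c: int) -> bool:
    """Final state of the timeline: B holds WALWriteLock and waits for
    insertions up to 2200; C holds insertion lock 1 and needs WALWriteLock."""
    lock1 = InsertLock(held=True, inserting_at=advertised_by_c)  # held by C
    lock2 = InsertLock(held=False)                               # released by A
    b_holds_wal_write_lock = True
    c_waits_for_wal_write_lock = True
    b_blocked_on_c = wait_would_block([lock1, lock2], upto=2200)
    return b_holds_wal_write_lock and c_waits_for_wal_write_lock and b_blocked_on_c


if __name__ == "__main__":
    # Unpatched behaviour: C advertises the start of the page (2000).
    print("advertise page start:", deadlocks(2000))
    # Proposed fix: C advertises the exact position it is inserting at (2200).
    print("advertise exact position:", deadlocks(2200))
```

With the page start advertised, B's wait condition can never be satisfied while C is itself stuck behind WALWriteLock. With the exact position advertised, B sees that nothing before 2200 is still in progress and can go ahead with the flush, which in turn lets C acquire WALWriteLock afterwards.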