You might want to try pgreplay, which can replay logged production statements against a test server: http://laurenz.github.io/pgreplay/

On Thu, Jul 30, 2015 at 7:23 AM, Spiros Ioannou <siv...@inaccess.com> wrote:

> I'm very sorry, but we don't have a synthetic load generator for our
> testing setup, only production, and that is under an SLA. I would be happy
> to test the next release, though.
>
> Spiros Ioannou
> IT Manager, inAccess
> www.inaccess.com <http://www.inaccess.com>
> M: +30 6973-903808
> T: +30 210-6802-358
>
> On 29 July 2015 at 13:42, Heikki Linnakangas <hlinn...@iki.fi> wrote:
>
>> On 07/28/2015 11:36 PM, Heikki Linnakangas wrote:
>>
>>> A-ha, I managed to reproduce this now on my laptop, with pgbench! It
>>> seems to be important to have a very large number of connections:
>>>
>>> pgbench -n -c400 -j4 -T600 -P5
>>>
>>> That got stuck after a few minutes. I'm using commit_delay=100.
>>>
>>> Now that I have something to work with, I'll investigate this more
>>> tomorrow.
>>>
>>
>> Ok, it seems that this is caused by the same issue that I found with my
>> synthetic test case, after all. It is possible to get a lockup because of
>> it.
>>
>> For the archives, here's a hopefully easier-to-understand explanation of
>> how the lockup happens. It involves three backends. A and C are inserting
>> WAL records, while B is flushing the WAL with commit_delay. The byte
>> positions 2000, 2100, 2200, and 2300 are offsets within a WAL page. 2000
>> points to the beginning of the page, while the others are later positions
>> on the same page. WaitToFinish() is an abbreviation for
>> WaitXLogInsertionsToFinish(). "Update pos X" means a call to
>> WALInsertLockUpdateInsertingAt(X). "Reserve A-B" means a call to
>> ReserveXLogInsertLocation, which returned StartPos A and EndPos B.
>>
>> Backend A               Backend B               Backend C
>> ---------               ---------               ---------
>> Acquire InsertLock 2
>> Reserve 2100-2200
>>                         Calls WaitToFinish()
>>                           reservedUpto is 2200
>>                           sees that Lock 1 is
>>                           free
>>                                                 Acquire InsertLock 1
>>                                                 Reserve 2200-2300
>>                                                 GetXLogBuffer(2200)
>>                                                  page not in cache
>>                                                  Update pos 2000
>>                                                  AdvanceXLInsertBuffer()
>>                                                   run until about to
>>                                                   acquire WALWriteLock
>> GetXLogBuffer(2100)
>>  page not in cache
>>  Update pos 2000
>>  AdvanceXLInsertBuffer()
>>   Acquire WALWriteLock
>>   write out old page
>>   initialize new page
>>   Release WALWriteLock
>> Finishes insertion
>> Release InsertLock 2
>>                         WaitToFinish() continues
>>                           sees that Lock 2 is
>>                           free. Returns 2200.
>>
>>                         Acquire WALWriteLock
>>                         Call WaitToFinish(2200)
>>                           blocks on Lock 1,
>>                           whose insertingAt
>>                           is 2000.
>>
>> At this point, there is a deadlock between B and C. B is waiting for C to
>> release the lock or update its insertingAt value past 2200, while C is
>> waiting for WALWriteLock, held by B.
>>
>> To fix that, let's change GetXLogBuffer() to always advertise the exact
>> position, not the beginning of the page (except when inserting the first
>> record on the page, just after the page header; see comments).
>>
>> This fixes the problem for me. I've been running pgbench for about 30
>> minutes without lockups now, while without the patch it locked up within a
>> couple of minutes. Spiros, can you easily test this patch in your
>> environment? Would be nice to get a confirmation that this fixes the
>> problem for you too.
>>
>> - Heikki
>>
>>
>
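
The three-backend interleaving quoted above can be checked with a small model. This is only a sketch and not PostgreSQL code: the InsertLock class and the blockers() and is_deadlocked() helpers are hypothetical names made up for illustration. It assumes a flusher has to wait for any held insertion lock whose advertised position is still below the flush target, which is how the quoted explanation describes B's wait on Lock 1.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class InsertLock:
    """Simplified stand-in for one WAL insertion lock (hypothetical)."""
    holder: Optional[str] = None  # backend holding the lock, None if free
    inserting_at: int = 0         # position the holder currently advertises


def blockers(locks: Iterable[InsertLock], upto: int) -> List[str]:
    """Backends a flusher must wait for before WAL up to `upto` is complete.

    A free lock never blocks. A held lock blocks while the position it
    advertises is still below `upto`.
    """
    return [lock.holder for lock in locks
            if lock.holder is not None and lock.inserting_at < upto]


def is_deadlocked(locks: Iterable[InsertLock], upto: int,
                  write_lock_waiters: Set[str]) -> bool:
    """True if the flusher, which holds WALWriteLock, waits for a backend
    that is itself waiting for WALWriteLock."""
    return any(b in write_lock_waiters for b in blockers(locks, upto))


# State at the end of the timeline: A has released lock 2, C holds lock 1
# and is about to acquire WALWriteLock, B holds WALWriteLock.
before_fix = [InsertLock("C", 2000), InsertLock()]  # C advertised page start
after_fix = [InsertLock("C", 2200), InsertLock()]   # C advertises exact position

print(is_deadlocked(before_fix, 2200, {"C"}))  # True
print(is_deadlocked(after_fix, 2200, {"C"}))   # False
```

With the page start (2000) advertised, B's WaitToFinish(2200) has to wait for C, and C is waiting for the lock B holds. With the exact position (2200) advertised, C no longer counts as being behind the flush target, so B can proceed and release WALWriteLock.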


-- 
To understand recursion, one must first understand recursion.
