Hi Andrey,

> I think if we really want to fix the exclusive SubtransSLRULock, the best
> option would be to split the SLRU control lock into an array of locks
 I agree with you. If we can resolve the performance issue with this approach,
it should be a good solution.

> one for each bank (in 
> v17-0002-Divide-SLRU-buffers-into-n-associative-banks.patch)
 I have tested with this patch, with NUM_SUBTRANS_BUFFERS modified to 128.
With 500 concurrent connections it no longer gets stuck, but the performance
is very bad: a sequential scan of the table takes more than one minute.
I think that is unacceptable in a production environment.

postgres=# select count(*) from contend ;
 count 
-------
 10127
(1 row)

Time: 86011.593 ms (01:26.012)
postgres=# select count(*) from contend ;
 count 
-------
 10254
(1 row)
Time: 79399.949 ms (01:19.400)


With my local subtrans cache optimization, in the same environment with the same
test script and 500 concurrent connections, the same sequential scan takes less
than 10 seconds (a rough sketch of the idea follows the timings below).

postgres=# select count(*) from contend ;
 count 
-------
 10508
(1 row)

Time: 7104.283 ms (00:07.104)

postgres=# select count(*) from contend ;
 count 
-------
 13175
(1 row)

Time: 6602.635 ms (00:06.603)
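
To make the idea a bit more concrete, here is a much simplified sketch. It is
not my actual patch: the patch caches whole subtrans pages, with
local_cache_subtrans_pages controlling how many pages each backend may keep,
while the sketch below only memoizes individual SubTransGetParent() results in
backend-local memory. CachedSubTransGetParent, LocalSubtransEntry and
local_subtrans_cache are made-up names, and cache size limits are omitted.

#include "postgres.h"

#include "access/subtrans.h"
#include "access/transam.h"
#include "utils/hsearch.h"
#include "utils/memutils.h"

/* Illustrative names only; not part of PostgreSQL or of my patch. */
typedef struct LocalSubtransEntry
{
	TransactionId xid;			/* hash key */
	TransactionId parent;		/* cached result of SubTransGetParent() */
} LocalSubtransEntry;

static HTAB *local_subtrans_cache = NULL;

static void
LocalSubtransCacheInit(void)
{
	HASHCTL		ctl;

	memset(&ctl, 0, sizeof(ctl));
	ctl.keysize = sizeof(TransactionId);
	ctl.entrysize = sizeof(LocalSubtransEntry);
	ctl.hcxt = TopMemoryContext;

	local_subtrans_cache = hash_create("local subtrans cache",
									   1024,	/* initial size hint */
									   &ctl,
									   HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
}

/*
 * Memoizing wrapper around SubTransGetParent().  A subxid's parent does not
 * change once the subxid is visible to other backends, so the shared SLRU
 * (and SubtransSLRULock) is touched only on the first lookup of each xid.
 * As usual, the caller must not ask about xids older than TransactionXmin.
 * Cache size limits and truncation handling are omitted for brevity.
 */
static TransactionId
CachedSubTransGetParent(TransactionId xid)
{
	LocalSubtransEntry *entry;
	bool		found;

	if (!TransactionIdIsNormal(xid))
		return InvalidTransactionId;

	if (local_subtrans_cache == NULL)
		LocalSubtransCacheInit();

	entry = (LocalSubtransEntry *) hash_search(local_subtrans_cache,
											   &xid, HASH_ENTER, &found);
	if (!found)
		entry->parent = SubTransGetParent(xid);

	return entry->parent;
}
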
Thanks
Pengcheng

-----Original Message-----
From: Andrey Borodin <x4...@yandex-team.ru> 
Sent: September 3, 2021 14:51
To: Pengchengliu <pengcheng...@tju.edu.cn>
Cc: pgsql-hack...@postgresql.org
Subject: Re: suboverflowed subtransactions concurrency performance optimize

Sorry, for some reason Mail.app converted the message to html and the mailing
list mangled this html into a mess. I'm resending the previous message as plain
text again. Sorry for the noise.

> On Aug 31, 2021, at 11:43, Pengchengliu <pengcheng...@tju.edu.cn> wrote:
> 
> Hi Andrey,
>  Thanks a lot for your reply and the reference information.
> 
>  The default NUM_SUBTRANS_BUFFERS is 32. In my implementation,
> local_cache_subtrans_pages can be adjusted dynamically.
>  If we configure local_cache_subtrans_pages as 64, every backend uses only
> an extra 64*8192 = 512KB of memory.
>  So the local cache acts as a first-level cache, and the subtrans SLRU as
> the second-level cache.
>  I think the extra memory is well worth it. It really resolves the massive
> subtrans stuck issue which I mentioned in my previous email.
> 
>  I have looked at the patch in [0] before. Adding GUC configuration
> parameters for the SLRU buffers is very nice.
>  I think that for subtrans it does not optimize enough. For
> SubTransGetTopmostTransaction, we should acquire SubtransSLRULock once
> first and then call SubTransGetParent in the loop, to avoid
> acquiring/releasing SubtransSLRULock on every iteration of the
> SubTransGetTopmostTransaction -> SubTransGetParent loop.
>  After I applied this patch together with my optimization of
> SubTransGetTopmostTransaction, my test case still got stuck.

SubTransGetParent() acquires only a shared lock on SubtransSLRULock. The problem
arises only when someone has to read a page from disk, and with a big enough
cache that will never happen. Such a cache would also be much smaller than
512KB * max_connections.

I think if we really want to fix the exclusive SubtransSLRULock, the best
option would be to split the SLRU control lock into an array of locks, one
for each bank (in v17-0002-Divide-SLRU-buffers-into-n-associative-banks.patch).
With this approach we would have to rename s/bank/partition/g for consistency
with lock and buffer partitions. I really liked having my own banks, but
consistency is worth it anyway.
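
To make the idea concrete, a minimal sketch, not the v17 patch itself:
SLRU_NBANKS, SlruBankLocks and SlruBankLockFor() are illustrative names, and
allocation/initialization of the lock array in shared memory is omitted.

#include "postgres.h"

#include "storage/lwlock.h"

#define SLRU_NBANKS 8			/* illustrative: buffers split into 8 banks */

/* one lock per bank, allocated in shared memory at startup (not shown) */
static LWLockPadded *SlruBankLocks;

/* map an SLRU page number to the lock protecting its bank */
static inline LWLock *
SlruBankLockFor(int pageno)
{
	return &SlruBankLocks[pageno % SLRU_NBANKS].lock;
}

/*
 * A reader then takes only its bank's lock in shared mode, so lookups of
 * pages that fall into different banks no longer contend on a single
 * SubtransSLRULock.
 */
static void
slru_read_entry_example(int pageno)
{
	LWLock	   *banklock = SlruBankLockFor(pageno);

	LWLockAcquire(banklock, LW_SHARED);
	/* ... locate the page in this bank's buffer slots and read it ... */
	LWLockRelease(banklock);
}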

Thanks!

Best regards, Andrey Borodin.

