On Fri, Oct 1, 2021 at 3:26 PM Yura Sokolov <y.soko...@postgrespro.ru>
wrote:

> Good day.
>
> I found an optimization opportunity in the Buffer Manager's BufferAlloc
> function:
> - When a valid buffer is evicted, BufferAlloc acquires two buffer mapping
>   partition lwlocks: one for the partition the evicted block belongs to and
>   one for the partition the new block is placed in.
>
> This doesn't matter while the number of concurrent replacements is small,
> but when many backends replace buffers concurrently, a complex net of lock
> dependencies quickly arises.
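>
> For reference, on master the relevant part of BufferAlloc looks roughly
> like this (heavily condensed), with both mapping partitions held across
> the whole replacement:
>
>     oldHash = BufTableHashCode(&oldTag);
>     oldPartitionLock = BufMappingPartitionLock(oldHash);
>
>     /* lock the lower-numbered partition first to avoid deadlocks */
>     if (oldPartitionLock < newPartitionLock)
>     {
>         LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
>         LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
>     }
>     else if (oldPartitionLock > newPartitionLock)
>     {
>         LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
>         LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
>     }
>     else
>         LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);  /* same partition */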
>
> This is easy to see with select-only pgbench at scale 100 and
> shared_buffers = 128MB: scale 100 produces roughly 1.5GB of tables, which
> certainly does not fit in shared buffers, and performance starts to degrade
> at around 100 connections. Even with shared_buffers = 1GB it slowly degrades
> after 150 connections.
>
> Strictly speaking, though, there is no need to hold both locks
> simultaneously. The buffer is pinned, so other processes cannot select it
> for eviction. Once its tag is cleared and the buffer is removed from the old
> partition's hash entry, other processes will not find it. It is therefore
> safe to release the old partition lock before acquiring the new one.
>
> If another process concurrently inserts the same new tag, the old buffer is
> simply returned to the buffer manager's freelist.
>
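> The intended order of operations then becomes something like the following
> (a simplified sketch of the idea, not the actual patch):
>
>     LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
>     BufTableDelete(&oldTag, oldHash);   /* nobody can find the buffer now */
>     /* clear the buffer's tag and flags under the old partition lock only */
>     LWLockRelease(oldPartitionLock);
>
>     LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
>     existing_id = BufTableInsert(&newTag, newHash, buf->buf_id);
>     if (existing_id >= 0)
>     {
>         /* somebody else inserted the same tag first: give our victim back
>          * and use the buffer found under existing_id instead */
>         StrategyFreeBuffer(buf);
>     }
>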
> An additional optimisation: when the old buffer is reused, there is no need
> to return its BufferLookupEnt to dynahash's freelist, which reduces lock
> contention a bit more. To accomplish this, FreeListData.nentries is changed
> to pg_atomic_uint32/pg_atomic_uint64 and atomic increment/decrement is used.
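>
> Roughly, the counter update then becomes a plain atomic add/sub instead of
> being done under the freelist mutex; a sketch of what that amounts to (the
> real patch may differ in details such as the counter width):
>
>     static inline void
>     free_list_increment_nentries(HASHHDR *hctl, int freelist_idx)
>     {
>         pg_atomic_fetch_add_u32(&hctl->freeList[freelist_idx].nentries, 1);
>     }
>
>     static inline void
>     free_list_decrement_nentries(HASHHDR *hctl, int freelist_idx)
>     {
>         pg_atomic_fetch_sub_u32(&hctl->freeList[freelist_idx].nentries, 1);
>     }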
>
> Remark: there was a bug in `hash_update_hash_key`: nentries was not kept in
> sync when the old and new keys map to different freelist partitions. The bug
> was never triggered because the single existing caller of
> `hash_update_hash_key` never moves an entry between partitions.
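>
> The fix is presumably along these lines (a sketch; variable names are mine):
>
>     old_freelist_idx = FREELIST_IDX(hctl, old_hashvalue);
>     new_freelist_idx = FREELIST_IDX(hctl, new_hashvalue);
>     if (old_freelist_idx != new_freelist_idx)
>     {
>         /* the entry is now accounted to the other freelist partition */
>         free_list_decrement_nentries(hctl, old_freelist_idx);
>         free_list_increment_nentries(hctl, new_freelist_idx);
>     }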
>
> Here are some test results.
>
> - pgbench at scale 100 was run with --select-only (since we want to test
>   the buffer manager alone); it produces about 1.5GB of tables. An example
>   invocation is sketched after the configuration list below.
> - two shared_buffers values were tested: 128MB and 1GB.
> - the second-best result of five runs was taken.
>
> Tests were made in three system configurations:
> - a notebook with an i7-1165G7 (limited to 2.8GHz to avoid overheating)
> - a 2-socket Xeon X5675 NUMA system (12 cores / 24 threads)
> - the same Xeon X5675 restricted to a single socket
>   (with numactl -m 0 -N 0)
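>
> The example invocation mentioned above (run length and -j are placeholders;
> only the scale, --select-only and the connection counts are given here):
>
>     pgbench -i -s 100 bench
>     pgbench -n -S -c $CONNS -j $CONNS -T 30 bench   # -S is --select-only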
>
> Results for i7-1165G7:
>
>   conns |     master |    patched |  master 1G | patched 1G
> --------+------------+------------+------------+------------
>       1 |      29667 |      29079 |      29425 |      29411
>       2 |      55577 |      55553 |      57974 |      57223
>       3 |      87393 |      87924 |      87246 |      89210
>       5 |     136222 |     136879 |     133775 |     133949
>       7 |     179865 |     176734 |     178297 |     175559
>      17 |     215953 |     214708 |     222908 |     223651
>      27 |     211162 |     213014 |     220506 |     219752
>      53 |     211620 |     218702 |     220906 |     225218
>      83 |     213488 |     221799 |     219075 |     228096
>     107 |     212018 |     222110 |     222502 |     227825
>     139 |     207068 |     220812 |     218191 |     226712
>     163 |     203716 |     220793 |     213498 |     226493
>     191 |     199248 |     217486 |     210994 |     221026
>     211 |     195887 |     217356 |     209601 |     219397
>     239 |     193133 |     215695 |     209023 |     218773
>     271 |     190686 |     213668 |     207181 |     219137
>     307 |     188066 |     214120 |     205392 |     218782
>     353 |     185449 |     213570 |     202120 |     217786
>     397 |     182173 |     212168 |     201285 |     216489
>
> Results for 1-socket X5675:
>
>   conns |     master |    patched |  master 1G | patched 1G
> --------+------------+------------+------------+------------
>       1 |      16864 |      16584 |      17419 |      17630
>       2 |      32764 |      32735 |      34593 |      34000
>       3 |      47258 |      46022 |      49570 |      47432
>       5 |      64487 |      64929 |      68369 |      68885
>       7 |      81932 |      82034 |      87543 |      87538
>      17 |     114502 |     114218 |     127347 |     127448
>      27 |     116030 |     115758 |     130003 |     128890
>      53 |     116814 |     117197 |     131142 |     131080
>      83 |     114438 |     116704 |     130198 |     130985
>     107 |     113255 |     116910 |     129932 |     131468
>     139 |     111577 |     116929 |     129012 |     131782
>     163 |     110477 |     116818 |     128628 |     131697
>     191 |     109237 |     116672 |     127833 |     131586
>     211 |     108248 |     116396 |     127474 |     131650
>     239 |     107443 |     116237 |     126731 |     131760
>     271 |     106434 |     115813 |     126009 |     131526
>     307 |     105077 |     115542 |     125279 |     131421
>     353 |     104516 |     115277 |     124491 |     131276
>     397 |     103016 |     114842 |     123624 |     131019
>
> Results for 2-socket X5675:
>
>   conns |     master |    patched |  master 1G | patched 1G
> --------+------------+------------+------------+------------
>       1 |      16323 |      16280 |      16959 |      17598
>       2 |      30510 |      31431 |      33763 |      31690
>       3 |      45051 |      45834 |      48896 |      47991
>       5 |      71800 |      73208 |      78077 |      74714
>       7 |      89792 |      89980 |      95986 |      96662
>      17 |     178319 |     177979 |     195566 |     196143
>      27 |     210475 |     205209 |     226966 |     235249
>      53 |     222857 |     220256 |     252673 |     251041
>      83 |     219652 |     219938 |     250309 |     250464
>     107 |     218468 |     219849 |     251312 |     251425
>     139 |     210486 |     217003 |     250029 |     250695
>     163 |     204068 |     218424 |     248234 |     252940
>     191 |     200014 |     218224 |     246622 |     253331
>     211 |     197608 |     218033 |     245331 |     253055
>     239 |     195036 |     218398 |     243306 |     253394
>     271 |     192780 |     217747 |     241406 |     253148
>     307 |     189490 |     217607 |     239246 |     253373
>     353 |     186104 |     216697 |     236952 |     253034
>     397 |     183507 |     216324 |     234764 |     252872
>
> As can be seen, the patched version degrades much more slowly than master
> (and with 1GB shared buffers on the older processor it hardly degrades at
> all).
>
> PS.
>
> There is room for further improvement:
> - the buffer manager's freelist could be partitioned
> - dynahash's freelists could be sized/aligned to the CPU cache line
> - in fact, dynahash is not needed at all: a custom hash table could use the
>   BufferDescs themselves as entries, since a BufferDesc has spare padding
>   space for a next link and a hash value (see the sketch below).
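>
> A purely hypothetical sketch of the "BufferDesc as hash entry" idea (field
> and function names are invented here, not taken from any existing patch):
>
>     /* two fields that would fit into BufferDesc's existing padding */
>     int         hash_next;    /* buf_id of next buffer in the bucket, or -1 */
>     uint32      hash_value;   /* full hash of "tag", skips most tag compares */
>
>     /* lookup then needs only an array of bucket heads, no dynahash */
>     static int
>     BufMappingLookup(const BufferTag *tag, uint32 hash)
>     {
>         int         buf_id = BucketHeads[hash % NumBuckets];
>
>         while (buf_id >= 0)
>         {
>             BufferDesc *buf = GetBufferDescriptor(buf_id);
>
>             if (buf->hash_value == hash && BUFFERTAGS_EQUAL(buf->tag, *tag))
>                 return buf_id;
>             buf_id = buf->hash_next;
>         }
>         return -1;
>     }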
>
> regards,
> Yura Sokolov
> y.soko...@postgrespro.ru
> funny.fal...@gmail.com


Hi,
The improvement is impressive.

For BufTableFreeDeleted(): since it has only one caller, maybe that caller
could invoke hash_return_to_freelist() directly.

For free_list_decrement_nentries():

+   Assert(hctl->freeList[freelist_idx].nentries.value < MAX_NENTRIES);

Is this assertion necessary? There is a similar assertion in
free_list_increment_nentries(), which already maintains
hctl->freeList[freelist_idx].nentries.value <= MAX_NENTRIES.

Cheers
