Hi,

On 6/7/23 23:37, Andres Freund wrote:
> I think we're starting to hit quite a few limits related to the process model,
> particularly on bigger machines. The overhead of cross-process context
> switches is inherently higher than switching between threads in the same
> process - and my suspicion is that that overhead will continue to
> increase. Once you have a significant number of connections we end up spending
> a *lot* of time in TLB misses, and that's inherent to the process model,
> because you can't share the TLB across processes.

Another problem I haven't seen mentioned yet is the excessive kernel memory usage caused by every process having its own set of page table entries (PTEs). Without huge pages, the amount of memory wasted on page tables can be substantial when shared buffers are large.

For example, with 256 GiB of actively used shared buffers, a single process needs about 256 MiB for PTEs (for simplicity I ignored the tree structure of the page tables and just multiplied the number of 4 KiB pages by 4 bytes per PTE). With 512 connections, which is not uncommon on machines with many cores, a total of 128 GiB of memory is spent on page tables alone.
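
In case it's useful, here is a minimal sketch of that back-of-the-envelope calculation, under the same simplifying assumptions (flat 4 KiB pages, 4 bytes per PTE, higher page-table levels ignored):

#include <stdio.h>

int main(void)
{
    /* Assumptions from the example above: 256 GiB of touched shared
     * buffers, 4 KiB pages, 4 bytes per PTE, 512 backends. */
    const unsigned long long shared_buffers = 256ULL << 30;  /* 256 GiB */
    const unsigned long long page_size      = 4ULL << 10;    /* 4 KiB   */
    const unsigned long long pte_size       = 4;             /* bytes   */
    const unsigned long long connections    = 512;

    unsigned long long ptes_per_backend  = shared_buffers / page_size;
    unsigned long long bytes_per_backend = ptes_per_backend * pte_size;
    unsigned long long bytes_total       = bytes_per_backend * connections;

    printf("PTEs per backend:          %llu\n", ptes_per_backend);
    printf("PTE memory per backend:    %llu MiB\n", bytes_per_backend >> 20);
    printf("PTE memory, %llu backends: %llu GiB\n",
           connections, bytes_total >> 30);
    return 0;
}

On Linux the actual per-process page table size shows up as the VmPTE line in /proc/<pid>/status, so this is easy to verify against a running backend.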

We used non-transparent (explicitly reserved) huge pages to work around this limitation, but they come with plenty of provisioning challenges, especially in cloud infrastructures where different services run next to each other on the same server. Transparent huge pages are not a good alternative because of their unpredictable performance. Also, if a backend touches shared buffers only sparsely, memory is wasted for the remaining, unused range inside each huge page it has mapped.
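
For reference, the kind of provisioning I mean looks roughly like this (the numbers are purely illustrative; the huge page pool has to be sized up front, per host, for shared buffers plus the rest of the shared memory segment):

    # Linux: reserve 2 MiB huge pages host-wide
    sysctl -w vm.nr_hugepages=131200

    # postgresql.conf
    shared_buffers = 256GB
    huge_pages = on

Getting that reservation right on a box shared by several services, and keeping it right as workloads move around, is exactly the operational headache described above.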

--
David Geier
(ServiceNow)


