On Wed, Nov 26, 2025 at 5:19 PM Tomas Vondra wrote:
> Rebased patch series attached.
Thanks. BTW, still with the old patch series, one additional thing
that I've found out related to interleave is that in
CreateAnonymousSegment() with the default check_debug='', we still
issue numa_interleave_
Hi Tomas!
[..]
> Which I think is mostly the same thing you're saying, and you have the maps
> to support it.
Right, the thread is kind of long. You were right back then, but
at least we've now got a solid explanation with data.
> Here's an updated version of the patch series.
Just for double
On 11/17/25 10:23, Jakub Wartak wrote:
> On Tue, Nov 11, 2025 at 12:52 PM Tomas Vondra wrote:
>>
>> Hi,
>>
>> here's a rebased patch series, fixing most of the smaller issues from
>> v20251101, and making cfbot happy (hopefully).
>
> Hi Tomas,
>
> 0007b: pg_buffercache_pgproc -- nitpick,
On Tue, Nov 11, 2025 at 12:52 PM Tomas Vondra wrote:
>
> Hi,
>
> here's a rebased patch series, fixing most of the smaller issues from
> v20251101, and making cfbot happy (hopefully).
Hi Tomas,
> >>> 0007b: pg_buffercache_pgproc -- nitpick, but maybe it would be better
> >>> called pg_shm_pgproc
On Tue, Nov 4, 2025 at 10:21 PM Tomas Vondra wrote:
Hi Tomas,
> > 0007a: pg_buffercache_pgproc returns pgproc_ptr and fastpath_ptr in
> > bigint and not hex? I've wanted to adjust that to TEXTOID, but instead
> > I've thought it is going to be simpler to use to_hex() -- see 0009
> > attached.
>
On 11/4/25 13:10, Jakub Wartak wrote:
> On Fri, Oct 31, 2025 at 12:57 PM Tomas Vondra wrote:
>>
>> Hi,
>>
>> here's a significantly reworked version of this patch series.
>>
>> I had a couple discussions about these patches at pgconf.eu last week,[..]
>
> I've just had a quick look at this and
On Fri, Oct 31, 2025 at 12:57 PM Tomas Vondra wrote:
>
> Hi,
>
> here's a significantly reworked version of this patch series.
>
> I had a couple discussions about these patches at pgconf.eu last week,[..]
I've just had a quick look at this and oh, my, I've started getting
into this partitioned c
On 10/13/25 13:09, Tomas Vondra wrote:
> On 10/13/25 01:58, Alexey Makhmutov wrote:
>> Hi Tomas,
>>
>> Thank you very much for working on this problem and the entire line of
>> patches prepared! I've tried to play with these patches a little and
>> here are some of my observations and suggestions.
On 10/13/25 14:09, Tomas Vondra wrote:
> I'm not sure I understand. Are you suggesting there's a bug in the
> patch, the kernel, or somewhere else?
We need to ensure that both addr and (addr + size) are aligned to the
page size of the target mapping during 'numa_tonode_memory' invocation,
othe
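The alignment requirement in the snippet above can be illustrated with a little arithmetic. This is only a sketch in Python, not code from the patch series: the helper name align_range_inward is hypothetical, and shrinking the range inward (rather than growing it outward) is an assumption, since the snippet does not say which way the bounds should be adjusted.

```python
def align_range_inward(addr, size, page_size):
    """Shrink [addr, addr + size) so that both ends fall on page boundaries.

    Hypothetical helper for illustration. Returns (aligned_addr, aligned_size),
    or None if the range does not contain a single whole page.
    """
    if page_size <= 0 or size < 0:
        raise ValueError("page_size must be positive and size non-negative")
    start = (addr + page_size - 1) // page_size * page_size  # round start up
    end = (addr + size) // page_size * page_size             # round end down
    if end <= start:
        return None
    return start, end - start


HUGE_PAGE = 2 * 1024 * 1024  # 2MB huge page, used here only as an example

# A range starting 4kB past a huge-page boundary loses its partial pages.
print(align_range_inward(4096, 3 * HUGE_PAGE, HUGE_PAGE))
```

Only the aligned sub-range would then be a candidate for a call such as numa_tonode_memory; what to do with the partial pages at either end is left open here.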
Hi Tomas,
Thank you very much for working on this problem and the entire line of
patches prepared! I've tried to play with these patches a little and
here are some of my observations and suggestions.
In the current implementation we try to use all available NUMA nodes on
the machine, however it
On 10/13/25 01:58, Alexey Makhmutov wrote:
> Hi Tomas,
>
> Thank you very much for working on this problem and the entire line of
> patches prepared! I've tried to play with these patches a little and
> here are some of my observations and suggestions.
>
> In the current implementation we try to use
On 9/11/25 10:32, Tomas Vondra wrote:
> ...
>
> For example, we may get confused about the memory page size. The "size"
> happens before allocation, and at that point we don't know if we succeed
> in getting enough huge pages. When "init" happens, we already know that,
> so our "memory page size"
On 8/13/25 17:16, Andres Freund wrote:
> Hi,
>
> On 2025-08-07 11:24:18 +0200, Tomas Vondra wrote:
>> The patch does a much simpler thing - treat the weight as a "budget",
>> i.e. number of buffers to allocate before proceeding to the "next"
>> partition. So it allocates 55 buffers from P1, then 4
Hi,
On 2025-08-07 11:24:18 +0200, Tomas Vondra wrote:
> The patch does a much simpler thing - treat the weight as a "budget",
> i.e. number of buffers to allocate before proceeding to the "next"
> partition. So it allocates 55 buffers from P1, then 45 buffers from P2,
> and then goes back to P1 in
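The "budget" scheme quoted above (take a partition's full weight in buffers, then move to the next partition, then wrap around) can be sketched as a small generator. This is an illustrative Python sketch under stated assumptions, not the patch's implementation: the function name budget_round_robin is hypothetical, and the weights 55 and 45 are taken from the quoted example.

```python
from itertools import islice


def budget_round_robin(weights):
    """Yield partition indexes, spending each partition's whole budget
    (weights[i] allocations) before proceeding to the next partition,
    and wrapping back to the first partition after the last one."""
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive integers")
    while True:
        for partition, budget in enumerate(weights):
            for _ in range(budget):
                yield partition


# Two partitions with weights 55 and 45, as in the quoted example.
picks = list(islice(budget_round_robin([55, 45]), 200))
print(picks.count(0), picks.count(1))
```

Over any whole number of cycles the allocations split in proportion to the weights, but within a cycle they arrive in long runs from a single partition rather than being interleaved.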
On 8/12/25 16:24, Andres Freund wrote:
> Hi,
>
> On 2025-08-12 13:04:07 +0200, Tomas Vondra wrote:
>> Right. I don't think the current patch would crash - I can't test it,
>> but I don't see why it would crash. In the worst case it'd end up with
>> partitions that are not ideal. The question is
Hi,
On 2025-08-12 13:04:07 +0200, Tomas Vondra wrote:
> Right. I don't think the current patch would crash - I can't test it,
> but I don't see why it would crash. In the worst case it'd end up with
> partitions that are not ideal. The question is more what would an ideal
> partitioning for buffer
On 8/9/25 02:25, Andres Freund wrote:
> Hi,
>
> On 2025-08-07 11:24:18 +0200, Tomas Vondra wrote:
>> 2) I'm a bit unsure what "NUMA nodes" actually means. The patch mostly
>> assumes each core / piece of RAM is assigned to a particular NUMA node.
>
> There are systems in which some NUMA nodes
Hi,
On 2025-08-07 11:24:18 +0200, Tomas Vondra wrote:
> 2) I'm a bit unsure what "NUMA nodes" actually means. The patch mostly
> assumes each core / piece of RAM is assigned to a particular NUMA node.
There are systems in which some NUMA nodes do *not* contain any CPUs. E.g. if
you attach memory
On 8/7/25 11:24, Tomas Vondra wrote:
> Hi!
>
> Here's a slightly improved version of the patch series.
>
Ah, I made a mistake when generating the patches. The 0001 and 0002
patches are not part of the NUMA stuff, it's just something related to
benchmarking (addressing unrelated bottlenecks etc.)
On 7/30/25 10:29, Jakub Wartak wrote:
> On Mon, Jul 28, 2025 at 4:22 PM Tomas Vondra wrote:
>
> Hi Tomas,
>
> just a quick look here:
>
>> 2) The PGPROC part introduces a similar registry, [..]
>>
>> There's also a view pg_buffercache_pgproc. The pg_buffercache location
>> is a bit bogus - it h
On Mon, Jul 28, 2025 at 4:22 PM Tomas Vondra wrote:
Hi Tomas,
just a quick look here:
> 2) The PGPROC part introduces a similar registry, [..]
>
> There's also a view pg_buffercache_pgproc. The pg_buffercache location
> is a bit bogus - it has nothing to do with buffers, but it was good
> enoug
On 7/25/25 12:27, Jakub Wartak wrote:
> On Thu, Jul 17, 2025 at 11:15 PM Tomas Vondra wrote:
>>
>> On 7/4/25 20:12, Tomas Vondra wrote:
>>> On 7/4/25 13:05, Jakub Wartak wrote:
...
8. v1-0005 2x + /* if (numa_procs_interleave) */
Ha! it's a TRAP! I've uncommented it
On Thu, Jul 17, 2025 at 11:15 PM Tomas Vondra wrote:
>
> On 7/4/25 20:12, Tomas Vondra wrote:
> > On 7/4/25 13:05, Jakub Wartak wrote:
> >> ...
> >>
> >> 8. v1-0005 2x + /* if (numa_procs_interleave) */
> >>
> >> Ha! it's a TRAP! I've uncommented it because I wanted to try it out
> >> without i
Hi,
On 2025-07-18 22:48:00 +0200, Tomas Vondra wrote:
> On 7/18/25 18:46, Andres Freund wrote:
> >> For a read-write pgbench I however saw some strange drops/increases of
> >> throughput. I suspect this might be due to some thinko in the clocksweep
> >> partitioning, but I'll need to take a closer
On 7/18/25 18:46, Andres Freund wrote:
> Hi,
>
> On 2025-07-17 23:11:16 +0200, Tomas Vondra wrote:
>> Here's a v2 of the patch series, with a couple changes:
>
> Not a deep look at the code, just a quick reply.
>
>
>> * I changed the freelist partitioning scheme a little bit, based on the
>> di
Hi,
On 2025-07-17 23:11:16 +0200, Tomas Vondra wrote:
> Here's a v2 of the patch series, with a couple changes:
Not a deep look at the code, just a quick reply.
> * I changed the freelist partitioning scheme a little bit, based on the
> discussion in this thread. Instead of having a single "par
On 7/4/25 20:12, Tomas Vondra wrote:
> On 7/4/25 13:05, Jakub Wartak wrote:
>> ...
>>
>> 8. v1-0005 2x + /* if (numa_procs_interleave) */
>>
>> Ha! it's a TRAP! I've uncommented it because I wanted to try it out
>> without it (just by setting GUC off), but "MyProc->sema" is NULL:
>>
>> 202
> On Jul 10, 2025, at 8:13 AM, Burd, Greg wrote:
>
>
>> On Jul 9, 2025, at 1:23 PM, Andres Freund wrote:
>>
>> Hi,
>>
>> On 2025-07-09 12:55:51 -0400, Greg Burd wrote:
>>> On Jul 9 2025, at 12:35 pm, Andres Freund wrote:
>>>
FWIW, I've started to wonder if we shouldn't just get rid
Hi,
On 2025-07-10 14:17:21 +, Bertrand Drouvot wrote:
> On Wed, Jul 09, 2025 at 03:42:26PM -0400, Andres Freund wrote:
> > I wonder if we should *increase* the size of shared_buffers whenever huge
> > pages are in use and there's padding space due to the huge page
> > boundaries. Pretty pointl
Hi,
On 2025-07-10 17:31:45 +0200, Tomas Vondra wrote:
> On 7/9/25 19:23, Andres Freund wrote:
> > There's other things around this that could use some attention. It's not hard
> > to see clock sweep be a bottleneck in concurrent workloads - partially due to
> > the shared maintenance of
On 7/9/25 19:23, Andres Freund wrote:
> Hi,
>
> On 2025-07-09 12:55:51 -0400, Greg Burd wrote:
>> On Jul 9 2025, at 12:35 pm, Andres Freund wrote:
>>
>>> FWIW, I've started to wonder if we shouldn't just get rid of the freelist
>>> entirely. While clocksweep is perhaps minutely slower in a sin
On 7/9/25 08:40, Cédric Villemain wrote:
>> On 7/8/25 18:06, Cédric Villemain wrote:
>>>
On 7/8/25 03:55, Cédric Villemain wrote:
> Hi Andres,
>
>> Hi,
>>
>> On 2025-07-05 07:09:00 +, Cédric Villemain wrote:
>>> In my work on more careful Post
Hi,
On Wed, Jul 09, 2025 at 03:42:26PM -0400, Andres Freund wrote:
> Hi,
>
> Thanks for working on this!
Indeed, thanks!
> On 2025-07-01 21:07:00 +0200, Tomas Vondra wrote:
> > 1) v1-0001-NUMA-interleaving-buffers.patch
> >
> > This is the main thing when people think about NUMA - making sure t
> On Jul 9, 2025, at 1:23 PM, Andres Freund wrote:
>
> Hi,
>
> On 2025-07-09 12:55:51 -0400, Greg Burd wrote:
>> On Jul 9 2025, at 12:35 pm, Andres Freund wrote:
>>
>>> FWIW, I've started to wonder if we shouldn't just get rid of the freelist
>>> entirely. While clocksweep is perhaps minute
On Wed, Jul 9, 2025 at 9:42 PM Andres Freund wrote:
> On 2025-07-01 21:07:00 +0200, Tomas Vondra wrote:
> > Each patch has a numa_ GUC, intended to enable/disable that part. This
> > is meant to make development easier, not as a final interface. I'm not
> > sure how exactly that should look. It's
On Wed, Jul 9, 2025 at 7:13 PM Andres Freund wrote:
> > Yes, and we are discussing if it is worth getting into smaller pages
> > for such use cases (e.g. 4kB ones without hugetlb with 2MB hugepages or
> > what more even more waste 1GB hugetlb if we don't request 2MB for some
> > small structs: btw,
Hi,
Thanks for working on this! I think it's an area we have long neglected...
On 2025-07-01 21:07:00 +0200, Tomas Vondra wrote:
> Each patch has a numa_ GUC, intended to enable/disable that part. This
> is meant to make development easier, not as a final interface. I'm not
> sure how exactly t
Hi,
On 2025-07-08 16:06:00 +, Cédric Villemain wrote:
> > Assuming we want to actually pin tasks from within Postgres, what I
> > think might work is allowing modules to "advise" on where to place the
> > task. But the decision would still be done by core.
>
> Possibly exactly what you're doi
Hi,
On 2025-07-09 12:55:51 -0400, Greg Burd wrote:
> On Jul 9 2025, at 12:35 pm, Andres Freund wrote:
>
> > FWIW, I've started to wonder if we shouldn't just get rid of the freelist
> > entirely. While clocksweep is perhaps minutely slower in a single thread than
> > the freelist, clock sweep
Hi,
On 2025-07-09 12:04:00 +0200, Jakub Wartak wrote:
> On Tue, Jul 8, 2025 at 2:56 PM Andres Freund wrote:
> > On 2025-07-08 14:27:12 +0200, Tomas Vondra wrote:
> > > On 7/8/25 05:04, Andres Freund wrote:
> > > > On 2025-07-04 13:05:05 +0200, Jakub Wartak wrote:
> > > > The reason it would be ad
On Jul 9 2025, at 12:35 pm, Andres Freund wrote:
> FWIW, I've started to wonder if we shouldn't just get rid of the freelist
> entirely. While clocksweep is perhaps minutely slower in a single thread than
> the freelist, clock sweep scales *considerably* better [1]. As it's rather
> rare to
Hi,
On 2025-07-02 14:36:31 +0200, Tomas Vondra wrote:
> On 7/2/25 13:37, Ashutosh Bapat wrote:
> > On Wed, Jul 2, 2025 at 12:37 AM Tomas Vondra wrote:
> >>
> >>
> >> 3) v1-0003-freelist-Don-t-track-tail-of-a-freelist.patch
> >>
> >> Minor optimization. Andres noticed we're tracking the tail of bu
On Tue, Jul 8, 2025 at 2:56 PM Andres Freund wrote:
>
> Hi,
>
> On 2025-07-08 14:27:12 +0200, Tomas Vondra wrote:
> > On 7/8/25 05:04, Andres Freund wrote:
> > > On 2025-07-04 13:05:05 +0200, Jakub Wartak wrote:
> > > The reason it would be advantageous to put something like the procarray
> > > o
Hi,
On Wed, Jul 09, 2025 at 06:40:00AM +, Cédric Villemain wrote:
> > On 7/8/25 18:06, Cédric Villemain wrote:
> > I'm not against making this extensible, in some way. But I still
> > struggle to imagine a reasonable alternative policy, where the external
> > module gets the same information a
On 7/8/25 18:06, Cédric Villemain wrote:
On 7/8/25 03:55, Cédric Villemain wrote:
Hi Andres,
Hi,
On 2025-07-05 07:09:00 +, Cédric Villemain wrote:
In my work on more careful PostgreSQL resource management, I've come to the
conclusion that we should avoid pushing policy too deeply
On 7/8/25 18:06, Cédric Villemain wrote:
>
>> On 7/8/25 03:55, Cédric Villemain wrote:
>>> Hi Andres,
>>>
Hi,
On 2025-07-05 07:09:00 +, Cédric Villemain wrote:
> In my work on more careful PostgreSQL resource management, I've come to the
> conclus
On 7/8/25 03:55, Cédric Villemain wrote:
Hi Andres,
Hi,
On 2025-07-05 07:09:00 +, Cédric Villemain wrote:
In my work on more careful PostgreSQL resource management, I've come to the
conclusion that we should avoid pushing policy too deeply into the
PostgreSQL core itself. Therefor
Hi,
On 2025-07-08 14:27:12 +0200, Tomas Vondra wrote:
> On 7/8/25 05:04, Andres Freund wrote:
> > On 2025-07-04 13:05:05 +0200, Jakub Wartak wrote:
> > The reason it would be advantageous to put something like the procarray onto
> > smaller pages is that otherwise the entire procarray (unless part
On 7/8/25 03:55, Cédric Villemain wrote:
> Hi Andres,
>
>> Hi,
>>
>> On 2025-07-05 07:09:00 +, Cédric Villemain wrote:
>>> In my work on more careful PostgreSQL resource management, I've come to the
>>> conclusion that we should avoid pushing policy too deeply into the
>>> PostgreSQL core
On 7/8/25 05:04, Andres Freund wrote:
> Hi,
>
> On 2025-07-04 13:05:05 +0200, Jakub Wartak wrote:
>> On Tue, Jul 1, 2025 at 9:07 PM Tomas Vondra wrote:
>>> I don't think the splitting would actually make some things simpler, or
>>> maybe more flexible - in particular, it'd allow us to enable huge
Hi,
On 2025-07-04 13:05:05 +0200, Jakub Wartak wrote:
> On Tue, Jul 1, 2025 at 9:07 PM Tomas Vondra wrote:
> > I don't think the splitting would actually make some things simpler, or
> > maybe more flexible - in particular, it'd allow us to enable huge pages
> > only for some regions (like shared
On 7/7/25 16:51, Cédric Villemain wrote:
* Others might use it to integrate PostgreSQL's own resources (e.g.,
"areas" of shared buffers) into policies.
Hope this perspective is helpful.
Can you explain how you want to manage this by an extension defined at
the SQL level, when most of t
Hi Andres,
Hi,
On 2025-07-05 07:09:00 +, Cédric Villemain wrote:
In my work on more careful PostgreSQL resource management, I've come to the
conclusion that we should avoid pushing policy too deeply into the
PostgreSQL core itself. Therefore, I'm quite skeptical about integrating
NUMA-spec
On 7/7/25 16:51, Cédric Villemain wrote:
* Others might use it to integrate PostgreSQL's own resources (e.g.,
"areas" of shared buffers) into policies.
Hope this perspective is helpful.
Can you explain how you want to manage this by an extension defined at
the SQL level, when most of this stuf
Hi,
On 2025-07-05 07:09:00 +, Cédric Villemain wrote:
> In my work on more careful PostgreSQL resource management, I've come to the
> conclusion that we should avoid pushing policy too deeply into the
> PostgreSQL core itself. Therefore, I'm quite skeptical about integrating
> NUMA-specific ma
On 7/7/25 16:51, Cédric Villemain wrote:
>>> * Others might use it to integrate PostgreSQL's own resources (e.g.,
>>> "areas" of shared buffers) into policies.
>>>
>>> Hope this perspective is helpful.
>>
>> Can you explain how you want to manage this by an extension defined at
>> the SQL level, wh
* Others might use it to integrate PostgreSQL's own resources (e.g.,
"areas" of shared buffers) into policies.
Hope this perspective is helpful.
Can you explain how you want to manage this by an extension defined at
the SQL level, when most of this stuff has to be done when setting up
shared me
Hi Tomas, some more thoughts after the weekend:
On Fri, Jul 4, 2025 at 8:12 PM Tomas Vondra wrote:
>
> On 7/4/25 13:05, Jakub Wartak wrote:
> > On Tue, Jul 1, 2025 at 9:07 PM Tomas Vondra wrote:
> >
> > Hi!
> >
> >> 1) v1-0001-NUMA-interleaving-buffers.patch
> > [..]
> >> It's a bit more complic
On 7/5/25 09:09, Cédric Villemain wrote:
> Hi Tomas,
>
>
> I haven't yet had time to fully read all the work and proposals around
> NUMA and related features, but I hope to catch up over the summer.
>
> However, I think it's important to share some thoughts before it's too
> late, as you migh
Hi Tomas,
I haven't yet had time to fully read all the work and proposals around
NUMA and related features, but I hope to catch up over the summer.
However, I think it's important to share some thoughts before it's too
late, as you might find them relevant to the NUMA management code.
6)
On 7/4/25 13:05, Jakub Wartak wrote:
> On Tue, Jul 1, 2025 at 9:07 PM Tomas Vondra wrote:
>
> Hi!
>
>> 1) v1-0001-NUMA-interleaving-buffers.patch
> [..]
>> It's a bit more complicated, because the patch distributes both the
>> blocks and descriptors, in the same way. So a buffer and its descrip
On Tue, Jul 1, 2025 at 9:07 PM Tomas Vondra wrote:
Hi!
> 1) v1-0001-NUMA-interleaving-buffers.patch
[..]
> It's a bit more complicated, because the patch distributes both the
> blocks and descriptors, in the same way. So a buffer and its descriptor
> always end on the same NUMA node. This is on
> On Wed, Jul 02, 2025 at 05:07:28PM +0530, Ashutosh Bapat wrote:
> > There's also the question how this is related to other patches affecting
> > shared memory - I think the most relevant one is the "shared buffers
> > online resize" by Ashutosh, simply because it touches the shared memory.
>
> I
On Wed, Jul 2, 2025 at 6:06 PM Tomas Vondra wrote:
>
> I'm not sure how you're rebuilding the freelist. Presumably it can
> contain buffers that are no longer valid (after shrinking). How is that
> handled to not break anything? I think the NUMA variant would do exactly
> the same thing, except th
On 7/2/25 13:37, Ashutosh Bapat wrote:
> On Wed, Jul 2, 2025 at 12:37 AM Tomas Vondra wrote:
>>
>>
>> 3) v1-0003-freelist-Don-t-track-tail-of-a-freelist.patch
>>
>> Minor optimization. Andres noticed we're tracking the tail of buffer
>> freelist, without using it. So the patch removes that.
>>
On Wed, Jul 2, 2025 at 12:37 AM Tomas Vondra wrote:
>
>
> 3) v1-0003-freelist-Don-t-track-tail-of-a-freelist.patch
>
> Minor optimization. Andres noticed we're tracking the tail of buffer
> freelist, without using it. So the patch removes that.
>
The patches for resizing buffers use the lastFreeB
Hi,
This is a WIP version of a patch series I'm working on, adding some
basic NUMA awareness for a couple parts of our shared memory (shared
buffers, etc.). It's based on Andres' experimental patches he spoke
about at pgconf.eu 2024 [1], and while it's improved and polished in
various ways, it's s
67 matches