Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-03-24 Thread patchwork-bot+linux-riscv
Hello: This series was applied to riscv/linux.git (fixes) by Andrew Morton: On Mon, 29 Jan 2024 13:46:34 +0100 you wrote: > Now that the rmap overhaul[1] is upstream that provides a clean interface > for rmap batching, let's implement PTE batching during fork when processing > PTE-mapped THPs. >

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
dontneed should hopefully/likely see a speedup. Yes, but that's almost exactly the same path as munmap, so I'm not sure it really adds much for this particular series. Right, that's why I'm not including these measurements. dontneed vs. munmap is more about measuring the overhead of VMA modifica

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread Ryan Roberts
On 31/01/2024 15:05, David Hildenbrand wrote: > On 31.01.24 16:02, Ryan Roberts wrote: >> On 31/01/2024 14:29, David Hildenbrand wrote: > Note that regarding NUMA effects, I mean when some memory access within the same socket is faster/slower even with only a single node. On

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
On 31.01.24 16:02, Ryan Roberts wrote: On 31/01/2024 14:29, David Hildenbrand wrote: Note that regarding NUMA effects, I mean when some memory access within the same socket is faster/slower even with only a single node. On AMD EPYC that's possible, depending on which core you are running and on

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread Ryan Roberts
On 31/01/2024 14:29, David Hildenbrand wrote: >>> Note that regarding NUMA effects, I mean when some memory access within the same socket is faster/slower even with only a single node. On AMD EPYC that's possible, depending on which core you are running and on which memory control

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
Note that regarding NUMA effects, I mean when some memory access within the same socket is faster/slower even with only a single node. On AMD EPYC that's possible, depending on which core you are running and on which memory controller the memory you want to access is located. If both are in differ

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread Ryan Roberts
On 31/01/2024 13:38, David Hildenbrand wrote: Nope: looks the same. I've taken my test harness out of the picture and done everything manually from the ground up, with the old tests and the new. Headline is that I see similar numbers from both. It took me a while to

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
Nope: looks the same. I've taken my test harness out of the picture and done everything manually from the ground up, with the old tests and the new. Headline is that I see similar numbers from both. It took me a while to get really reproducible numbers on Intel. Most importantly: * Set a fixed CP
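
For readers reproducing these measurements: the snippet above is cut off, but the point is about getting stable numbers, i.e. fixing the CPU frequency (done outside the program, via the cpufreq governor) and pinning the benchmark to a single core. A minimal sketch of the pinning part, assuming Linux/glibc; the core number is arbitrary and not taken from the thread:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Pin the calling thread to one CPU so migrations and per-core
 * frequency differences do not add run-to-run noise. */
static void pin_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    pin_to_cpu(2); /* arbitrary core; pick one close to the memory under test */
    /* ... run the fork()/munmap()/MADV_DONTNEED measurements here ... */
    return 0;
}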

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread Ryan Roberts
On 31/01/2024 12:56, David Hildenbrand wrote: > On 31.01.24 13:37, Ryan Roberts wrote: >> On 31/01/2024 11:49, Ryan Roberts wrote: >>> On 31/01/2024 11:28, David Hildenbrand wrote: On 31.01.24 12:16, Ryan Roberts wrote: > On 31/01/2024 11:06, David Hildenbrand wrote: >> On 31.01.24 11:

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
I'm also surprised about the dontneed vs. munmap numbers. You mean the ones for Altra that I posted? (I didn't post any for M2). The Altra numbers look ok to me; dontneed has no change, and munmap has no change for order-0 and is massively improved for order-9. I would expect that dontneed w
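
For context, the dontneed vs. munmap comparison discussed here is between madvise(MADV_DONTNEED) and munmap() on a large, populated, anonymous mapping. A hedged sketch of that kind of microbenchmark, not the harness used in the thread; the 1 GiB size is arbitrary:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <sys/mman.h>

static double now_sec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(int argc, char **argv)
{
    size_t len = 1UL << 30;  /* 1 GiB, arbitrary */
    int use_munmap = argc > 1 && strcmp(argv[1], "munmap") == 0;

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    madvise(p, len, MADV_HUGEPAGE);     /* THP (order-9) case; drop for order-0 */
    memset(p, 1, len);                  /* fault the whole range in */

    double t0 = now_sec();
    if (use_munmap)
        munmap(p, len);                 /* tears down PTEs and the VMA */
    else
        madvise(p, len, MADV_DONTNEED); /* zaps PTEs, keeps the VMA */
    printf("%s: %.6f s\n", use_munmap ? "munmap" : "dontneed", now_sec() - t0);
    return 0;
}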

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
On 31.01.24 13:37, Ryan Roberts wrote: On 31/01/2024 11:49, Ryan Roberts wrote: On 31/01/2024 11:28, David Hildenbrand wrote: On 31.01.24 12:16, Ryan Roberts wrote: On 31/01/2024 11:06, David Hildenbrand wrote: On 31.01.24 11:43, Ryan Roberts wrote: On 29/01/2024 12:46, David Hildenbrand wro

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread Ryan Roberts
On 31/01/2024 11:49, Ryan Roberts wrote: > On 31/01/2024 11:28, David Hildenbrand wrote: >> On 31.01.24 12:16, Ryan Roberts wrote: >>> On 31/01/2024 11:06, David Hildenbrand wrote: On 31.01.24 11:43, Ryan Roberts wrote: > On 29/01/2024 12:46, David Hildenbrand wrote: >> Now that the rm

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread Ryan Roberts
On 31/01/2024 11:28, David Hildenbrand wrote: > On 31.01.24 12:16, Ryan Roberts wrote: >> On 31/01/2024 11:06, David Hildenbrand wrote: >>> On 31.01.24 11:43, Ryan Roberts wrote: On 29/01/2024 12:46, David Hildenbrand wrote: > Now that the rmap overhaul[1] is upstream that provides a clean

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
On 31.01.24 12:16, Ryan Roberts wrote: On 31/01/2024 11:06, David Hildenbrand wrote: On 31.01.24 11:43, Ryan Roberts wrote: On 29/01/2024 12:46, David Hildenbrand wrote: Now that the rmap overhaul[1] is upstream that provides a clean interface for rmap batching, let's implement PTE batching du

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread Ryan Roberts
On 31/01/2024 11:06, David Hildenbrand wrote: > On 31.01.24 11:43, Ryan Roberts wrote: >> On 29/01/2024 12:46, David Hildenbrand wrote: >>> Now that the rmap overhaul[1] is upstream that provides a clean interface >>> for rmap batching, let's implement PTE batching during fork when processing >>> P

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
On 31.01.24 11:43, Ryan Roberts wrote: On 29/01/2024 12:46, David Hildenbrand wrote: Now that the rmap overhaul[1] is upstream that provides a clean interface for rmap batching, let's implement PTE batching during fork when processing PTE-mapped THPs. This series is partially based on Ryan's pr

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread Ryan Roberts
On 29/01/2024 12:46, David Hildenbrand wrote: > Now that the rmap overhaul[1] is upstream that provides a clean interface > for rmap batching, let's implement PTE batching during fork when processing > PTE-mapped THPs. > > This series is partially based on Ryan's previous work[2] to implement > co
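
The replies above compare fork() timings with and without this series on several machines (Altra, Apple M2, Intel, AMD EPYC); the snippets are truncated, but a minimal standalone sketch of that kind of measurement might look as follows. The 1 GiB size is arbitrary, and MADV_HUGEPAGE selects the THP (order-9) case:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

static double now_sec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    size_t len = 1UL << 30;  /* 1 GiB, arbitrary */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    madvise(p, len, MADV_HUGEPAGE);  /* THP (order-9) case; drop for order-0 */
    memset(p, 1, len);               /* populate so fork() has PTEs to copy */

    double t0 = now_sec();
    pid_t pid = fork();              /* the measured cost: duplicating the page tables */
    if (pid < 0) { perror("fork"); return 1; }
    if (pid == 0)
        _exit(0);                    /* child exits immediately */
    double t1 = now_sec();
    waitpid(pid, NULL, 0);
    printf("fork() with %zu MiB populated: %.6f s\n", len >> 20, t1 - t0);
    return 0;
}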

[PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-29 Thread David Hildenbrand
Now that the rmap overhaul[1] is upstream that provides a clean interface for rmap batching, let's implement PTE batching during fork when processing PTE-mapped THPs. This series is partially based on Ryan's previous work[2] to implement cont-pte support on arm64, but it's a complete rewrite based
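
The core idea in the cover letter, processing a run of consecutive PTEs that map consecutive pages of the same large folio as one batch so that rmap and refcount updates happen once per batch rather than once per page, can be illustrated with a small userspace toy model. This is not kernel code; struct toy_pte and pte_batch_len are made-up names for illustration only:

#include <stdio.h>
#include <stddef.h>

/* Toy model: each entry records which page frame it maps. The real
 * kernel logic operates on hardware PTEs and struct folio in mm/. */
struct toy_pte {
    unsigned long pfn;   /* page frame number this entry maps */
};

/* Return how many entries, starting at ptes[0], map consecutive frames,
 * i.e. the batch that could be handled with a single rmap/refcount update. */
static size_t pte_batch_len(const struct toy_pte *ptes, size_t max)
{
    size_t n = 1;

    while (n < max && ptes[n].pfn == ptes[0].pfn + n)
        n++;
    return n;
}

int main(void)
{
    struct toy_pte ptes[] = {
        { 100 }, { 101 }, { 102 }, { 103 },  /* a 4-page contiguous run */
        { 500 },                             /* unrelated mapping */
    };

    printf("batch starting at index 0 covers %zu PTEs\n",
           pte_batch_len(ptes, sizeof(ptes) / sizeof(ptes[0])));
    return 0;
}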