are directly allocated from one part of the DIMM (presumably the
start) and use that for the metadata -- potentially all the metadata that
would be necessary to plug/unplug the entire DIMM. This would effectively
be unmovable, but if you want to guarantee that all the memory except the
metadata can be unplugged, you do not have many alternatives. Playing games
with the ordering of the freelists will simply end up as "sometimes works,
sometimes does not".
In terms of forcing ranges to be UNMOVABLE or MOVABLE (either via zones
or by implementing "sticky" pageblocks which hits complex reclaim-related
problems), you start running into problems similar to lowmem starvation
where a page cache allocation fails because unmovable metadata cannot
be allocated.
I suggest you keep it simple -- statically allocate the potential
metadata needed in the future even though it limits the maximum amount
of memory that can be unplugged. The alternative is unpredictable
plug/unplug success rates.
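For a rough idea of what "statically allocate" costs, here is a
back-of-the-envelope sketch (my numbers, not from your patch: assumes 4K
pages and a 64-byte struct page, the helper name is made up):

#include <linux/mm.h>

/*
 * Hypothetical helper for illustration only, not an existing kernel API:
 * how much of a hot-plugged range has to stay unmovable if the memmap
 * for the whole DIMM is carved out of the start of the range itself.
 * With 4K pages and a 64-byte struct page that is roughly 1.6% of the
 * range, e.g. ~2GB for a 128GB DIMM.
 */
static unsigned long memmap_reserve_bytes(unsigned long range_bytes)
{
        unsigned long nr_pages = range_bytes >> PAGE_SHIFT;

        return nr_pages * sizeof(struct page);
}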
--
Mel Gorman
SUSE Labs
if we really care.
>
> Honestly, I have no idea. I can imagine that some atomic high order
> workloads (e.g. in net) might benefit from cache line hot pages but I am
> not sure this is really observable.
The highatomic reserve is more concerned about the allocation
succeeding than it is about cache hotness.
--
Mel Gorman
SUSE Labs
because keeping !managed_zones out of the
zonelist and reclaim paths is behaviour that makes sense. Elsewhere you
state "zone can always happen to have no free memory left" and this is true
but it's usually a transient event. The difference between a populated
vs managed zone is usually a permanent one where no memory will ever be
placed on the buddy lists because the memory was reserved early in boot
or a similar reason. The patch is probably harmless but it has the
potential to waste CPU time allocating or reclaiming from zones that will
never succeed.
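For reference, the distinction boils down to the following (paraphrasing
include/linux/mmzone.h, so treat it as a sketch rather than a quote):

/*
 * Paraphrased from include/linux/mmzone.h: a zone can have present
 * pages (populated) while nothing was ever released to the buddy
 * allocator (unmanaged), e.g. everything reserved early in boot, which
 * is why the zonelist and reclaim paths check managed_zone().
 */
static inline bool managed_zone(struct zone *zone)
{
        return zone_managed_pages(zone);
}

static inline bool populated_zone(struct zone *zone)
{
        return zone->present_pages;
}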
--
Mel Gorman
SUSE Labs
rily set at 256 per second).
>
> Signed-off-by: Kent Overstreet
Why not use percpu_counter? It has a per-cpu counter that is synchronised
when a batch threshold (default 32) is exceeded and can explicitly sync
the counters when required, assuming the synchronised count is only needed
when read.
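Roughly, usage would look something like this (an untested sketch, the
counter name is made up):

#include <linux/percpu_counter.h>

static struct percpu_counter nr_slowpath_events;        /* made-up name */

static int __init counters_init(void)
{
        /* second argument is the initial value */
        return percpu_counter_init(&nr_slowpath_events, 0, GFP_KERNEL);
}

static inline void account_event(void)
{
        /*
         * Cheap per-cpu increment; folded into the shared count once
         * the per-cpu delta exceeds the batch (percpu_counter_batch,
         * default 32).
         */
        percpu_counter_inc(&nr_slowpath_events);
}

static s64 events_total(void)
{
        /*
         * Exact value, sums the per-cpu deltas. percpu_counter_read_positive()
         * is cheaper if an approximation is good enough.
         */
        return percpu_counter_sum(&nr_slowpath_events);
}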
_t gfp, unsigned order)
> {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e5486d47406e..165daba19e2a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -763,6 +763,7 @@ static inline bool pcp_allowed_order(unsigned int order)
>
> static inline void free_the_page(struct page *page, unsigned int order)
> {
> +
> if (pcp_allowed_order(order)) /* Via pcp? */
> free_unref_page(page, order);
> else
Spurious whitespace change.
--
Mel Gorman
SUSE Labs
g the information from traces
is tricky and would need combining multiple different elements, which
is development effort but not impossible.
Either way, asking for an explanation as to why equivalent functionality
cannot be created from ftrace/kprobe/eBPF/whatever is reasonable.
--
Mel Gorman
SUSE Labs
On Wed, Aug 31, 2022 at 11:59:41AM -0400, Kent Overstreet wrote:
> On Wed, Aug 31, 2022 at 11:19:48AM +0100, Mel Gorman wrote:
> > On Wed, Aug 31, 2022 at 04:42:30AM -0400, Kent Overstreet wrote:
> > > On Wed, Aug 31, 2022 at 09:38:27AM +0200, Peter Zijlstra wrote:
> > >