On 2026/3/11 16:33, Natalie Vock wrote:
> On 3/11/26 02:12, Chen Ridong wrote:
>>
>>
>> On 2026/3/2 20:37, Natalie Vock wrote:
>>> Callers can use this feedback to be more aggressive in making space for
>>> allocations of a cgroup if they know it is protected.
>>>
>>> These are counterparts to memcg's mem_cgroup_below_{min,low}.
>>>
>>> Signed-off-by: Natalie Vock <[email protected]>
>>> ---
>>>   include/linux/cgroup_dmem.h | 16 ++++++++++++
>>>   kernel/cgroup/dmem.c        | 62 +++++++++++++++++++++++++++++++++++++++++++++
>>>   2 files changed, 78 insertions(+)
>>>
>>> diff --git a/include/linux/cgroup_dmem.h b/include/linux/cgroup_dmem.h
>>> index dd4869f1d736e..1a88cd0c9eb00 100644
>>> --- a/include/linux/cgroup_dmem.h
>>> +++ b/include/linux/cgroup_dmem.h
>>> @@ -24,6 +24,10 @@ void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state *pool, u64 size);
>>>   bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
>>>                         struct dmem_cgroup_pool_state *test_pool,
>>>                         bool ignore_low, bool *ret_hit_low);
>>> +bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
>>> +               struct dmem_cgroup_pool_state *test);
>>> +bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
>>> +               struct dmem_cgroup_pool_state *test);
>>>
>>>   void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool);
>>>   #else
>>> @@ -59,6 +63,18 @@ bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
>>>       return true;
>>>   }
>>>
>>> +static inline bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
>>> +                     struct dmem_cgroup_pool_state *test)
>>> +{
>>> +    return false;
>>> +}
>>> +
>>> +static inline bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
>>> +                     struct dmem_cgroup_pool_state *test)
>>> +{
>>> +    return false;
>>> +}
>>> +
>>>   static inline void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
>>>   { }
>>>
>>> diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
>>> index 9d95824dc6fa0..28227405f7cfe 100644
>>> --- a/kernel/cgroup/dmem.c
>>> +++ b/kernel/cgroup/dmem.c
>>> @@ -694,6 +694,68 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
>>>   }
>>>   EXPORT_SYMBOL_GPL(dmem_cgroup_try_charge);
>>>
>>> +/**
>>> + * dmem_cgroup_below_min() - Tests whether current usage is within min limit.
>>> + *
>>> + * @root: Root of the subtree to calculate protection for, or NULL to calculate global protection.
>>> + * @test: The pool to test the usage/min limit of.
>>> + *
>>> + * Return: true if usage is below min and the cgroup is protected, false otherwise.
>>> + */
>>> +bool dmem_cgroup_below_min(struct dmem_cgroup_pool_state *root,
>>> +               struct dmem_cgroup_pool_state *test)
>>> +{
>>> +    if (root == test || !pool_parent(test))
>>> +        return false;
>>> +
>>> +    if (!root) {
>>> +        for (root = test; pool_parent(root); root = pool_parent(root))
>>> +            {}
>>> +    }
>>
>> It seems we don't need to find the global protection root, since the root's
>> protection cannot be set. If !root, we can return false directly, right?
>>
>> Or am I missing something?
>>
>> ```
>>     {
>>         .name = "min",
>>         .write = dmem_cgroup_region_min_write,
>>         .seq_show = dmem_cgroup_region_min_show,
>>         .flags = CFTYPE_NOT_ON_ROOT,
>>     },
>>     {
>>         .name = "low",
>>         .write = dmem_cgroup_region_low_write,
>>         .seq_show = dmem_cgroup_region_low_show,
>>         .flags = CFTYPE_NOT_ON_ROOT,
>>     },
>> ```
> 
> That's not quite how it works. You're correct that the min/low properties
> don't exist on the root cgroup, but we don't use the root for that.
> 
> The reason we have a root here in the first place has to do with how recursive
> memory protection works in cgroups. Note that for the test cgroup, we don't
> read the literal min/low protection setting, but the "emin"/"elow" value,
> referring to effective protection. The effective protection value depends not
> just on the settings of the "test" cgroup, but also its ancestors (and
> potentially, their sibling groups). See [1] for some documentation on how
> effective protection varies with different cgroup relationships.
> 
> The "root" parameter here refers to the root of the common subtree between the
> test cgroup and what the documentation refers to as the "reclaim target". For
> device memory there usually isn't really any reclaim happening in the
> traditional sense, but e.g. TTM evictions follow the same principle (the
> reclaim target is simply the cgroup owning the buffer that is to be evicted).
> 
> Sometimes, precise reclaim targets may not really be known yet (or we want to
> try evicting different buffers originating from different cgroups). In that
> case, the "root" parameter here is NULL. However, we obviously know that all
> cgroups must be descendants of the root cgroup, so the root cgroup is a
> guaranteed safe value for the shared subtree between the test cgroup and any
> potential reclaim target.
> 
> In practice, this means that the effective min/low protection will be capped
> by the protection value specified in all ancestors, which is the most
> conservative estimate.
> 
> Regards,
> Natalie
> 
> [1] https://docs.kernel.org/admin-guide/cgroup-v2.html#reclaim-protection
> 

Thank you for your explanation. I made a mistake.
Sorry for the noise.
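
For reference, the NULL-root fallback and the ancestor capping described above
can be sketched in standalone C. This is only a simplified model, not the
kernel code: struct pool, topmost(), effective_min() and below_min() are
hypothetical names, and the distribution of protection among sibling groups
that the real calculation performs is deliberately ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dmem_cgroup_pool_state. */
struct pool {
	struct pool *parent;	/* NULL for the root cgroup's pool */
	uint64_t usage;		/* current usage */
	uint64_t min;		/* configured protection; not settable on root */
};

/* Walk up to the topmost ancestor, as the patch does when root is NULL. */
static struct pool *topmost(struct pool *p)
{
	while (p->parent)
		p = p->parent;
	return p;
}

/*
 * Simplified effective protection: the pool's own min, capped by the min
 * of every ancestor strictly below root. Assumption: sibling distribution
 * is ignored, so this is only an upper bound on the real "emin".
 */
static uint64_t effective_min(const struct pool *root, const struct pool *test)
{
	uint64_t emin = test->min;
	const struct pool *p;

	for (p = test->parent; p && p != root; p = p->parent)
		if (p->min < emin)
			emin = p->min;
	return emin;
}

/* Mirrors the control flow of dmem_cgroup_below_min() in the patch. */
static bool below_min(struct pool *root, struct pool *test)
{
	if (root == test || !test->parent)
		return false;
	if (!root)
		root = topmost(test);
	return test->usage <= effective_min(root, test);
}
```

With a leaf (min 200, usage 150) under a parent with min 100, the leaf counts
as protected when the parent itself is passed as root, but not when root is
NULL, because the walk from the root cgroup caps the leaf's protection at 100.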

>>
>>> +
>>> +    /*
>>> +     * In mem_cgroup_below_min(), the memcg counterpart, this call is missing.
>>> +     * mem_cgroup_below_min() gets called during traversal of the cgroup tree, where
>>> +     * protection is already calculated as part of the traversal. dmem cgroup eviction
>>> +     * does not traverse the cgroup tree, so we need to recalculate effective protection
>>> +     * here.
>>> +     */
>>> +    dmem_cgroup_calculate_protection(root, test);
>>> +    return page_counter_read(&test->cnt) <= READ_ONCE(test->cnt.emin);
>>> +}
>>> +EXPORT_SYMBOL_GPL(dmem_cgroup_below_min);
>>> +
>>> +/**
>>> + * dmem_cgroup_below_low() - Tests whether current usage is within low limit.
>>> + *
>>> + * @root: Root of the subtree to calculate protection for, or NULL to calculate global protection.
>>> + * @test: The pool to test the usage/low limit of.
>>> + *
>>> + * Return: true if usage is below low and the cgroup is protected, false otherwise.
>>> + */
>>> +bool dmem_cgroup_below_low(struct dmem_cgroup_pool_state *root,
>>> +               struct dmem_cgroup_pool_state *test)
>>> +{
>>> +    if (root == test || !pool_parent(test))
>>> +        return false;
>>> +
>>> +    if (!root) {
>>> +        for (root = test; pool_parent(root); root = pool_parent(root))
>>> +            {}
>>> +    }
>>> +
>>> +    /*
>>> +     * In mem_cgroup_below_low(), the memcg counterpart, this call is missing.
>>> +     * mem_cgroup_below_low() gets called during traversal of the cgroup tree, where
>>> +     * protection is already calculated as part of the traversal. dmem cgroup eviction
>>> +     * does not traverse the cgroup tree, so we need to recalculate effective protection
>>> +     * here.
>>> +     */
>>> +    dmem_cgroup_calculate_protection(root, test);
>>> +    return page_counter_read(&test->cnt) <= READ_ONCE(test->cnt.elow);
>>> +}
>>> +EXPORT_SYMBOL_GPL(dmem_cgroup_below_low);
>>> +
>>>   static int dmem_cgroup_region_capacity_show(struct seq_file *sf, void *v)
>>>   {
>>>       struct dmem_cgroup_region *region;
>>>
>>

-- 
Best regards,
Ridong
