't need to be allocated.
>
> Link:
> https://lore.kernel.org/linux-trace-kernel/20240212174011.06821...@gandalf.local.home/
>
> Signed-off-by: Steven Rostedt (Google)
Patch looks good to me.
Reviewed-by: Tim Chen
> ---
> Changes since v1:
> https://lore.kernel.org
On Mon, 2024-02-12 at 19:13 -0500, Steven Rostedt wrote:
> On Mon, 12 Feb 2024 15:39:03 -0800
> Tim Chen wrote:
>
> > > diff --git a/kernel/trace/trace_sched_switch.c
> > > b/kernel/trace/trace_sched_switch.c
> > > index e4fbcc3bede5..210c74dc
locating the other array.
>
> Cc: sta...@vger.kernel.org
> Fixes: 939c7a4f04fcd ("tracing: Introduce saved_cmdlines_size file")
> Signed-off-by: Steven Rostedt (Google)
Reviewed-by: Tim Chen
> ---
> kernel/trace/trace.c | 73 +---
't need to be allocated.
This patch does make better use of the extra space and makes the
previous change better.
Reviewed-by: Tim Chen
>
> Link:
> https://lore.kernel.org/linux-trace-kernel/20240212174011.06821...@gandalf.local.home/
>
> Signed-off-by: Steven Rostedt (Goo
On Thu, 2024-02-08 at 10:53 -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> While looking at improving the saved_cmdlines cache I found a huge amount
> of wasted memory that should be used for the cmdlines.
>
> The tracing data saves pids during the trace. At sched switch, if
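For a rough picture of the structure being tuned here, below is a minimal userspace sketch of a saved_cmdlines-style pid-to-comm table with a fixed number of slots reused round-robin. The slot count, the linear lookup, and the names are illustrative only; the kernel also keeps a pid-to-slot map rather than scanning.

#include <stdio.h>

#define COMM_LEN  16   /* like TASK_COMM_LEN */
#define NUM_SLOTS 128  /* stand-in for saved_cmdlines_size */

struct saved_cmdline {
    int  pid;
    char comm[COMM_LEN];
};

static struct saved_cmdline slots[NUM_SLOTS];
static unsigned int next_slot;

/* Record "this pid ran with this comm", overwriting the oldest slot. */
static void save_cmdline(int pid, const char *comm)
{
    struct saved_cmdline *s = &slots[next_slot];

    next_slot = (next_slot + 1) % NUM_SLOTS;
    s->pid = pid;
    snprintf(s->comm, sizeof(s->comm), "%s", comm);
}

/* Resolve a pid recorded earlier; "<...>" is what trace output shows on a miss. */
static const char *lookup_cmdline(int pid)
{
    for (unsigned int i = 0; i < NUM_SLOTS; i++)
        if (slots[i].pid == pid)
            return slots[i].comm;
    return "<...>";
}

int main(void)
{
    save_cmdline(1234, "bash");
    save_cmdline(5678, "kworker/0:1");
    printf("%d -> %s\n", 1234, lookup_cmdline(1234));
    printf("%d -> %s\n", 9999, lookup_cmdline(9999));
    return 0;
}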
On Mon, 2024-01-22 at 09:20 -0800, Haitao Huang wrote:
>
> @@ -1047,29 +1037,38 @@ static struct mem_cgroup
> *sgx_encl_get_mem_cgroup(struct sgx_encl *encl)
> * @encl: an enclave pointer
> * @page_index: enclave page index
> * @backing: data for accessing backing storage for the pa
On Tue, 2023-12-12 at 15:27 +0100, Vincent Guittot wrote:
> Provide to the scheduler a feedback about the temporary max available
> capacity. Unlike arch_update_thermal_pressure, this doesn't need to be
> filtered as the pressure will happen for dozens of ms or more.
>
> Signed-off-by: Vincent Guitto
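For illustration, a minimal userspace sketch of the idea being described: the platform reports how much capacity is temporarily unusable and the scheduler works with what is left. Capacity is assumed normalized to 1024 like SCHED_CAPACITY_SCALE; the names are illustrative, not the kernel API.

#include <stdio.h>

#define SCHED_CAPACITY_SCALE 1024

struct cpu_info {
    unsigned long capacity_orig; /* max capacity of the CPU */
    unsigned long pressure;      /* capacity temporarily lost (freq capped) */
};

/* Capacity the scheduler can actually count on right now. */
static unsigned long effective_capacity(const struct cpu_info *c)
{
    if (c->pressure >= c->capacity_orig)
        return 0;
    return c->capacity_orig - c->pressure;
}

int main(void)
{
    struct cpu_info cpu = { .capacity_orig = SCHED_CAPACITY_SCALE };

    cpu.pressure = 256; /* platform reports a cap lasting tens of ms */
    printf("effective capacity: %lu/%d\n",
           effective_capacity(&cpu), SCHED_CAPACITY_SCALE);
    return 0;
}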
On 3/23/21 4:21 PM, Song Bao Hua (Barry Song) wrote:
>>
>> On 3/18/21 9:16 PM, Barry Song wrote:
>>> From: Tim Chen
>>>
>>> There are x86 CPU architectures (e.g. Jacobsville) where L2 cache
>>> is shared among a cluster of cores instead of be
On 4/9/21 12:24 AM, Michal Hocko wrote:
> On Thu 08-04-21 13:29:08, Shakeel Butt wrote:
>> On Thu, Apr 8, 2021 at 11:01 AM Yang Shi wrote:
> [...]
>>> The low priority jobs should be able to be restricted by cpuset, for
>>> example, just keep them on second tier memory nodes. Then all the
>>> a
On 4/8/21 10:18 AM, Shakeel Butt wrote:
>
> Using v1's soft limit like behavior can potentially cause high
> priority jobs to stall to make enough space on top tier memory on
> their allocation path and I think this patchset is aiming to reduce
> that impact by making kswapd do that work. Howe
On 4/12/21 12:20 PM, Shakeel Butt wrote:
>>
>> memory_t0.current Current usage of tier 0 memory by the cgroup.
>>
>> memory_t0.min If tier 0 memory used by the cgroup falls below this low
>> boundary, the memory will not be subjected to demotion
>
On 4/8/21 1:29 PM, Shakeel Butt wrote:
> On Thu, Apr 8, 2021 at 11:01 AM Yang Shi wrote:
>
> The low and min limits have semantics similar to the v1's soft limit
> for this situation i.e. letting the low priority job occupy top tier
> memory and depending on reclaim to take back the excess to
On 4/13/21 6:04 PM, Huang, Ying wrote:
> Tim Chen writes:
>
>> On 4/12/21 6:27 PM, Huang, Ying wrote:
>>
>>>
>>> This isn't the commit that introduces the race. You can use `git blame`
>>> to find out the correct commit. For this it'
On 4/12/21 6:27 PM, Huang, Ying wrote:
>
> This isn't the commit that introduces the race. You can use `git blame`
> to find out the correct commit. For this it's commit 0bcac06f27d7 "mm,
> swap: skip swapcache for swapin of synchronous device".
>
> And I suggest merging 1/5 and 2/5 to make i
On 4/13/21 3:45 AM, Song Bao Hua (Barry Song) wrote:
>
>
>
> Right now in the main cases of using wake_affine to achieve
> better performance, processes are actually bound within one
> NUMA node which is also an LLC in kunpeng920.
>
> Probably LLC=NUMA is also true for X86 Jacobsville, Tim?
In ge
On 4/8/21 4:52 AM, Michal Hocko wrote:
>> The top tier memory used is reported in
>>
>> memory.toptier_usage_in_bytes
>>
>> The amount of top tier memory usable by each cgroup without
>> triggering page reclaim is controlled by the
>>
>> memory.toptier_soft_limit_in_bytes
>
Michal,
Thanks fo
On 4/9/21 8:26 AM, Vincent Guittot wrote:
I was expecting idle load balancer to be rate limited to 60 Hz, which
>>>
>>> Why 60Hz ?
>>>
>>
>> My thinking is we will trigger load balance only after rq->next_balance.
>>
>> void trigger_load_balance(struct rq *rq)
>> {
>> /* Don't
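For illustration, a small userspace sketch of the rq->next_balance gating mentioned above, with a jiffies-like counter standing in for the tick. The real trigger_load_balance() raises SCHED_SOFTIRQ rather than printing, and the 4-tick interval below is an arbitrary stand-in.

#include <stdio.h>

static unsigned long jiffies; /* advanced by the loop below, like the tick */

struct rq {
    unsigned long next_balance; /* earliest tick to balance again */
};

static void trigger_load_balance(struct rq *rq)
{
    /* Rate limit: nothing to do until the window expires. */
    if (jiffies < rq->next_balance)
        return;

    printf("tick %lu: run load balance\n", jiffies);
    rq->next_balance = jiffies + 4; /* arbitrary 4-tick interval */
}

int main(void)
{
    struct rq rq = { .next_balance = 0 };

    for (jiffies = 0; jiffies < 12; jiffies++)
        trigger_load_balance(&rq);
    return 0;
}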
On 4/9/21 1:42 AM, Miaohe Lin wrote:
> On 2021/4/9 5:34, Tim Chen wrote:
>>
>>
>> On 4/8/21 6:08 AM, Miaohe Lin wrote:
>>> When I was investigating the swap code, I found the below possible race
>>> window:
>>>
>&
On 4/8/21 7:51 AM, Vincent Guittot wrote:
>> I was surprised to find the overall cpu% consumption of
>> update_blocked_averages
>> and throughput of the benchmark still didn't change much. So I took a
>> peek into the profile and found the update_blocked_averages calls shifted to
>> the idle
On 4/8/21 6:08 AM, Miaohe Lin wrote:
> When I was investigating the swap code, I found the below possible race
> window:
>
> CPU 1                                 CPU 2
> -----                                 -----
> do_swap_page
>   synchronous swap_readpage
>     alloc_page_vma
>
On 4/6/21 2:08 AM, Michal Hocko wrote:
> On Mon 05-04-21 10:08:24, Tim Chen wrote:
> [...]
>> To make fine grain cgroup based management of the precious top tier
>> DRAM memory possible, this patchset adds a few new features:
>> 1. Provides memory monitors on the amount
On 4/7/21 7:02 AM, Vincent Guittot wrote:
> Hi Tim,
>
> On Wed, 24 Mar 2021 at 17:05, Tim Chen wrote:
>>
>>
>>
>> On 3/24/21 6:44 AM, Vincent Guittot wrote:
>>> Hi Tim,
>>
>>>
>>> IIUC your problem, we call update_blocked_aver
For each memory cgroup, account its usage of the
top tier memory at the time a top tier page is assigned and
uncharged from the cgroup.
Signed-off-by: Tim Chen
---
include/linux/memcontrol.h | 1 +
mm/memcontrol.c| 39 +-
2 files changed, 39
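A minimal sketch of the accounting idea: bump a per-cgroup counter whenever a top-tier page is charged to the cgroup, and drop it on uncharge. The struct and function names are illustrative, not the memcg API.

#include <stdio.h>

#define PAGE_SIZE 4096UL

struct mem_cgroup {
    const char   *name;
    unsigned long toptier_pages; /* pages this cgroup holds on the top tier */
};

static void charge_toptier_page(struct mem_cgroup *memcg)
{
    memcg->toptier_pages++;
}

static void uncharge_toptier_page(struct mem_cgroup *memcg)
{
    if (memcg->toptier_pages)
        memcg->toptier_pages--;
}

int main(void)
{
    struct mem_cgroup cg = { .name = "job-A" };

    for (int i = 0; i < 8; i++)
        charge_toptier_page(&cg);
    uncharge_toptier_page(&cg);

    printf("%s toptier usage: %lu bytes\n",
           cg.name, cg.toptier_pages * PAGE_SIZE);
    return 0;
}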
Detect during page allocation whether free toptier memory is low.
If so, wake up kswapd to reclaim memory from those mem cgroups
that have exceeded their limit.
Signed-off-by: Tim Chen
---
include/linux/mmzone.h | 3 +++
mm/page_alloc.c| 2 ++
mm/vmscan.c| 2 +-
3 files
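A hedged sketch of the wake-up condition on the allocation path, assuming per-node free/total page counts are available; toptier_low_wmark() and the ~1% threshold are stand-ins for whatever watermark the patch actually computes.

#include <stdio.h>

struct toptier_node {
    unsigned long nr_free_pages;
    unsigned long nr_total_pages;
};

/* Stand-in watermark: treat "less than ~1% free" as low. */
static unsigned long toptier_low_wmark(const struct toptier_node *n)
{
    return n->nr_total_pages / 100;
}

static void wake_kswapd(void)
{
    printf("waking kswapd to reclaim from over-limit cgroups\n");
}

/* Called on the allocation path for a top-tier node. */
static void check_toptier_low(const struct toptier_node *n)
{
    if (n->nr_free_pages < toptier_low_wmark(n))
        wake_kswapd();
}

int main(void)
{
    struct toptier_node node = {
        .nr_free_pages  = 2000,
        .nr_total_pages = 1UL << 20, /* 4 GB worth of 4 KB pages */
    };

    check_toptier_low(&node); /* 2000 < ~10485, so kswapd is woken */
    return 0;
}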
Update the toptier_scale_factor via sysctl. This variable determines
when kswapd wakes up to reclaim toptier memory from those mem cgroups
exceeding their toptier memory limit.
Signed-off-by: Tim Chen
---
include/linux/mm.h | 4
include/linux/mmzone.h | 2 ++
kernel/sysctl.c
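A small sketch of how such a scale factor could feed the watermark used above, assuming (by analogy with watermark_scale_factor) that the value is a fraction of the node's pages in units of 1/10000; that unit is an assumption for illustration, not taken from the patch.

#include <stdio.h>

static unsigned int toptier_scale_factor = 2000; /* sysctl-settable */

/* Watermark as a 1/10000 fraction of the node's top-tier pages. */
static unsigned long toptier_wmark(unsigned long managed_pages)
{
    return managed_pages * toptier_scale_factor / 10000;
}

int main(void)
{
    unsigned long node_pages = 1UL << 20; /* 4 GB worth of 4 KB pages */

    printf("reclaim toptier when free pages drop below %lu\n",
           toptier_wmark(node_pages));
    return 0;
}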
eded their
toptier memory soft limit by demoting the top tier pages to a
lower memory tier.
Signed-off-by: Tim Chen
---
Documentation/admin-guide/sysctl/vm.rst | 12 +
include/linux/mmzone.h | 2 +
mm/page_alloc.c | 14 +
m
Add toptier reclaim type in mem_cgroup_soft_limit_reclaim().
This option reclaims top tier memory from cgroups in the order of their
excess usage of top tier memory.
Signed-off-by: Tim Chen
---
include/linux/memcontrol.h | 9 ---
mm/memcontrol.c| 48
Track the global top tier memory usage stats. They are used as the basis of
deciding when to start demoting pages from memory cgroups that have exceeded
their soft limit. We start reclaiming top tier memory when the total
top tier memory is low.
Signed-off-by: Tim Chen
---
include/linux
allow returning the cgroup that has the largest excess usage
of toptier memory.
Signed-off-by: Tim Chen
---
include/linux/memcontrol.h | 9 +++
mm/memcontrol.c| 152 +++--
2 files changed, 122 insertions(+), 39 deletions(-)
diff --git a/include
Define a per node soft_limit_top_tier red black tree that sorts and tracks
the cgroups by each group's excess over its toptier soft limit. A cgroup
is added to the tree if it has exceeded its top tier soft limit and it
has used pages on the node.
Signed-off-by: Tim Chen
---
mm/memcontrol.c
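A sketch of the selection policy these patches build up: a cgroup's excess is its top-tier usage above its top-tier soft limit, and reclaim visits cgroups in order of largest excess. The patch keeps a per-node red-black tree for this; the qsort()ed array below is purely illustrative, as are the field names.

#include <stdio.h>
#include <stdlib.h>

struct memcg {
    const char   *name;
    unsigned long toptier_usage;      /* bytes on the top tier */
    unsigned long toptier_soft_limit; /* bytes allowed before demotion */
};

static unsigned long soft_limit_excess(const struct memcg *cg)
{
    return cg->toptier_usage > cg->toptier_soft_limit ?
           cg->toptier_usage - cg->toptier_soft_limit : 0;
}

/* Sort by excess, largest first, so reclaim visits the worst offender first. */
static int cmp_excess_desc(const void *a, const void *b)
{
    unsigned long ea = soft_limit_excess(a);
    unsigned long eb = soft_limit_excess(b);

    return (ea < eb) - (ea > eb);
}

int main(void)
{
    struct memcg groups[] = {
        { "batch",   6UL << 30, 2UL << 30 },
        { "latency", 3UL << 30, 4UL << 30 },
        { "misc",    5UL << 30, 4UL << 30 },
    };
    size_t i, n = sizeof(groups) / sizeof(groups[0]);

    qsort(groups, n, sizeof(groups[0]), cmp_excess_desc);
    for (i = 0; i < n; i++)
        printf("%-8s excess %lu MB\n",
               groups[i].name, soft_limit_excess(&groups[i]) >> 20);
    return 0;
}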
In memory cgroup's sysfs, report the memory cgroup's usage
of top tier memory in a new field: "toptier_usage_in_bytes".
Signed-off-by: Tim Chen
---
mm/memcontrol.c | 8
1 file changed, 8 insertions(+)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fe7bb8
For each memory cgroup, define a soft memory limit on
its top tier memory consumption. Memory cgroups exceeding
their top tier limit will be selected for demotion of
their top tier memory to lower tier under memory pressure.
Signed-off-by: Tim Chen
---
include/linux/memcontrol.h | 1 +
mm
/expensive memory lives in the top tier of the memory
hierarchy and it is a precious resource that needs to be accounted and
managed on a memory cgroup basis.
Define the top tier memory as the memory nodes that don't have demotion
paths into them from higher tier memory.
Signed-off-by: Tim
es in lieu of discard
and
[PATCH 0/6] [RFC v6] NUMA balancing: optimize memory placement for memory
tiering system
It is part of a larger patchset. You can play with the complete set of patches
using the tree:
https://git.kernel.org/pub/scm/linux/kernel/git/vishal/tiering.git/log/?h=tiering-0.71
On 3/24/21 6:44 AM, Vincent Guittot wrote:
> Hi Tim,
>
> IIUC your problem, we call update_blocked_averages() but because of:
>
> if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
> update_next_balance(sd, &next_balance);
>
On 3/18/21 9:16 PM, Barry Song wrote:
> From: Tim Chen
>
> There are x86 CPU architectures (e.g. Jacobsville) where L2 cache
> is shared among a cluster of cores instead of being exclusive
> to one single core.
>
> To prevent oversubscription of L2 cache, load should
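A minimal sketch of the topology idea, assuming a flat cpu-to-L2-id table: CPUs sharing an L2 form a cluster, and a CPU's cluster sibling mask is every CPU with the same L2 id. The layout and names are illustrative and not taken from the patch.

#include <stdio.h>

#define NR_CPUS 8

/* Jacobsville-like layout: 4 cores share each L2 "cluster". */
static const int cpu_l2_id[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Every CPU with the same L2 id is a cluster sibling. */
static unsigned int cluster_sibling_mask(int cpu)
{
    unsigned int mask = 0;
    int i;

    for (i = 0; i < NR_CPUS; i++)
        if (cpu_l2_id[i] == cpu_l2_id[cpu])
            mask |= 1u << i;
    return mask;
}

int main(void)
{
    /* Two runnable tasks: prefer CPUs whose cluster masks don't overlap. */
    printf("cpu0 cluster mask: 0x%02x\n", cluster_sibling_mask(0));
    printf("cpu5 cluster mask: 0x%02x\n", cluster_sibling_mask(5));
    return 0;
}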
load_balance(). If we don't do any load balance in the code path,
we can let the idle load balancer update the blocked averages lazily.
Something like the following perhaps on top of Vincent's patch? We haven't
really tested this change yet but want to see if this change makes se
> It seems sensible the more CPUs we get in the cluster, the more
> we need the kernel to be aware of its existence.
>
> Tim, is it possible for you to bring up the cpu_cluster_mask and
> cluster_sibling for x86 so that the topology can be represented
> in sysfs and be used by the scheduler? It seems
On 3/5/21 1:11 AM, Michal Hocko wrote:
> On Thu 04-03-21 09:35:08, Tim Chen wrote:
>>
>>
>> On 2/18/21 11:13 AM, Michal Hocko wrote:
>>
>>>
>>> Fixes: 4e41695356fb ("memory controller: soft limit reclaim on contention")
>>> Acked-
bit more, I realize that there is a chance that the removed
next_mz could be inserted back to the tree from a memcg_check_events
that happens in between. So we need to make sure that the next_mz
is indeed off the tree and update the excess value before adding it
back. Update the patch to the pa
On 3/2/21 2:30 AM, Peter Zijlstra wrote:
> On Tue, Mar 02, 2021 at 11:59:40AM +1300, Barry Song wrote:
>> From: Tim Chen
>>
>> There are x86 CPU architectures (e.g. Jacobsville) where L2 cache
>> is shared among a cluster of cores instead of being exclusive
>&g
On 2/26/21 12:52 AM, Michal Hocko wrote:
>>
>> Michal,
>>
>> Let's take an extreme case where memcg 1 always generates the
>> first event and memcg 2 generates the rest of 128*8-1 events
>> and the pattern repeat.
>
> I do not follow. Events are per-memcg, aren't they?
> __this_cpu_read(m
On 2/24/21 3:53 AM, Michal Hocko wrote:
> On Mon 22-02-21 11:48:37, Tim Chen wrote:
>>
>>
>> On 2/22/21 11:09 AM, Michal Hocko wrote:
>>
>>>>
>>>> I actually have tried adjusting the threshold but found that it doesn't
>>>&
On 2/22/21 9:41 AM, Tim Chen wrote:
>
>
> On 2/22/21 12:40 AM, Michal Hocko wrote:
>> On Fri 19-02-21 10:59:05, Tim Chen wrote:
> occurrence.
>>>>
>>>> Soft limit is evaluated every THRESHOLDS_EVENTS_TARGET *
>>>> SOFTLIMIT_EVENTS_TARG
On 2/22/21 11:09 AM, Michal Hocko wrote:
>>
>> I actually have tried adjusting the threshold but found that it doesn't work
>> well for
>> the case with uneven memory access frequency between cgroups. The soft
>> limit for the low memory event cgroup could creep up quite a lot, exceeding
>>
On 2/22/21 11:09 AM, Michal Hocko wrote:
> On Mon 22-02-21 09:41:00, Tim Chen wrote:
>>
>>
>> On 2/22/21 12:40 AM, Michal Hocko wrote:
>>> On Fri 19-02-21 10:59:05, Tim Chen wrote:
>> occurrence.
>>>>>
>>&
On 2/17/21 9:56 PM, Johannes Weiner wrote:
>> static inline void uncharge_gather_clear(struct uncharge_gather *ug)
>> @@ -6849,7 +6850,13 @@ static void uncharge_page(struct page *page, struct
>> uncharge_gather *ug)
>> * exclusive access to the page.
>> */
>>
>> -if (ug->me
On 2/22/21 12:41 AM, Michal Hocko wrote:
>>
>>
>> Ah, that's true. The added check for soft_limit_excess is not needed.
>>
>> Do you think it is still a good idea to add patch 3 to
>> restrict the uncharge update in page batch of the same node and cgroup?
>
> I would rather drop it. The less
On 2/22/21 12:40 AM, Michal Hocko wrote:
> On Fri 19-02-21 10:59:05, Tim Chen wrote:
occurrence.
>>>
>>> Soft limit is evaluated every THRESHOLDS_EVENTS_TARGET *
>>> SOFTLIMIT_EVENTS_TARGET.
>>> If all events correspond with a newly charged memory an
On 2/19/21 10:59 AM, Tim Chen wrote:
>
>
> On 2/19/21 1:11 AM, Michal Hocko wrote:
>>
>> Soft limit is evaluated every THRESHOLDS_EVENTS_TARGET *
>> SOFTLIMIT_EVENTS_TARGET.
>> If all events correspond with a newly charged memory and the last event
>>
On 2/19/21 1:16 AM, Michal Hocko wrote:
>>
>> Something like this?
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 8bddee75f5cb..b50cae3b2a1a 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3472,6 +3472,14 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t
>
On 2/19/21 1:11 AM, Michal Hocko wrote:
> On Wed 17-02-21 12:41:35, Tim Chen wrote:
>> Memory is accessed at a much lower frequency
>> for the second cgroup. The memcg event update was not triggered for the
>> second cgroup as the memcg event update didn't hap
On 2/18/21 11:13 AM, Michal Hocko wrote:
> On Thu 18-02-21 10:30:20, Tim Chen wrote:
>>
>>
>> On 2/18/21 12:24 AM, Michal Hocko wrote:
>>
>>>
>>> I have already acked this patch in the previous version along with Fixes
>>> tag. It seems th
On 2/18/21 12:24 AM, Michal Hocko wrote:
>
> I have already acked this patch in the previous version along with Fixes
> tag. It seems that my review feedback has been completely ignored also
> for other patches in this series.
Michal,
My apologies. Our mail system screwed up and there are som
memory
soft limit.
Reviewed-by: Ying Huang
Signed-off-by: Tim Chen
---
mm/memcontrol.c | 10 +-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d72449eeb85a..8bddee75f5cb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6804,6 +6804,7
-by: Tim Chen
---
mm/memcontrol.c | 11 +--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a51bf90732cb..d72449eeb85a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -985,15 +985,22 @@ static bool mem_cgroup_event_ratelimit(struct
this patch.
Thanks.
Tim
Changelog:
v2
1. Do soft limit tree uncharge updates in batches of the same node only
for v1 cgroups that have a soft limit. Batching by node is only
relevant for cgroup v1, which has a per-node soft limit tree.
Tim Chen (3):
mm: Fix dropped memcg from mem cgroup soft limit
cgroup exceeded its soft limit. Fix the logic and put the mem
cgroup back on the tree when page reclaim failed for the mem cgroup.
Reviewed-by: Ying Huang
Signed-off-by: Tim Chen
---
mm/memcontrol.c | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm
d domain for x86.
Thanks.
Tim
>8--
From 9189e489b019e110ee6e9d4183e243e48f44ff25 Mon Sep 17 00:00:00 2001
From: Tim Chen
Date: Tue, 16 Feb 2021 08:24:39 -0800
Subject: [RFC PATCH] scheduler: Add cluster scheduler level for x86
On 2/9/21 2:22 PM, Johannes Weiner wrote:
> Hello Tim,
>
> On Tue, Feb 09, 2021 at 12:29:47PM -0800, Tim Chen wrote:
>> @@ -6849,7 +6850,9 @@ static void uncharge_page(struct page *page, struct
>> uncharge_gather *ug)
>> * exclusive access to the page.
>
, with each batch of
pages all in the same mem cgroup and memory node. An update is issued for
the batch of pages of a node collected till now whenever we encounter
a page belonging to a different node.
Reviewed-by: Ying Huang
Signed-off-by: Tim Chen
---
mm/memcontrol.c | 6 +-
1 file
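A sketch of the batching rule described above: accumulate pages while they stay on the same node and issue one soft-limit tree update per batch when the node changes (or at the end). flush_node_batch() is a stand-in for the per-node tree update in the patch.

#include <stdio.h>

struct page { int node; };

/* Stand-in for one per-node soft limit tree update covering a whole batch. */
static void flush_node_batch(int node, unsigned long nr_pages)
{
    if (nr_pages)
        printf("update node %d soft limit tree for %lu pages\n",
               node, nr_pages);
}

static void uncharge_pages(const struct page *pages, unsigned long n)
{
    int cur_node = -1;
    unsigned long i, nr = 0;

    for (i = 0; i < n; i++) {
        if (pages[i].node != cur_node) {
            flush_node_batch(cur_node, nr); /* close the previous batch */
            cur_node = pages[i].node;
            nr = 0;
        }
        nr++;
    }
    flush_node_batch(cur_node, nr); /* final batch */
}

int main(void)
{
    struct page pages[] = { {0}, {0}, {0}, {1}, {1}, {0} };

    uncharge_pages(pages, sizeof(pages) / sizeof(pages[0]));
    return 0;
}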
-by: Tim Chen
---
mm/memcontrol.c | 11 +--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a51bf90732cb..d72449eeb85a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -985,15 +985,22 @@ static bool mem_cgroup_event_ratelimit(struct
cgroup exceeded its soft limit. Fix the logic and put the mem
cgroup back on the tree when page reclaim failed for the mem cgroup.
Reviewed-by: Ying Huang
Signed-off-by: Tim Chen
---
mm/memcontrol.c | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm
During testing of tiered memory management based on memory soft limit, I found
three issues with memory management using cgroup based soft limit in the
mainline code.
Fix the issues with the three patches in this series.
Tim Chen (3):
mm: Fix dropped memcg from mem cgroup soft limit tree
On 1/8/21 7:12 AM, Morten Rasmussen wrote:
> On Thu, Jan 07, 2021 at 03:16:47PM -0800, Tim Chen wrote:
>> On 1/6/21 12:30 AM, Barry Song wrote:
>>> ARM64 server chip Kunpeng 920 has 6 clusters in each NUMA node, and each
>>> cluster has 4 cpus. All clusters sha
che sched domain, sans the idle cpu selection on wake up code. It is
similar enough in concept to Barry's patch that we should have a
single patchset that accommodates both use cases.
Thanks.
Tim
From e0e7e42e1a033c9634723ff1dc80b426deeec1e9 Mon Sep 17 00:00:00 2001
On 9/24/20 10:13 AM, Phil Auld wrote:
> On Thu, Sep 24, 2020 at 09:37:33AM -0700 Tim Chen wrote:
>>
>>
>> On 9/22/20 12:14 AM, Vincent Guittot wrote:
>>
>>>>
>>>>>>
>>>>>> And a quick test with hackbench on my octo
On 9/22/20 12:14 AM, Vincent Guittot wrote:
>>
And a quick test with hackbench on my octo cores arm64 gives for 12
Vincent,
Is it octo (=10) or octa (=8) cores on a single socket for your system?
The L2 is per core or there are multiple L2s shared among groups of cores?
Wonder if p
On 7/2/20 5:57 AM, Joel Fernandes wrote:
> On Wed, Jul 01, 2020 at 05:54:11PM -0700, Tim Chen wrote:
>>
>>
>> On 7/1/20 4:28 PM, Joel Fernandes wrote:
>>> On Tue, Jun 30, 2020 at 09:32:27PM +, Vineeth Remanan Pillai wrote:
>>>> From: Peter Zijls
(Intel)
>> Signed-off-by: Julien Desfossez
>> Signed-off-by: Vineeth Remanan Pillai
>> Signed-off-by: Aaron Lu
>> Signed-off-by: Tim Chen
>
> Hi Peter, Tim, all, the below patch fixes the hotplug issue described in the
> below patch's Link tag. Patch
On 6/26/20 8:47 PM, Andrew Morton wrote:
> On Sat, 27 Jun 2020 04:13:04 +0100 Matthew Wilcox wrote:
>
>> On Fri, Jun 26, 2020 at 02:23:03PM -0700, Tim Chen wrote:
>>> Enlarge the pagevec size to 31 to reduce LRU lock contention for
>>> large systems.
>>>
from 88.8 Mpages/sec to 95.1 Mpages/sec.
Signed-off-by: Tim Chen
---
include/linux/pagevec.h | 8
1 file changed, 8 insertions(+)
diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 081d934eda64..466ebcdd190d 100644
--- a/include/linux/pagevec.h
+++ b/include/linux
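A minimal userspace sketch of why the size matters: pages queued in a pagevec are drained in one batch, so the LRU lock is taken roughly once per PAGEVEC_SIZE pages instead of once per page. The counter below only models lock round-trips; 31 is the proposed size and 15 the mainline value at the time.

#include <stdio.h>

#define PAGEVEC_SIZE 31 /* proposed size; mainline used 15 at the time */

struct page { unsigned long pfn; };

struct pagevec {
    unsigned int nr;
    struct page *pages[PAGEVEC_SIZE];
};

static unsigned long lru_lock_acquisitions;

/* Stand-in for draining a whole batch to the LRU under one lock hold. */
static void pagevec_release(struct pagevec *pvec)
{
    if (!pvec->nr)
        return;
    lru_lock_acquisitions++;
    pvec->nr = 0;
}

static void pagevec_add(struct pagevec *pvec, struct page *page)
{
    pvec->pages[pvec->nr++] = page;
    if (pvec->nr == PAGEVEC_SIZE)
        pagevec_release(pvec);
}

int main(void)
{
    struct pagevec pvec = { 0 };
    struct page pages[1000];
    int i;

    for (i = 0; i < 1000; i++)
        pagevec_add(&pvec, &pages[i]);
    pagevec_release(&pvec);

    printf("1000 pages drained with %lu lru_lock acquisitions\n",
           lru_lock_acquisitions);
    return 0;
}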
lot_cache_initialized)
> + if (!swap_slot_cache_enabled)
This simplification is okay. !swap_slot_cache_initialized implies
!swap_slot_cache_enabled.
So only !swap_slot_cache_enabled needs to be checked.
> return false;
>
> pages = get_nr_swap_pages();
>
Acked-by: Tim Chen
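A tiny sketch of the invariant that review point relies on: the cache can only be enabled after it has been initialized, so !initialized implies !enabled and checking the enabled flag alone is a sufficient gate. The enable function below is a simplified stand-in, not the real enable_swap_slots_cache().

#include <stdbool.h>
#include <stdio.h>

static bool swap_slot_cache_initialized;
static bool swap_slot_cache_enabled;

/* The cache is never enabled unless it was initialized first. */
static void enable_cache(void)
{
    if (!swap_slot_cache_initialized)
        return;
    swap_slot_cache_enabled = true;
}

/* Given the invariant above, checking "enabled" alone is sufficient. */
static bool check_cache_active(void)
{
    return swap_slot_cache_enabled;
}

int main(void)
{
    enable_cache(); /* no-op: not initialized yet */
    printf("active before init: %d\n", check_cache_active());

    swap_slot_cache_initialized = true;
    enable_cache();
    printf("active after enable: %d\n", check_cache_active());
    return 0;
}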
be done when swap_slot_cache_initialized
> is false.
>
> No functional change.
>
> Signed-off-by: Zhen Lei
Acked-by: Tim Chen
> ---
> mm/swap_slots.c | 22 ++
> 1 file changed, 10 insertions(+), 12 deletions(-)
>
> diff --git a/mm/swap_slot
= NULL;
> -out:
> mutex_unlock(&swap_slots_cache_mutex);
> - if (slots)
> - kvfree(slots);
> - if (slots_ret)
> - kvfree(slots_ret);
> return 0;
> }
>
>
Acked-by: Tim Chen
On 10/2/19 9:11 AM, David Laight wrote:
> From: Parth Shah
>> Sent: 30 September 2019 11:44
> ...
>> 5> Separating AVX512 tasks and latency sensitive tasks on separate cores
>> ( -Tim Chen )
>>
On 9/24/19 7:40 PM, Aubrey Li wrote:
> On Sat, Sep 7, 2019 at 2:30 AM Tim Chen wrote:
>> +static inline s64 core_sched_imbalance_delta(int src_cpu, int dst_cpu,
>> + int src_sibling, int dst_sibling,
>> + struct task_gro
On 9/19/19 2:06 AM, David Laight wrote:
> From: Tim Chen
>> Sent: 18 September 2019 18:16
> ...
>> Some users are running machine learning batch tasks with AVX512, and have
>> observed
>> that these tasks affect the tasks needing a fast response. They have to
>
On 9/19/19 1:37 AM, Parth Shah wrote:
>
>>
>> $> Separating AVX512 tasks and latency sensitive tasks on separate cores
>> -
>> Another use case we are considering is to segregate those workloads that will
>> pull down
>> core c
On 9/4/19 6:44 PM, Julien Desfossez wrote:
> +
> +static void coresched_idle_worker_fini(struct rq *rq)
> +{
> + if (rq->core_idle_task) {
> + kthread_stop(rq->core_idle_task);
> + rq->core_idle_task = NULL;
> + }
During testing, I have found access of rq->core_idl
On 9/10/19 7:27 AM, Julien Desfossez wrote:
> On 29-Aug-2019 04:38:21 PM, Peter Zijlstra wrote:
>> On Thu, Aug 29, 2019 at 10:30:51AM -0400, Phil Auld wrote:
>>> I think, though, that you were basically agreeing with me that the current
>>> core scheduler does not close the holes, or am I reading t
On 9/17/19 6:33 PM, Aubrey Li wrote:
> On Sun, Sep 15, 2019 at 10:14 PM Aaron Lu wrote:
>>
>> And I have pushed Tim's branch to:
>> https://github.com/aaronlu/linux coresched-v3-v5.1.5-test-tim
>>
>> Mine:
>> https://github.com/aaronlu/linux coresched-v3-v5.1.5-test-core_vruntime
Aubrey,
Thank
On 9/18/19 5:41 AM, Parth Shah wrote:
> Hello everyone,
>
> As per the discussion in LPC2019, new per-task property like latency-nice
> can be useful in certain scenarios. The scheduler can take proper decision
> by knowing latency requirement of a task from the end-user itself.
>
> There has alr
On 9/13/19 7:15 AM, Aaron Lu wrote:
> On Thu, Sep 12, 2019 at 10:29:13AM -0700, Tim Chen wrote:
>
>> The better thing to do is to move one task from cgroupA to another core,
>> that has only one cgroupA task so it can be paired up
>> with that lonely cgroupA task. This w
On 9/12/19 5:35 AM, Aaron Lu wrote:
> On Wed, Sep 11, 2019 at 12:47:34PM -0400, Vineeth Remanan Pillai wrote:
>
> core wide vruntime makes sense when there are multiple tasks of
> different cgroups queued on the same core. e.g. when two
> tasks of cgroupA and one task of cgroupB are que
On 9/12/19 5:04 AM, Aaron Lu wrote:
> Well, I have done following tests:
> 1 Julien's test script: https://paste.debian.net/plainh/834cf45c
> 2 start two tagged will-it-scale/page_fault1, see how each performs;
> 3 Aubrey's mysql test: https://github.com/aubreyli/coresched_bench.git
>
> They all
On 9/11/19 7:02 AM, Aaron Lu wrote:
> Hi Tim & Julien,
>
> On Fri, Sep 06, 2019 at 11:30:20AM -0700, Tim Chen wrote:
>> On 8/7/19 10:10 AM, Tim Chen wrote:
>>
>>> 3) Load balancing between CPU cores
>>> ---
>>> Sa
On 9/4/19 6:44 PM, Julien Desfossez wrote:
>@@ -3853,7 +3880,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev,
>struct rq_flags *rf)
> goto done;
> }
>
>- if (!is_idle_task(p))
>+ if (!is_fo
On 8/7/19 10:10 AM, Tim Chen wrote:
> 3) Load balancing between CPU cores
> ---
> Say if one CPU core's sibling threads get forced idled
> a lot as it has mostly incompatible tasks between the siblings,
> moving the incompatible load to oth
On 8/30/19 10:49 AM, subhra mazumdar wrote:
> Add Cgroup interface for latency-nice. Each CPU Cgroup adds a new file
> "latency-nice" which is shared by all the threads in that Cgroup.
Subhra,
Thanks for posting the patchset. Having a latency nice hint
is useful beyond idle load balancing. I c
On 8/28/19 9:01 AM, Peter Zijlstra wrote:
> On Wed, Aug 28, 2019 at 11:30:34AM -0400, Phil Auld wrote:
>> On Tue, Aug 27, 2019 at 11:50:35PM +0200 Peter Zijlstra wrote:
>
>> The current core scheduler implementation, I believe, still has
>> (theoretical?)
>> holes involving interrupts, once/if t
On 8/27/19 2:50 PM, Peter Zijlstra wrote:
> On Tue, Aug 27, 2019 at 10:14:17PM +0100, Matthew Garrett wrote:
>> Apple have provided a sysctl that allows applications to indicate that
>> specific threads should make use of core isolation while allowing
>> the rest of the system to make use of SMT,
On 8/8/19 10:27 AM, Tim Chen wrote:
> On 8/7/19 11:47 PM, Aaron Lu wrote:
>> On Tue, Aug 06, 2019 at 02:19:57PM -0700, Tim Chen wrote:
>>> +void account_core_idletime(struct task_struct *p, u64 exec)
>>> +{
>>> + const struct cpumask *smt_mask;
>>&g
On 8/7/19 11:47 PM, Aaron Lu wrote:
> On Tue, Aug 06, 2019 at 02:19:57PM -0700, Tim Chen wrote:
>> +void account_core_idletime(struct task_struct *p, u64 exec)
>> +{
>> +const struct cpumask *smt_mask;
>> +struct rq *rq;
>> +bool force_idle, refill;
&g
On 8/8/19 5:55 AM, Aaron Lu wrote:
> On Mon, Aug 05, 2019 at 08:55:28AM -0700, Tim Chen wrote:
>> On 8/2/19 8:37 AM, Julien Desfossez wrote:
>>> We tested both Aaron's and Tim's patches and here are our results.
>
> diff --git a/kernel/sched/core.c b/kernel
On 8/7/19 1:58 AM, Dario Faggioli wrote:
> So, here comes my question: I've done a benchmarking campaign (yes,
> I'll post numbers soon) using this branch:
>
> https://github.com/digitalocean/linux-coresched.git
> vpillai/coresched-v3-v5.1.5-test
> https://github.com/digitalocean/linux-coresche
From ede10309986a6b1bcc82d317f86a5b06459d76bd Mon Sep 17 00:00:00 2001
From: Tim Chen
Date: Wed, 24 Jul 2019 13:58:18 -0700
Subject: [PATCH 1/2] sched: Move sched fair prio comparison to fair.c
Consolidate the task priority comparison of the fair class
to fair.c. A simple code r
On 8/5/19 8:24 PM, Aaron Lu wrote:
> I've been thinking if we should consider core wide tenent fairness?
>
> Let's say there are 3 tasks on 2 threads' rq of the same core, 2 tasks
> (e.g. A1, A2) belong to tenent A and the 3rd B1 belong to another tenent
> B. Assume A1 and B1 are queued on the sa
On 8/2/19 8:37 AM, Julien Desfossez wrote:
> We tested both Aaron's and Tim's patches and here are our results.
>
> Test setup:
> - 2 1-thread sysbench, one running the cpu benchmark, the other one the
> mem benchmark
> - both started at the same time
> - both are pinned on the same core (2 hard
impler and we don't need to use
one of the sibling's cfs_rq min_vruntime as a time base.
In really limited testing, it seems to have balanced fairness between two
tagged cgroups.
Tim
---patch 1--
From: Tim Chen
Date: Wed, 24 Jul 2019 13:58:18 -0700
Subject: [PATCH 1/2] sched: mo