Based on feedback from Thomas and Peterz, I can think of two design
variants, which target:

- Support monitoring and allocating using the same resctrl group. The
  user can use a resctrl group to allocate resources and also monitor
  them (with respect to tasks or CPUs).

- Allow 'task only' monitoring outside of resctrl. This mode can be
  used when the user wants to override the RMIDs in resctrl or wants
  to monitor more than just the resctrl groups.

option 1> without modifying the resctrl

In this design everything in the resctrl interface works as before (the
info directory and the resource group files such as tasks and schemata
all remain the same), but each resctrl group is mapped to one RMID as
well as a CLOSID.

But we need a user interface for the user to read the counters. We
could create one file to set monitoring and one file in the resctrl
directory which reflects the counts, but this may not be efficient
because users often read the counts frequently.

For the user interface there may be two options:

1.a> Build a new user mode interface resmon

Since modifying the existing perf to suit the different h/w architecture
seems to not follow the CAT interface model, it may well be better to
have a different and dedicated interface for the RDT monitoring (just
like we had a new fs for CAT).

$ resmon -r <resctrl group> -s <mon_mask> -I <time in ms>

"resctrl group": the resctrl directory.

"mon_mask": a bit mask of logical packages which indicates which
packages the user is interested in monitoring.

"time in ms": the time for which the monitoring takes place (this can
potentially be changed to start and stop/read options).

Example 1 (Some examples modeled from resctrl ui documentation)
---------

A single socket system which has real-time tasks running on cores 4-7
and a non real-time workload assigned to cores 0-3. The real-time tasks
share text and data, so a per task association is not required, and due
to interaction with the kernel it's desired that the kernel on these
cores shares L3 with the tasks.

# cd /sys/fs/resctrl
# echo "L3:0=3ff" > schemata

Cores 0-1 are assigned to the new group, making sure that the kernel and
the tasks running there get 50% of the cache.

# echo 03 > p0/cpus

Monitor CPUs 0-1 for 5s:

# resmon -r p0 -s 1 -I 5000

Example 2
---------

A real-time task running on CPUs 2-3 (socket 0) is allocated a dedicated
25% of the cache.

# cd /sys/fs/resctrl
# mkdir p1
# echo "L3:0=0f00;1=ffff" > p1/schemata
# echo 5678 > p1/tasks
# taskset -cp 2-3 5678

Monitor the task for 5s on socket zero:

# resmon -r p1 -s 1 -I 5000

Example 3
---------

Sometimes the user may just want to profile the cache occupancy first,
before assigning any CLOSIDs. This also provides an override option
where the user can monitor some tasks which have, say, CLOS 0 and which
he is about to place in a CLOSID based on the amount of cache occupancy.
This could apply to the same real-time tasks above, where the user is
calibrating the % of cache that's needed.

Monitor a task PIDx on socket 0 for 10s:

# resmon -t PIDx -s 1 -I 10000

1.b> Add a new option to perf apart from supporting the task monitoring
in perf.

- Monitor a resctrl group.

Introduce a new option for perf, "-R", which indicates that a resctrl
group is to be monitored.

$ mkdir /sys/fs/resctrl/p1
$ echo PID1 > /sys/fs/resctrl/p1/tasks
$ echo PID2 > /sys/fs/resctrl/p1/tasks

$ perf stat -e llc_occupancy -R p1

would return the count for the resctrl group p1.

- Monitor a task outside of a resctrl group ('task only')

In this case, perf can also monitor individual tasks using the -t
option just like before.

$ perf stat -e llc_occupancy -t PIDx

- Monitor CPUs.

For Example 1 above, perf can be used to monitor the resctrl group p0:

$ perf stat -e llc_occupancy -R p0

The issue with both options may be what happens when we run out of RMIDs.
For the resctrl groups, since we know the max number of groups that can
be created and the # of CLOSIDs is much smaller than the # of RMIDs, we
reserve an RMID for each resctrl group, so there is never a case where
an RMID is not available for a resctrl group. Task monitoring can use
the rest of the RMIDs.

Why do we need separate 'task only' monitoring?
-----------------------------------------------

The separate task monitoring option lets the user use the RMIDs
effectively and not be restricted to the # of CLOSIDs. It also deals
with the scenarios of Example 3.

RMID allocation/init
--------------------

resctrl monitoring:
RMIDs are allocated when CLOSIDs are allocated during mkdir. One RMID is
allocated per socket, just like a CLOSID.

task monitoring:
When task events are created, RMIDs are allocated. We can also do a lazy
allocation of RMIDs when the tasks are actually scheduled in on a
socket.

Kernel Scheduling
-----------------

During ctx switch cqm chooses the RMID in the following priority
(1 - highest priority):

1. if the CPU has an RMID, choose that
2. if the task has an RMID directly tied to it, choose that
3. choose the RMID of the task's resctrl group

Read
----

When the user calls cqm to retrieve the monitored count, we read the
counter_msr and return the count.

option 2> Modifying the resctrl

This changes the resctrl schemata interface so that the user inputs
CLOSIDs and RMIDs instead of CBMs.

# cd /sys/fs/resctrl
# mkdir p0 p1
# echo "L3:0=<closidx>;1=<closidy>" > /sys/fs/resctrl/p0/schemata

There is a mapping between CLOSID and CBM which the user can change.

# echo 0xff > .../config/L3/0/cbm

Display the CLOSIDs:

# ls .../config/L3/
0
1
2
.
.
.
15

As an extension to cqm, this schemata can be modified to also have the
RMIDs be chosen by the user. That way the user can configure different
RMIDs for the same CLOSID if needed, like in Example 3.
Also, since we have so many more RMIDs than CLOSIDs, the user is not
restricted by the number of resctrl groups he can create (with the
current model, the user cannot create more directories than the number
of CLOSIDs).

# echo "L3:0=<closidx>,<RMID1>;1=<closidy>,<RMID2>" > /sys/fs/resctrl/p0/schemata

The user interface to monitor can be the same as shown in design
variant #1, with the difference that this may have a lesser need for
the 'task only' monitoring.
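
To make the context-switch priority under "Kernel Scheduling" in
option 1 concrete, here is a small user-space model in Python. It is
only a sketch, not kernel code: the Group and Task classes, the
choose_rmid() helper and the RMID values are all made up for
illustration. It assumes each resctrl group holds one reserved RMID and
a task gets its own RMID only while a 'task only' event is attached.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Group:
    """Hypothetical resctrl group; holds the RMID reserved at mkdir time."""
    name: str
    rmid: int


@dataclass
class Task:
    """Hypothetical task; rmid is set only for 'task only' monitoring."""
    pid: int
    group: Group
    rmid: Optional[int] = None


def choose_rmid(cpu_rmid: Optional[int], task: Task) -> int:
    """Pick the RMID to use at context switch, highest priority first."""
    if cpu_rmid is not None:    # 1. an RMID tied to the CPU wins
        return cpu_rmid
    if task.rmid is not None:   # 2. then an RMID tied directly to the task
        return task.rmid
    return task.group.rmid      # 3. else the RMID of the task's resctrl group


# Illustrative values only (assumptions, not taken from real hardware).
p1 = Group("p1", rmid=1)
plain = Task(pid=5678, group=p1)             # monitored through its group
watched = Task(pid=5679, group=p1, rmid=40)  # has a 'task only' event

print(choose_rmid(None, plain))    # 1: falls back to the group RMID
print(choose_rmid(None, watched))  # 40: task RMID overrides the group
print(choose_rmid(7, watched))     # 7: CPU RMID overrides both
```

In this model a task with no RMID of its own always falls back to its
group, which matches the idea that the reserved per-group RMID is
always available.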