> Fine. So we need this for ONE particular use case. And if that is not well
> documented including the underlying mechanics to analyze the data then this
> will be a nice source of confusion for Joe User.
>
> I still think that this can be done differently while keeping the overhead
> small.
>
> You look at this from the existing perf mechanics which require high
> overhead context switching machinery. But that's just wrong because that's
> not how the cache and bandwidth monitoring works.
>
> Contrary to the other perf counters, CQM and MBM are based on a context
> selectable set of counters which do not require readout and reconfiguration
> when the switch happens.
>
> Especially with CAT in play, the context switch overhead is there already
> when CAT partitions need to be switched. So switching the RMID at the same
> time is basically free, if we are smart enough to do an equivalent to the
> CLOSID context switch mechanics and ideally combine both into a single MSR
> write.
>
> With that the low overhead periodic sampling can read N counters which are
> related to the monitored set and provide N separate results. For bandwidth
> the aggregation is a simple ADD and for cache residency it's pointless.
>
> Just because perf was designed with the regular performance counters in
> mind (way before that CQM/MBM stuff came around) does not mean that we
> cannot change/extend that if it makes sense.
>
> And looking at the way Cache/Bandwidth allocation and monitoring works, it
> makes a lot of sense. Definitely more than shoving it into the current mode
> of operandi with duct tape just because we can.
You made a point. The use case I described can be better served with the low
overhead monitoring groups that Fenghua is working on. Then that info can be
merged with the per-CPU profile collected for non-RDT events.

I am ok removing the perf-like CPU filtering from the requirements.

Thanks,
David