Re: [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes

Thomas Gleixner Thu, 19 Jan 2017 10:46:24 -0800

On Wed, 18 Jan 2017, Stephane Eranian wrote:
> On Wed, Jan 18, 2017 at 12:53 AM, Thomas Gleixner <t...@linutronix.de> wrote:

> Your use case is specific to HPC and not the Web workloads we run.  Jobs run
> in cgroups which may span all the CPUs of the machine.  CAT may be used
> to partition the cache. Cgroups would run inside a partition.  There may
> be multiple cgroups running in the same partition. I can understand the
> value of tracking occupancy per CLOSID, however that granularity is not
> enough for our use case.  Inside a partition, we want to know the
> occupancy of each cgroup to be able to assign blame to the top
> consumer. Thus, there needs to be a way to monitor occupancy per
> cgroup. I'd like to understand how your proposal would cover this use
> case.

The point I'm making, as I explained to David, is that we need to start from
the allocation angle. Of course you can monitor different tasks or task
groups inside an allocation.
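
To illustrate the layering, here is a toy model (not kernel code): each
monitored group belongs to exactly one allocation (CLOSID) and carries its
own RMID, so per-cgroup occupancy and per-allocation totals come from the
same data. The names MonGroup, occupancy_by_closid and top_consumer are
hypothetical, and the occupancy numbers are assumed to be supplied by the
caller rather than read from hardware.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MonGroup:
    """Hypothetical monitored entity (a cgroup or task group)."""
    name: str     # label for the group
    closid: int   # allocation (CAT partition) the group runs in
    rmid: int     # monitoring ID assigned to the group


def occupancy_by_closid(groups, rmid_bytes):
    """Sum per-RMID occupancy (bytes) into per-allocation totals."""
    totals = defaultdict(int)
    for g in groups:
        totals[g.closid] += rmid_bytes.get(g.rmid, 0)
    return dict(totals)


def top_consumer(groups, rmid_bytes, closid):
    """Return the group with the largest occupancy inside one allocation."""
    members = [g for g in groups if g.closid == closid]
    if not members:
        return None
    return max(members, key=lambda g: rmid_bytes.get(g.rmid, 0))
```

With two cgroups sharing CLOSID 1, top_consumer() picks the one to blame,
while occupancy_by_closid() still gives the per-partition view.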

> Another important aspect is that CQM measures new allocations, thus to
> get total occupancy you need to be able to monitor the thread, CPU,
> CLOSID or cgroup from the beginning of execution. In the case of a cgroup,
> from the moment where the first thread is scheduled into the cgroup. To
> do this an RMID needs to be assigned from the beginning to the entity to
> be monitored.  It could be by creating a CQM event just to cause an RMID
> to be assigned as discussed earlier on this thread. And then if a perf
> stat is launched later it will get the same RMID and report full
> occupancy. But that requires the first event to remain alive, i.e., some
> process must keep the file descriptor open, i.e., need some daemon or a
> perf stat running in the background.

That's fine, but there must be a less convoluted way to do that. The
currently proposed stuff is simply horrible because it lacks any form of
design and is just hacked into submission.
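
For reference, the undercounting problem described above can be shown with
a toy model (an assumption-laden sketch, not how the hardware or the driver
works internally): only cache fills that happen after the RMID is assigned
are attributed to it, and evictions are ignored for simplicity. The
function name observed_occupancy and the (time, bytes) tuples are
hypothetical.

```python
def observed_occupancy(fills, rmid_assigned_at):
    """Bytes attributed to an RMID that was assigned at a given time.

    fills: iterable of (time, nbytes) cache allocations by the entity.
    Fills made before the RMID was assigned are not counted, which is
    the undercount a late-started monitor would see in this model.
    """
    return sum(nbytes for t, nbytes in fills if t >= rmid_assigned_at)


# Hypothetical fill history for one cgroup: (time, bytes).
fills = [(0, 4096), (1, 8192), (5, 2048)]

full = observed_occupancy(fills, rmid_assigned_at=0)   # monitored from start
late = observed_occupancy(fills, rmid_assigned_at=3)   # monitor attached late
```

In this model the late monitor reports only the last fill, which is why the
RMID has to exist from the moment the entity starts running.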

> There are also use cases where you want CQM without necessarily enabling
> CAT, for instance, if you want to know the cache footprint of a workload
> to estimate whether it could be co-located with others.

That's a subset of the other stuff because it's all bound to CLOSID 0. So
you can again monitor tasks or task groups separately.

Thanks,

        tglx
