Re: [PATCH v6 0/3] Add support for the RAPL MSRs series

Anthony Harivel Tue, 22 Oct 2024 07:41:36 -0700

Daniel P. Berrangé, Oct 22, 2024 at 16:29:
> On Tue, Oct 22, 2024 at 04:16:36PM +0200, Anthony Harivel wrote:
>> Daniel P. Berrangé, Oct 22, 2024 at 15:15:
>> > On Tue, Oct 22, 2024 at 02:46:15PM +0200, Igor Mammedov wrote:
>> >> On Fri, 18 Oct 2024 13:59:34 +0100
>> >> Daniel P. Berrangé <berra...@redhat.com> wrote:
>> >> 
>> >> > On Fri, Oct 18, 2024 at 02:25:26PM +0200, Igor Mammedov wrote:
>> >> > > On Wed, 16 Oct 2024 14:56:39 +0200
>> >> > > "Anthony Harivel" <ahari...@redhat.com> wrote:
>> >> [...]
>> >> 
>> >> > > 
>> >> > > This also leads to a question, if we should account for
>> >> > > non-vCPU threads at all. Looking at real hardware, those
>> >> > > MSRs return power usage of CPUs only, and they do not
>> >> > > return consumption from auxiliary system components
>> >> > > (io/memory/...). One can consider non-vCPU threads in QEMU
>> >> > > as auxiliary components, so we probably should not
>> >> > > account for them at all when modeling the same hw feature
>> >> > > (aka be consistent with what real hw does).
>> >> > 
>> >> > I understand your POV, but I think that would be a mistake,
>> >> > and would undermine the usefulness of the feature.
>> >> > 
>> >> > The deployment model has a cluster of hosts and guests, all
>> >> > belonging to the same user. The user goal is to measure host
>> >> > power consumption imposed by the guest, and dynamically adjust
>> >> > guest workloads in order to minimize power consumption of the
>> >> > host.
>> >> 
>> >> For the cloud use-case, the host side is likely in a better position
>> >> to accomplish the task of saving power by migrating VMs to
>> >> another socket/host to compact idle load. (I've found at least 1
>> >> Kubernetes tool[1] which does energy monitoring.) Perhaps there
>> >> are schedulers out there that do that using its data.
>> 
>> I also work for the Kepler project. I use it to monitor my VM as a black
>> box, and I have used it inside my VM with this feature enabled. Thanks to
>> that, I can optimize the workloads (DPDK application, database, ...) inside
>> my VM.
>>
>> This is the use-case in NFV deployments, and I'm pretty sure it could be
>> the use-case for many others.
>> 
>> >
>> > The host admin can merely shuffle workloads around, hoping that
>> > a different packing of workloads onto machines will reduce power
>> > by some amount. You might win a few %, or low 10s of % with this
>> > if you're good at it.
>> >
>> > The guest admin can change the way their workload operates to
>> > reduce its inherent power consumption baseline. You could easily
>> > come across ways to win high 10s of % with this. That's why it
>> > is interesting to expose power consumption info to the guest
>> > admin.
>> >
>> > IOW, neither makes the other obsolete, both approaches are
>> > desirable.
>> >
>> >> > The guest workloads can impose non-negligible power consumption
>> >> > loads on non-vCPU threads in QEMU. Without that accounted for,
>> >> > any adjustments will be working from (sometimes very) inaccurate
>> >> > data.
>> >> 
>> >> Perhaps adding one or several energy sensors (ex: some i2c ones),
>> >> would let us provide auxiliary threads consumption to guest, and
>> >> even make it more granular if necessary (incl. vhost user/out of
>> >> process device models or pass-through devices if they have PMU).
>> >> It would be better than further muddling vCPUs consumption
>> >> estimates with something that doesn't belong there.
>> 
>> I'm confused by your statement. Every software power metering tool
>> out there uses RAPL (Kepler, Scaphandre, PowerMon, etc.), and custom
>> sensors would be better than what everyone is using?
>> The goal is not to be accurate. The goal is to be able to compare
>> A against B in the same environment, and RAPL gives reproducible
>> values to do so.
>
> Be careful with saying "The goal is not to be accurate", as that's
> a very broad statement, and I don't think it is true.
>
>
> If you're doing A/B comparisons, you *do* need accuracy, in the
> sense that if a guest workload config change alters host CPU
> power consumption, you want that to be reflected in what the
> guest is told about its power usage.
>
> ie if a change in B moves some power usage from a vCPU thread
> to a non-vCPU thread, you don't want that power usage to
> disappear from what's reported to the guest. It would give you
> the false idea that B is more efficient than A, even if the
> non-vCPU thread for B was consuming 2x what the original vCPU
> thread was for A.
>
> What I think you don't need is for the absolute magnitude of
> the reported power consumption to be a precise match to the
> actual power consumption.
>
> ie if A and B are reported as 7 and 9 Watts respectively, it
> doesn't matter if the actual consumption was 12 and 15 watts.
>


Right, my bad, I agree. When I said "not accurate" I was indeed talking
about the absolute magnitude of the reported power consumption.
Your example above is exactly what I had in mind.
Sorry for my clumsy shortcut and thanks for clarifying this important point.
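
To make the A/B point concrete, here is a minimal Python sketch with made-up
numbers. It is not how QEMU or the vmsr helper attributes energy; the
ThreadEnergy type, the thread names, and the joule values are all hypothetical.
It only shows that summing vCPU threads alone can flip the A/B ranking when a
config change moves work to a non-vCPU thread, while a distorted absolute
scale (7 and 9 reported vs 12 and 15 actual) keeps the ranking intact.

```python
from dataclasses import dataclass


@dataclass
class ThreadEnergy:
    """Energy attributed to one QEMU thread over a fixed interval (hypothetical)."""
    name: str
    is_vcpu: bool
    joules: float


def reported_energy(threads, include_non_vcpu):
    """Sum the energy that would be reported to the guest."""
    return sum(t.joules for t in threads if t.is_vcpu or include_non_vcpu)


def more_efficient(a, b):
    """Return which config looks cheaper given two energy readings."""
    return "A" if a < b else "B"


# Config A: the work runs mostly on the vCPU thread.
config_a = [ThreadEnergy("vcpu0", True, 10.0),
            ThreadEnergy("iothread", False, 1.0)]
# Config B: 6 J of vCPU work moved to the I/O thread, where it costs 2x (12 J).
config_b = [ThreadEnergy("vcpu0", True, 4.0),
            ThreadEnergy("iothread", False, 13.0)]

vcpu_only = (reported_energy(config_a, False), reported_energy(config_b, False))
all_threads = (reported_energy(config_a, True), reported_energy(config_b, True))

print("vCPU threads only:", vcpu_only, "->", more_efficient(*vcpu_only))
print("all QEMU threads: ", all_threads, "->", more_efficient(*all_threads))

# The absolute scale can be off without changing the comparison:
reported, actual = (7.0, 9.0), (12.0, 15.0)
print("same ranking:", more_efficient(*reported) == more_efficient(*actual))
```

With vCPU-only accounting B looks cheaper, although the full per-thread sum
says A is; that is the false conclusion described above.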
