Daniel P. Berrangé, Oct 22, 2024 at 15:15:
> On Tue, Oct 22, 2024 at 02:46:15PM +0200, Igor Mammedov wrote:
>> On Fri, 18 Oct 2024 13:59:34 +0100
>> Daniel P. Berrangé <berra...@redhat.com> wrote:
>> 
>> > On Fri, Oct 18, 2024 at 02:25:26PM +0200, Igor Mammedov wrote:
>> > > On Wed, 16 Oct 2024 14:56:39 +0200
>> > > "Anthony Harivel" <ahari...@redhat.com> wrote:
>> [...]
>> 
>> > > 
>> > > This also leads to a question, if we should account for
>> > > not VCPU threads at all. Looking at real hardware, those
>> > > MSRs return power usage of CPUs only, and they do not
>> > > return consumption from auxiliary system components
>> > > (io/memory/...). One can consider non VCPU threads in QEMU
>> > > as auxiliary components, so we probably should not to
>> > > account for them at all when modeling the same hw feature.
>> > > (aka be consistent with what real hw does).  
>> > 
>> > I understand your POV, but I think that would be a mistake,
>> > and would undermine the usefulness of the feature.
>> > 
>> > The deployment model has a cluster of hosts and guests, all
>> > belonging to the same user. The user goal is to measure host
>> > power consumption imposed by the guest, and dynamically adjust
>> > guest workloads in order to minimize power consumption of the
>> > host.
>> 
>> For the cloud use-case, the host side is likely in a better
>> position to accomplish the task of saving power by migrating a VM
>> to another socket/host to compact idle load. (I've found at least
>> one Kubernetes tool[1] which does energy monitoring.) Perhaps there
>> are schedulers out there that do that using its data.

I also work on the Kepler project. I use it to monitor my VM as a black 
box, and I have used it inside my VM with this feature enabled. Thanks 
to that I can optimize the workloads (DPDK applications, databases, ...) 
inside my VM. 

This is the use-case in NFV deployments, and I'm pretty sure this could 
be the use-case of many others.
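For reference, reading the emulated counters from inside the guest works 
the same way as on bare metal: load the msr module and read 
MSR_RAPL_POWER_UNIT (0x606) and MSR_PKG_ENERGY_STATUS (0x611). A minimal 
Python sketch (the MSR addresses and bit layout follow Intel's standard 
RAPL definition; the demo assumes Linux with the msr driver loaded and 
root privileges):

```python
import os
import struct

MSR_RAPL_POWER_UNIT = 0x606    # energy status unit (ESU) in bits 12:8
MSR_PKG_ENERGY_STATUS = 0x611  # 32-bit wrapping package energy counter

def energy_joules(unit_msr: int, energy_msr: int) -> float:
    """Convert a raw PKG_ENERGY_STATUS reading to joules.

    Energy is counted in multiples of 1/2^ESU joules, where ESU is
    bits 12:8 of MSR_RAPL_POWER_UNIT (commonly 14, i.e. ~61 uJ/tick).
    """
    esu = (unit_msr >> 8) & 0x1F
    return (energy_msr & 0xFFFFFFFF) / (1 << esu)

def read_msr(msr: int, cpu: int = 0) -> int:
    """Read one MSR via the Linux msr driver (needs `modprobe msr`)."""
    with open(f"/dev/cpu/{cpu}/msr", "rb", buffering=0) as f:
        f.seek(msr)
        return struct.unpack("<Q", f.read(8))[0]

if __name__ == "__main__" and os.path.exists("/dev/cpu/0/msr"):
    unit = read_msr(MSR_RAPL_POWER_UNIT)
    joules = energy_joules(unit, read_msr(MSR_PKG_ENERGY_STATUS))
    print(f"package energy counter: {joules:.6f} J")
```

Note the counter wraps at 32 bits, so tools always sample it twice and 
work with deltas rather than absolute values.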

>
> The host admin can merely shuffle workloads around, hoping that
> a different packing of workloads onto machines will reduce power
> by some amount. You might win a few %, or low 10s of %, with this
> if you're good at it.
>
> The guest admin can change the way their workload operates to
> reduce its inherent power consumption baseline. You could easily
> come across ways to win high 10s of % with this. That's why it
> is interesting to expose power consumption info to the guest
> admin.
>
> IOW, neither makes the other obsolete, both approaches are
> desirable.
>
>> > The guest workloads can impose non-negligble power consumption
>> > loads on non-vCPU threads in QEMU. Without that accounted for,
>> > any adjustments will be working from (sometimes very) inaccurate
>> > data.
>> 
>> Perhaps adding one or several energy sensors (ex: some i2c ones),
>> would let us provide auxiliary threads consumption to guest, and
>> even make it more granular if necessary (incl. vhost user/out of
>> process device models or pass-through devices if they have PMU).
>> It would be better than further muddling vCPUs consumption
>> estimates with something that doesn't belong there.

I'm confused by your statement. Practically every software power 
metering tool out there uses RAPL (Kepler, Scaphandre, PowerMon, etc.), 
so how would custom sensors be better than what everyone is already 
using? 
The goal is not to be accurate. The goal is to be able to compare 
A against B in the same environment, and RAPL gives reproducible 
values to do so. 
Adding RAPL inside the VM makes total sense because you can use tools 
that are already out on the market.
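That is also why the standard interfaces matter: on the host these same 
counters are surfaced through the Linux powercap sysfs tree 
(/sys/class/powercap/intel-rapl:0), which is what those tools read. A 
small sketch of the A-vs-B comparison, with the usual counter-wraparound 
handling (the paths are the standard intel-rapl powercap layout; the 
helper function names are mine):

```python
import time
from pathlib import Path

RAPL = Path("/sys/class/powercap/intel-rapl:0")  # package 0 domain

def avg_power_watts(e0_uj: int, e1_uj: int, dt_s: float,
                    max_range_uj: int) -> float:
    """Average power over an interval, handling one counter wraparound.

    energy_uj is a monotonically increasing counter that wraps at
    max_energy_range_uj; power = delta-energy / delta-time.
    """
    delta = e1_uj - e0_uj
    if delta < 0:          # counter wrapped during the interval
        delta += max_range_uj
    return (delta / 1e6) / dt_s

def measure(seconds: float = 1.0) -> float:
    """Sample the package energy counter twice and return watts."""
    max_range = int((RAPL / "max_energy_range_uj").read_text())
    e0 = int((RAPL / "energy_uj").read_text())
    time.sleep(seconds)
    e1 = int((RAPL / "energy_uj").read_text())
    return avg_power_watts(e0, e1, seconds, max_range)
```

Run `measure()` once under workload A and once under workload B: the 
absolute watts may be off, but the comparison is reproducible.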

>
> There's a tradeoff here in that info directly associated with
> backend threads is effectively exposing private QEMU impl
> details as public ABI. IOW, we don't want too fine a granularity
> here; we need it abstracted sufficiently that different
> backend choices for a given device don't change what sensors are
> exposed.
>
> I also wonder how existing power monitoring applications
> would consume such custom sensors - is there sufficient
> standardization in this area that we're not inventing
> something totally QEMU specific?
>
>> > IOW, I think it is right to include non-vCPU threads usage in
>> > the reported info, as it is still fundamentally part of the
>> > load that the guest imposes on host pCPUs it is permitted to
>> > run on.
>> 
>> 
>> From what I've read, process energy usage done via RAPL is not
>> exactly accurate. But there are monitoring tools out there that
>> use RAPL and other sources to make energy consumption monitoring
>> more reliable.
>> 
>> Reinventing that wheel and pulling all of the nuances of process
>> power monitoring inside of QEMU process, needlessly complicates it.
>> Maybe we should reuse one of existing tools and channel its data
>> through appropriate QEMU channels (RAPL/emulated PMU counters/...).
>
> Note, this feature is already released in QEMU 9.1.0.
>
>> Implementing RAPL in pure form though looks fine to me,
>> so the same tools could use it the same way as on the host
>> if needed without VM specific quirks.
>
> IMHO the so-called "pure" form is misleading to applications, unless
> we first provide some other practical way to expose the data that
> we would be throwing away from RAPL.
>

The other possibility that I've thought of is using a 3rd party tool to 
give maybe more "accurate" values to QEMU. 
For example, Kepler could be used to give a value for each thread 
of QEMU, so instead of calculating them with the qemu-vmsr-helper, 
each value would be transferred on request to QEMU via the UNIX socket 
that is used today between the daemon and QEMU. It's just an idea that 
I have, and I don't know if it would be acceptable for each project 
(QEMU and Kepler), but it would really solve a few issues.
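To be clear about what I mean, the wire format below is purely 
hypothetical (it is NOT the actual qemu-vmsr-helper protocol), but it 
sketches the kind of per-thread request/response exchange I have in 
mind, demoed over a socketpair instead of a real UNIX socket path:

```python
import socket
import struct

# Hypothetical wire format (NOT the actual qemu-vmsr-helper protocol):
#   request  : "<I" = thread id (tid) QEMU is asking about
#   response : "<Q" = daemon's energy estimate for that tid, microjoules

def serve_one(conn: socket.socket, estimates: dict) -> None:
    """Daemon side: answer a single per-thread energy request."""
    tid, = struct.unpack("<I", conn.recv(4))
    conn.sendall(struct.pack("<Q", estimates.get(tid, 0)))

def request_energy(conn: socket.socket, tid: int) -> int:
    """QEMU side: ask the daemon for the energy estimate of one thread."""
    conn.sendall(struct.pack("<I", tid))
    return struct.unpack("<Q", conn.recv(8))[0]

if __name__ == "__main__":
    # In-process demo: a socketpair stands in for the real UNIX socket.
    qemu_side, daemon_side = socket.socketpair(socket.AF_UNIX)
    estimates = {1234: 42_000_000}  # tid -> uJ, fed by e.g. Kepler
    qemu_side.sendall(struct.pack("<I", 1234))   # QEMU sends the request
    serve_one(daemon_side, estimates)            # daemon answers it
    print(struct.unpack("<Q", qemu_side.recv(8))[0])  # prints 42000000
```

The point is only that the estimation logic would live in the external 
daemon, and QEMU would stay a thin consumer of whatever values it is 
handed.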

> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

