Igor Mammedov, Oct 18, 2024 at 14:25:
> On Wed, 16 Oct 2024 14:56:39 +0200
> "Anthony Harivel" <ahari...@redhat.com> wrote:
>
>> Hi Igor,
>> 
>> Igor Mammedov, Oct 16, 2024 at 13:52:
>> > On Wed, 22 May 2024 17:34:49 +0200
>> > Anthony Harivel <ahari...@redhat.com> wrote:
>> >  
>> >> Dear maintainers, 
>> >> 
>> >> First of all, thank you very much for your review of my patch 
>> >> [1].  
>> >
>> > I've tried to play with this feature and have a few questions about it
>> >  
>> 
>> Thanks for testing this new feature. 
>> 
>> >  1. trying to start with non accessible or not existent socket
>> >         -accel kvm,rapl=on,rapl-helper-socket=/tmp/socket 
>> >     I get:
>> >       qemu-system-x86_64: -accel 
>> > kvm,rapl=on,rapl-helper-socket=/tmp/socks: vmsr socket opening failed
>> >       qemu-system-x86_64: -accel 
>> > kvm,rapl=on,rapl-helper-socket=/tmp/socks: kvm : error RAPL feature 
>> > requirement not met
>> >     * is it possible to report actual OS error that happened during 
>> > open/connect,
>> >       instead of unhelpful 'socket opening failed'?
>> >
>> >       What I see in vmsr_open_socket() error is ignored
>> >       and btw it's error leak as well
>> >  
>> 
>> Shame you missed the 6 iterations of that patch that last for a year. 
>> I would have changed that directly !
>> Anyway I take note on that comment and will send a modification.
>> 
>> >     * 2nd line shouldn't be there if the 1st error already present.
>> >
>> >  2.  getting periodic error on console where QEMU has been starter
>> >       # ./qemu-vmsr-helper -k /tmp/sock
>> >      ./qemu-system-x86_64 -snapshot -m 4G -accel 
>> > kvm,rapl=on,rapl-helper-socket=/tmp/sock rhel90.img  -vnc :0 -cpu host
>> >      and let it run
>> >
>> >       it appears rdmsr works (well, it returns some values at least)
>> >       however there are recurring errors in qemu's stderr(or out)
>> >       
>> >       qemu-system-x86_64: Error opening /proc/2496093/task/2496109/stat
>> >       qemu-system-x86_64: Error opening /proc/2496093/task/2496095/stat
>> >
>> >       My guess it's some temporary threads, that come and go, but still
>> >       they shouldn't cause errors if it's normal operation.
>> >  
>> 
>> There a patch in WIP that change this into a Tracepoint. Maybe you can 
>> SSH to the VM in meanwhile ?
>
> it's just idling VM that doesn't do anything, hence the question.  
>
>> 
>> >       Also on daemon side, I a few times got while guest was running:
>> >         qemu-vmsr-helper: Failed to open /proc at 
>> > /proc/2496026/task/2496044
>> >         qemu-vmsr-helper: Requested TID not in peer PID: 2496026 2496044
>> >       though I can't reproduce it reliably  
>> 
>> This could happen only when a vCPU thread ID has changed between the 
>> call of a rdmsr throught the socket and the hepler that read the msr.
>> No idea how a vCPU can change TID or shutdown that fast.
>
> I guess it needs to be figured out to decide if it's safe to ignore (and not 
> print error)
> or if it's a genuine error/bug somewhere
>
>> >  3. when starting daemon not as root, it starts 'fine' but later on 
>> > complains
>> >       qemu-vmsr-helper: Failed to open MSR file at /dev/cpu/0/msr
>> >     perhaps it would be better to fail at start daemon if it doesn't have
>> >     access to necessary files.
>> >  
>> 
>> Right taking a note on that as well.
>> 
>> 
>> >  4. in case #3, guest also fails to start with errors:
>> >       qemu-system-x86_64: -accel kvm,rapl=on,rapl-helper-socket=/tmp/sock: 
>> > can't read any virtual msr
>> >       qemu-system-x86_64: -accel kvm,rapl=on,rapl-helper-socket=/tmp/sock: 
>> > kvm : error RAPL feature requirement not met
>> >      again line #2 is not useful and probably not needed (maybe make it 
>> > tracepoint)
>> >      and #1 is unhelpful - it would be better if it directed user to check 
>> > qemu-vmsr-helper
>> >  
>> 
>> I will try to see how to improve that part. 
>> Thanks for your valuable feedback.
>> 
>> >  5. does AMD have similar MSRs that we could use to make this feature 
>> > complete?
>> >  
>> 
>> Yes but the address are completely different. However, this in my ToDo 
>> list. First I need way more feedback like yours to move on extending 
>> this feature.
>
> If adding AMD's MSRs is not difficult, then I'd make it priority.
> This way users (and libvirt) won't have to deal with 2 different
> feature-sets and decide when to allow this to be turned on depending on host.
>

QEMU needs to know if it runs on Intel or AMD machine in order to choose 
which set of MSR it must read. I did not check how to achieve this at the 
moment but I will when I will work on that.

>> 
>> >  6. What happens to power accounting if host constantly migrates
>> >     vcpus between sockets, are values we are getting still 
>> > correct/meaningful?
>> >     Or do we need to pin vcpus to get 'accurate' values?
>> >  
>> 
>> It's taken into account during the ratio calculation which socket the 
>> vCPU has just been scheduled. But yes the value are more 'accurate' when 
>> the vCPU is pinned.
>
> in worst case VCPUs might be moved between sockets many times during
> sample window, can you explain how that is accounted for?
>

If one vCPU is moving socket during the sample period then it is 
detected and not taken into account.

That said, if your system is bouncing vCPU back and forth between socket 
then you will experience a lot of caches misses, cpu caches trashes, 
context switches, increase of memory latency (numa issues), etc. This 
will lead to performance degradation and VM performance being very poor. 
Then you should probably fix it. 

> Anyways, it would be better to have some numbers in doc that would
> clarify what kind of accuracy we are talking about (and example
> pinned vs unpinned), or whether unpinned case measures average
> temperature of patients in hospital and we should recommend
> to pin vcpus and everything else.
>

I totally understand that I can add more clarification in the 
documentation that might be obvious for some but not for other. Like 
isolating your VM properly will give better result. 

But I won't give any number. It doesn't make sens. Accuracy is not the 
goal of this feature, it never was and it never will. First of all 
because RAPL is not accurate for power monitoring. You want accuracy? 
Use a Power Metering device. 
You want a reproducible way to compare power energy between 
A and B in order to optimize your software ? Use can use RAPL and so 
this feature that shows good reproducible results.

> Also actual usecase examples for the feature should be mentioned
> in the doc. So users could figure out when they need to enable
> this feature (with attached accuracy numbers). Aka how this
> new feature is good for end users and what they can do with it.
>

Got it. More documentation, use case, examples. 
I will see what can be added to QEMU documentation.


>> >  7. do we have to have a dedicated thread for pooling data from daemon?
>> >
>> >     Can we fetch data from vcpu thread that have accessed msr
>> >     (with some caching and rate limiting access to the daemon)?
>> >  
>> 
>> This feature is revolving around a thread. Please look at the 
>> documentation is not already done:
>> 
>> https://www.qemu.org/docs/master/specs/rapl-msr.html#high-level-implementation
>> 
>> If we only fetch from vCPU thread, we won't have the consumption of the 
>> non-vcpu thread. They are taken into account in the total.
>
> one can collect the same data from vcpu thread as well,
> the bonus part is that we don't have an extra thread
> hanging around and doing work even if guest never asks
> for those MSRs.
>
> This also leads to a question, if we should account for
> not VCPU threads at all. Looking at real hardware, those
> MSRs return power usage of CPUs only, and they do not
> return consumption from auxiliary system components
> (io/memory/...). One can consider non VCPU threads in QEMU
> as auxiliary components, so we probably should not to
> account for them at all when modeling the same hw feature.
> (aka be consistent with what real hw does).
>
>> Thanks again for your feedback. 
>> 
>> Anthony
>> 
>> 
>> >> In this version (v6), I have attempted to address all the problems 
>> >> addressed by Daniel and Paolo during the last review. 
>> >> 
>> >> However, two open questions remains unanswered that would require the 
>> >> attention of a x86 maintainers: 
>> >> 
>> >> 1)Should I move from -kvm to -cpu the rapl feature ? [2]
>> >> 
>> >> 2)Should I already rename to "rapl_vmsr_*" in order to anticipate the 
>> >>   futur TMPI architecture ? [end of 3] 
>> >> 
>> >> Thank you again for your continued guidance. 
>> >> 
>> >> v5 -> v6
>> >> --------
>> >> - Better error consistency in qio_channel_get_peerpid()
>> >> - Memory leak g_strdup_printf/g_build_filename corrected
>> >> - Renaming several struct with "vmsr_*" for better namespace
>> >> - Renamed several struct with "guest_*" for better comprehension
>> >> - Optimization suggerate from Daniel
>> >> - Crash problem solved [4]
>> >> 
>> >> v4 -> v5
>> >> --------
>> >> 
>> >> - correct qio_channel_get_peerpid: return pid = -1 in case of error
>> >> - Vmsr_helper: compile only for x86
>> >> - Vmsr_helper: use qio_channel_read/write_all
>> >> - Vmsr_helper: abandon user/group
>> >> - Vmsr_energy.c: correct all error_report
>> >> - Vmsr thread: compute default socket path only once
>> >> - Vmsr thread: open socket only once
>> >> - Pass relevant QEMU CI
>> >> 
>> >> v3 -> v4
>> >> --------
>> >> 
>> >> - Correct memory leaks with AddressSanitizer  
>> >> - Add sanity check for QEMU and qemu-vmsr-helper for checking if host is 
>> >>   INTEL and if RAPL is activated.
>> >> - Rename poor variables naming for easier comprehension
>> >> - Move code that checks Host before creating the VMSR thread
>> >> - Get rid of libnuma: create function that read sysfs for reading the 
>> >>   Host topology instead
>> >> 
>> >> v2 -> v3
>> >> --------
>> >> 
>> >> - Move all memory allocations from Clib to Glib
>> >> - Compile on *BSD (working on Linux only)
>> >> - No more limitation on the virtual package: each vCPU that belongs to 
>> >>   the same virtual package is giving the same results like expected on 
>> >>   a real CPU.
>> >>   This has been tested topology like:
>> >>      -smp 4,sockets=2
>> >>      -smp 16,sockets=4,cores=2,threads=2
>> >> 
>> >> v1 -> v2
>> >> --------
>> >> 
>> >> - To overcome the CVE-2020-8694 a socket communication is created
>> >>   to a priviliged helper
>> >> - Add the priviliged helper (qemu-vmsr-helper)
>> >> - Add SO_PEERCRED in qio channel socket
>> >> 
>> >> RFC -> v1
>> >> ---------
>> >> 
>> >> - Add vmsr_* in front of all vmsr specific function
>> >> - Change malloc()/calloc()... with all glib equivalent
>> >> - Pre-allocate all dynamic memories when possible
>> >> - Add a Documentation of implementation, limitation and usage
>> >> 
>> >> Best regards,
>> >> Anthony
>> >> 
>> >> [1]: https://mail.gnu.org/archive/html/qemu-devel/2024-04/msg01570.html
>> >> [2]: https://mail.gnu.org/archive/html/qemu-devel/2024-04/msg03947.html
>> >> [3]: https://mail.gnu.org/archive/html/qemu-devel/2024-04/msg02350.html
>> >> [4]: https://mail.gnu.org/archive/html/qemu-devel/2024-04/msg02481.html
>> >> 
>> >> Anthony Harivel (3):
>> >>   qio: add support for SO_PEERCRED for socket channel
>> >>   tools: build qemu-vmsr-helper
>> >>   Add support for RAPL MSRs in KVM/Qemu
>> >> 
>> >>  accel/kvm/kvm-all.c                      |  27 ++
>> >>  contrib/systemd/qemu-vmsr-helper.service |  15 +
>> >>  contrib/systemd/qemu-vmsr-helper.socket  |   9 +
>> >>  docs/specs/index.rst                     |   1 +
>> >>  docs/specs/rapl-msr.rst                  | 155 +++++++
>> >>  docs/tools/index.rst                     |   1 +
>> >>  docs/tools/qemu-vmsr-helper.rst          |  89 ++++
>> >>  include/io/channel.h                     |  21 +
>> >>  include/sysemu/kvm_int.h                 |  32 ++
>> >>  io/channel-socket.c                      |  28 ++
>> >>  io/channel.c                             |  13 +
>> >>  meson.build                              |   7 +
>> >>  target/i386/cpu.h                        |   8 +
>> >>  target/i386/kvm/kvm.c                    | 431 +++++++++++++++++-
>> >>  target/i386/kvm/meson.build              |   1 +
>> >>  target/i386/kvm/vmsr_energy.c            | 337 ++++++++++++++
>> >>  target/i386/kvm/vmsr_energy.h            |  99 +++++
>> >>  tools/i386/qemu-vmsr-helper.c            | 530 +++++++++++++++++++++++
>> >>  tools/i386/rapl-msr-index.h              |  28 ++
>> >>  19 files changed, 1831 insertions(+), 1 deletion(-)
>> >>  create mode 100644 contrib/systemd/qemu-vmsr-helper.service
>> >>  create mode 100644 contrib/systemd/qemu-vmsr-helper.socket
>> >>  create mode 100644 docs/specs/rapl-msr.rst
>> >>  create mode 100644 docs/tools/qemu-vmsr-helper.rst
>> >>  create mode 100644 target/i386/kvm/vmsr_energy.c
>> >>  create mode 100644 target/i386/kvm/vmsr_energy.h
>> >>  create mode 100644 tools/i386/qemu-vmsr-helper.c
>> >>  create mode 100644 tools/i386/rapl-msr-index.h
>> >>   
>> 


Reply via email to