On Mon, Jul 27, 2020, at 9:24 PM, Adam Carter wrote:
> > Compare realtime it to measured CPU time. If one realtime second is
> > shorter than a CPU second then you know the host is pausing your VM.
> > There are other ways to check, but this should always work if you can
> > contact an asynchronous time standard.
> > You may need to average the time over tens of seconds or a minute.
> >
> > This method will allow you to figure out that AWS spot instances are
> > oversubscribed ~1.5x.
>
> Nice. FWIW the guest is running NTP.
>
> So should I run something like:
> date ; time <some command that runs at 100%CPU for a minute> ; date ?

No, date will pull from your RTC, which is usually kept up to date with
an asynchronous counter.

First check top(1) from procps and look in the %Cpu line for "st". That
is the percentage of CPU time stolen. If it is nonzero then the guest's
steal-time accounting is probably working. It's not typical for the
hypervisor to hide this information; it's really important for load
balancing.

If that doesn't work we're going to have to write some C. Look at
clock_gettime(3): https://linux.die.net/man/3/clock_gettime.

The clocks are performance counters. Usually their only guarantee is
that they go up. On some platforms you may be able to see a difference
between CLOCK_REALTIME and CLOCK_MONOTONIC. On most platforms, however,
CLOCK_MONOTONIC is clocked from the CPU timebase and continues to
increment when your program is not running. On Windows the API exposes
the per-core clocks as well.

So to get around this, you need to know the frequency of the processor
and how long it takes to execute specific instructions.

    % time ./stealcheck
    real            0.680168s
    expected        0.625681s
    ./stealcheck  0.69s user 0.00s system 98% cpu 0.698 total

As commented below, I didn't have time to find the exact cycle count
for a busy loop. But six is familiar and these times line up with what
`time` gives.

The other issue is that I haven't implemented CPU pinning, nor have I
fixed the frequency. If possible do those; otherwise you can still
infer an accurate steal time, it just requires statistics over many
runs.

This will be good enough for a yes/no answer. (I.e. if you get a
noticeable discrepancy, buy more hardware.)

https://github.com/R030t1/stealcheck

    g++ -std=gnu++2a -Wall -pedantic \
        stealcheck.cc -o stealcheck

    #include <stdint.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <time.h>

    #include <string>
    #include <regex>
    #include <iostream>
    #include <fstream>
    using namespace std;

    uint64_t cpufreq();

    int main(int argc, char *argv[])
    {
        // If you have a newer processor you can request
        // cpuid level 0x16 instead of parsing /proc/cpuinfo.
        // For this impl. libpcre is likely faster.
        uint64_t cf = cpufreq(),
            // Six is familiar but likely not right.
            cycles_per_loop = 6;

        // Without a frequency the estimate below divides by zero.
        if (cf == 0) {
            fprintf(stderr, "no \"cpu MHz\" line in /proc/cpuinfo\n");
            return 1;
        }

        struct timespec start = { 0 };
        clock_gettime(CLOCK_REALTIME, &start);

        // Confirm the cycle count of these instructions for
        // accurate results and/or implement loop with asm.
        // Build without optimization (as in the command above),
        // or the compiler may delete this empty loop.
        uint64_t count = 0x10000000,
            orig = 0x10000000;
        while (count--);

        struct timespec end = { 0 };
        clock_gettime(CLOCK_REALTIME, &end);

        // Calculate delta. tv_nsec may go negative here; the
        // sum below still comes out right.
        end.tv_sec -= start.tv_sec;
        end.tv_nsec -= start.tv_nsec;

        double real = (end.tv_sec * 1.0) +
            (end.tv_nsec / 1000000000.0);
        double expected = (1.0 / cf) * orig * cycles_per_loop;

        printf("real\t\t%lfs\n", real);
        printf("expected\t%lfs\n", expected);
        return 0;
    }

    // Returns the CPU frequency in Hz, or 0 if none was found.
    uint64_t cpufreq()
    {
        uint64_t res = 0;
        regex pattern("^cpu MHz.*?([\\d.]+)");
        smatch glean;

        string line;
        ifstream cpuinf("/proc/cpuinfo");
        while (getline(cpuinf, line)) {
            if (!regex_search(line, glean, pattern))
                continue;
            // This effectively returns the last one, but I didn't
            // want to add CPU pinning etc. They are typically close
            // together.
            res = stod(glean[1].str()) * 1000000;
        }

        return res;
    }
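
For the first check (the "st" value in top), the same counter can be read directly. This is a minimal sketch, not part of stealcheck, and the helper names (read_cpu_line, parse_cpu_line, steal_percent) are made up for illustration. It assumes the usual Linux /proc/stat layout, where the aggregate "cpu" line lists user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice in clock ticks, so steal is the eighth value.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Read the aggregate "cpu" line, which is the first line of /proc/stat.
std::string read_cpu_line() {
    std::ifstream stat("/proc/stat");
    std::string line;
    std::getline(stat, line);
    return line;
}

// Split a "cpu ..." line into its numeric tick counters.
std::vector<uint64_t> parse_cpu_line(const std::string &line) {
    std::istringstream in(line);
    std::string label;
    in >> label; // skip the leading "cpu" label
    std::vector<uint64_t> fields;
    uint64_t v;
    while (in >> v)
        fields.push_back(v);
    return fields;
}

// Percentage of ticks stolen between two samples of the "cpu" line.
// Returns -1 if either line has fewer than eight counters (no steal field).
double steal_percent(const std::string &before, const std::string &after) {
    std::vector<uint64_t> a = parse_cpu_line(before);
    std::vector<uint64_t> b = parse_cpu_line(after);
    if (a.size() < 8 || b.size() < 8)
        return -1.0;
    // Assumption: only the first eight counters are summed, since guest
    // time should already be counted under user/nice.
    uint64_t total = 0;
    for (std::size_t i = 0; i < 8; i++)
        total += b[i] - a[i];
    if (total == 0)
        return 0.0;
    return 100.0 * (b[7] - a[7]) / total;
}
```

To use it, call read_cpu_line() twice, some tens of seconds apart while the guest is busy, and pass both strings to steal_percent(). If the result stays at zero under load on a host you suspect is oversubscribed, the hypervisor may not be reporting steal and the timing approach above is the fallback.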