David S. Ahern wrote:
I haven't been able to reproduce this:
[EMAIL PROTECTED] root]# ps -elf | grep -E 'memuser|kscand'
1 S root     7     1  1  75  0 -      0 schedu 10:07 ?     00:00:26 [kscand]
0 S root  1464     1  1  75  0 - 196986 schedu 10:20 pts/0 00:00:21 ./memuser 768M 120 5 300
0 S root  1465     1  0  75  0 -  98683 schedu 10:20 pts/0 00:00:10 ./memuser 384M 300 10 600
0 S root  2148  1293  0  75  0 -    922 pipe_w 10:48 pts/0 00:00:00 grep -E memuser|kscand
The workload has been running for about half an hour, and kswapd cpu
usage doesn't seem significant. This is a 2GB guest running with my
patch ported to kvm.git HEAD.
I'm running on the per-page-pte-tracking branch, and I am still seeing it.
I doubt you want to sit and watch the screen for an hour, so install sysstat if
it isn't already installed, change the sample rate to 1 minute
(/etc/cron.d/sysstat), let the server run for a few hours, and then run
'sar -u'. You'll see something like this:
10:12:11 AM LINUX RESTART
10:13:03 AM CPU %user %nice %system %iowait %idle
10:14:01 AM all 0.08 0.00 2.08 0.35 97.49
10:15:03 AM all 0.05 0.00 0.79 0.04 99.12
10:15:59 AM all 0.15 0.00 1.52 0.06 98.27
10:17:01 AM all 0.04 0.00 0.69 0.04 99.23
10:17:59 AM all 0.01 0.00 0.39 0.00 99.60
10:18:59 AM all 0.00 0.00 0.12 0.02 99.87
10:20:02 AM all 0.18 0.00 14.62 0.09 85.10
10:21:01 AM all 0.71 0.00 26.35 0.01 72.94
10:22:02 AM all 0.67 0.00 10.61 0.00 88.72
10:22:59 AM all 0.14 0.00 1.80 0.00 98.06
10:24:03 AM all 0.13 0.00 0.50 0.00 99.37
10:24:59 AM all 0.09 0.00 11.46 0.00 88.45
10:26:03 AM all 0.16 0.00 0.69 0.03 99.12
10:26:59 AM all 0.14 0.00 10.01 0.02 89.83
10:28:03 AM all 0.57 0.00 2.20 0.03 97.20
Average: all 0.21 0.00 5.55 0.05 94.20
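The sysstat setup described above (one-minute sampling, then 'sar -u') boils
down to a single cron entry. The exact sa1 path varies by distribution, so
treat this as a sketch rather than the literal RHEL3 file:

```shell
# /etc/cron.d/sysstat -- collect one sample every minute instead of the
# default every 10 minutes (sa1 path per common sysstat packaging)
* * * * * root /usr/lib/sa/sa1 1 1
```

Running 'sar -u' with no arguments then prints the day's CPU utilization
history, as shown above.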
Every one of those jumps in %system time directly correlates to kscand activity.
Without the memuser programs running the guest %system time is <1%. The point
of this silly memuser program is just to use high memory -- let it age, then make
it active again, sit idle, repeat. If you run kvm_stat with -l in the host you'll
see the jump in pte writes/updates. An intern here added a timestamp to the
kvm_stat output for me which helps to directly correlate guest/host data.
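The memuser source isn't shown here, but its core is simple enough to sketch.
A minimal, hypothetical version of its touch loop might look like the
following; the argument meanings (region size plus three timing intervals) are
my reading of the invocations above, not the actual program:

```c
#include <stddef.h>

/* Dirty one byte in every page of the region so the guest kernel sees
 * the pages as referenced and moves them back to the active list. A
 * memuser-style tool would wrap this in a loop: touch the region,
 * sleep while kscand ages the pages, touch again, repeat. */
static void touch_pages(unsigned char *buf, size_t len, size_t page_size)
{
    size_t off;

    for (off = 0; off < len; off += page_size)
        buf[off]++;
}
```

Sleeping between passes is what lets the pages age down the active lists;
reactivating them is what forces kscand to rescan and the host to trap the
resulting pte updates.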
I also ran my real guest on the branch. Performance at boot through the first
15 minutes was much better, but I'm still seeing recurring hits every 5 minutes
when kscand kicks in. Here's the data from the guest for the first one which
happened after 15 minutes of uptime:
active_anon_scan: HighMem, age 11, count[age] 24886 -> 5796, direct 24845, dj 59
active_anon_scan: HighMem, age 7, count[age] 47772 -> 21289, direct 40868, dj 103
active_anon_scan: HighMem, age 3, count[age] 91007 -> 329, direct 45805, dj 1212
We touched roughly 90,000 ptes in 12 seconds. That's about 7,500 ptes
per second. Yet we see 180,000 page faults per second in the trace.
Oh! Only 45K pages were direct, so the other 45K were shared, each with
perhaps many ptes. We should count ptes, not pages.
Can you modify page_referenced() to count the number of ptes mapped (1
for direct pages, nr_chains for indirect pages) and print the total
deltas in active_anon_scan?
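In 2.4-era rmap, a direct page maps exactly one pte while a shared page keeps
its ptes on a pte_chain, so the accounting asked for here reduces to "1 for
direct, chain length otherwise". A simplified stand-in model (these struct
names are illustrative, not the actual RHEL3 ones):

```c
#include <stddef.h>

/* Toy model of the 2.4 rmap layout: a direct page has a single pte,
 * a shared page hangs its ptes off a pte_chain list. */
struct pte_chain_model {
    struct pte_chain_model *next;
};

struct page_model {
    int direct;                      /* nonzero: one pte, no chain */
    struct pte_chain_model *chain;   /* walked when !direct */
};

/* The counting rule the suggested page_referenced() change would use:
 * 1 for a direct page, the number of chain entries for a shared one. */
static int mapped_pte_count(const struct page_model *page)
{
    const struct pte_chain_model *pc;
    int n;

    if (page->direct)
        return 1;
    n = 0;
    for (pc = page->chain; pc != NULL; pc = pc->next)
        n++;
    return n;
}
```

Summing this over the pages a pass touches, and printing the delta in each
active_anon_scan line, would show how many ptes (and hence how many trapped
pte writes on the kvm side) each scan really costs.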
The kvm_stat data for this time period is attached due to line lengths.
Also, I forgot to mention this before, but there is a bug in the kscand code in
the RHEL3U8 kernel. When it scans the cache list, it uses the count from the
anonymous list:
        if (need_active_cache_scan(zone)) {
                for (age = MAX_AGE-1; age >= 0; age--) {
                        scan_active_list(zone, age,
                                         &zone->active_cache_list[age],
                                         zone->active_anon_count[age]);
                                               ^^^^^^^^^^^^^^^^^
                        if (current->need_resched)
                                schedule();
                }
        }
When the anonymous count is higher, the cache list gets scanned repeatedly. An
example of that was captured here:
active_cache_scan: HighMem, age 7, count[age] 222 -> 179, count anon 111967, direct 626, dj 3
count anon is active_anon_count[age], which at this moment was 111,967. There
were only 222 entries in the cache list, but the count value passed to
scan_active_list was 111,967. When the cache list has a lot of direct pages,
that causes a larger hit on kvm than needed. That said, I have to live with the
bug in the guest.
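The amplification is easy to quantify: a scan budget of 111,967 steps over a
222-entry list means each entry gets revisited about 504 times instead of
once. A tiny model of that (hypothetical helper, just the arithmetic, not the
RHEL3 code):

```c
/* How many full passes a scan makes over a circular list when handed
 * 'count' scan steps for a list of 'list_len' entries. */
static long full_passes(long count, long list_len)
{
    if (list_len <= 0)
        return 0;
    return count / list_len;
}
```

With the numbers from the active_cache_scan line above, full_passes(111967,
222) is 504, which is why even a 222-entry cache list can produce a visible
hit on the kvm side.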
For debugging, can you fix it? It certainly has a large impact.
Perhaps it is fixed in an update kernel. There's a 2.4.21-50.EL in the
CentOS 3.8 update repos.
--
Do not meddle in the internals of kernels, for they are subtle and quick to
panic.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html