David S. Ahern wrote:
I haven't been able to reproduce this:
[EMAIL PROTECTED] root]# ps -elf | grep -E 'memuser|kscand'
1 S root     7     1  1  75  0 -      0 schedu 10:07 ?     00:00:26 [kscand]
0 S root  1464     1  1  75  0 - 196986 schedu 10:20 pts/0 00:00:21 ./memuser 768M 120 5 300
0 S root  1465     1  0  75  0 -  98683 schedu 10:20 pts/0 00:00:10 ./memuser 384M 300 10 600
0 S root  2148  1293  0  75  0 -    922 pipe_w 10:48 pts/0 00:00:00 grep -E memuser|kscand
The workload has been running for about half an hour, and kswapd cpu
usage doesn't seem significant. This is a 2GB guest running with my
patch ported to kvm.git HEAD.
I'm running on the per-page-pte-tracking branch, and I am still seeing it.
I doubt you want to sit and watch the screen for an hour, so install sysstat if
it isn't already installed, change the sample rate to 1 minute
(/etc/cron.d/sysstat), let the server run for a few hours, and then run
'sar -u'. You'll see something like this:
10:12:11 AM LINUX RESTART
10:13:03 AM CPU %user %nice %system %iowait %idle
10:14:01 AM all 0.08 0.00 2.08 0.35 97.49
10:15:03 AM all 0.05 0.00 0.79 0.04 99.12
10:15:59 AM all 0.15 0.00 1.52 0.06 98.27
10:17:01 AM all 0.04 0.00 0.69 0.04 99.23
10:17:59 AM all 0.01 0.00 0.39 0.00 99.60
10:18:59 AM all 0.00 0.00 0.12 0.02 99.87
10:20:02 AM all 0.18 0.00 14.62 0.09 85.10
10:21:01 AM all 0.71 0.00 26.35 0.01 72.94
10:22:02 AM all 0.67 0.00 10.61 0.00 88.72
10:22:59 AM all 0.14 0.00 1.80 0.00 98.06
10:24:03 AM all 0.13 0.00 0.50 0.00 99.37
10:24:59 AM all 0.09 0.00 11.46 0.00 88.45
10:26:03 AM all 0.16 0.00 0.69 0.03 99.12
10:26:59 AM all 0.14 0.00 10.01 0.02 89.83
10:28:03 AM all 0.57 0.00 2.20 0.03 97.20
Average: all 0.21 0.00 5.55 0.05 94.20
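The sysstat setup described above (one-minute sampling, then 'sar -u') boils
down to a single cron entry. The exact sa1 path varies by distribution, so
treat this as a sketch rather than the literal RHEL3 file:

```shell
# /etc/cron.d/sysstat -- collect one sample every minute instead of the
# default every 10 minutes (sa1 path per common sysstat packaging)
* * * * * root /usr/lib/sa/sa1 1 1
```

Running 'sar -u' with no arguments then prints the day's CPU utilization
history, as shown above.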
Every one of those jumps in %system time directly correlates to kscand activity.
Without the memuser programs running the guest %system time is <1%. The point
of this silly memuser program is just to use high memory -- let it age, then make
it active again, sit idle, repeat. If you run kvm_stat with -l in the host you'll
see the jump in pte writes/updates. An intern here added a timestamp to the
kvm_stat output for me which helps to directly correlate guest/host data.
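The memuser source isn't shown here, but its core is simple enough to sketch.
A minimal, hypothetical version of its touch loop might look like the
following; the argument meanings (region size plus three timing intervals) are
my reading of the invocations above, not the actual program:

```c
#include <stddef.h>

/* Dirty one byte in every page of the region so the guest kernel sees
 * the pages as referenced and moves them back to the active list. A
 * memuser-style tool would wrap this in a loop: touch the region,
 * sleep while kscand ages the pages, touch again, repeat. */
static void touch_pages(unsigned char *buf, size_t len, size_t page_size)
{
    size_t off;

    for (off = 0; off < len; off += page_size)
        buf[off]++;
}
```

Sleeping between passes is what lets the pages age down the active lists;
reactivating them is what forces kscand to rescan and the host to trap the
resulting pte updates.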
I also ran my real guest on the branch. Performance at boot through the first
15 minutes was much better, but I'm still seeing recurring hits every 5 minutes
when kscand kicks in. Here's the data from the guest for the first one which
happened after 15 minutes of uptime:
active_anon_scan: HighMem, age 11, count[age] 24886 -> 5796, direct 24845, dj 59
active_anon_scan: HighMem, age 7, count[age] 47772 -> 21289, direct 40868, dj 103
active_anon_scan: HighMem, age 3, count[age] 91007 -> 329, direct 45805, dj 1212
We touched roughly 90,000 ptes in 12 seconds. That's about 7,500 ptes
per second. Yet we see 180,000 page faults per second in the trace.
Oh! Only 45K pages were direct, so the other 45K were shared, each with
perhaps many ptes. We should count ptes, not pages.
Can you modify page_referenced() to count the number of ptes mapped (1
for direct pages, nr_chains for indirect pages) and print the total
deltas in active_anon_scan?
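In 2.4-era rmap, a direct page maps exactly one pte while a shared page keeps
its ptes on a pte_chain, so the accounting asked for here reduces to "1 for
direct, chain length otherwise". A simplified stand-in model (these struct
names are illustrative, not the actual RHEL3 ones):

```c
#include <stddef.h>

/* Toy model of the 2.4 rmap layout: a direct page has a single pte,
 * a shared page hangs its ptes off a pte_chain list. */
struct pte_chain_model {
    struct pte_chain_model *next;
};

struct page_model {
    int direct;                      /* nonzero: one pte, no chain */
    struct pte_chain_model *chain;   /* walked when !direct */
};

/* The counting rule the suggested page_referenced() change would use:
 * 1 for a direct page, the number of chain entries for a shared one. */
static int mapped_pte_count(const struct page_model *page)
{
    const struct pte_chain_model *pc;
    int n;

    if (page->direct)
        return 1;
    n = 0;
    for (pc = page->chain; pc != NULL; pc = pc->next)
        n++;
    return n;
}
```

Summing this over the pages a pass touches, and printing the delta in each
active_anon_scan line, would show how many ptes (and hence how many trapped
pte writes on the kvm side) each scan really costs.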
The kvm_stat data for this time period is attached due to line lengths.
Also, I forgot to mention this before, but there is a bug in the kscand code in
the RHEL3U8 kernel. When it scans the cache list, it uses the count from the
anonymous list:
        if (need_active_cache_scan(zone)) {
                for (age = MAX_AGE-1; age >= 0; age--) {
                        scan_active_list(zone, age,
                                         &zone->active_cache_list[age],
                                         zone->active_anon_count[age]);
                                               ^^^^^^^^^^^^^^^^^
                        if (current->need_resched)
                                schedule();
                }
        }
When the anonymous count is higher, the cache list gets scanned repeatedly. An
example of that was captured here:
active_cache_scan: HighMem, age 7, count[age] 222 -> 179, count anon 111967, direct 626, dj 3
count anon is active_anon_count[age], which at this moment was 111,967. There
were only 222 entries in the cache list, but the count value passed to
scan_active_list was 111,967. When the cache list has a lot of direct pages,
that causes a larger hit on kvm than needed. That said, I have to live with the
bug in the guest.
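The amplification is easy to quantify: a scan budget of 111,967 steps over a
222-entry list means each entry gets revisited about 504 times instead of
once. A tiny model of that (hypothetical helper, just the arithmetic, not the
RHEL3 code):

```c
/* How many full passes a scan makes over a circular list when handed
 * 'count' scan steps for a list of 'list_len' entries. */
static long full_passes(long count, long list_len)
{
    if (list_len <= 0)
        return 0;
    return count / list_len;
}
```

With the numbers from the active_cache_scan line above, full_passes(111967,
222) is 504, which is why even a 222-entry cache list can produce a visible
hit on the kvm side.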
For debugging, can you fix it? It certainly has a large impact.
Perhaps it is fixed in an update kernel. There's a 2.4.21-50.EL in the
CentOS 3.8 update repos.
--
Do not meddle in the internals of kernels, for they are subtle and quick to
panic.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html