This is the code in the RHEL3.8 kernel. Note that, unlike the version
quoted below, it scans at most count * kscand_work_percent / 100 pages
per pass and rotates pages it does not keep back to the head of the list:
static int scan_active_list(struct zone_struct * zone, int age,
			    struct list_head * list, int count)
{
	struct list_head *page_lru , *next;
	struct page * page;
	int over_rsslimit;

	count = count * kscand_work_percent / 100;

	/* Take the lock while messing with the list... */
	lru_lock(zone);
	while (count-- > 0 && !list_empty(list)) {
		page = list_entry(list->prev, struct page, lru);
		pte_chain_lock(page);
		if (page_referenced(page, &over_rsslimit)
				&& !over_rsslimit
				&& check_mapping_inuse(page))
			age_page_up_nolock(page, age);
		else {
			list_del(&page->lru);
			list_add(&page->lru, list);
		}
		pte_chain_unlock(page);
	}
	lru_unlock(zone);
	return 0;
}
My previous email shows examples of the number of pages in the list and
the scanning that happens.
david
Avi Kivity wrote:
> Andrea Arcangeli wrote:
>>
>> So I never found a relation between the reported symptom of VM kernel
>> threads going weird and KVM's optimal handling of kmap ptes.
>>
>
>
> The problem is this code:
>
> static int scan_active_list(struct zone_struct * zone, int age,
>			      struct list_head * list)
> {
>	struct list_head *page_lru , *next;
>	struct page * page;
>	int over_rsslimit;
>
>	/* Take the lock while messing with the list... */
>	lru_lock(zone);
>	list_for_each_safe(page_lru, next, list) {
>		page = list_entry(page_lru, struct page, lru);
>		pte_chain_lock(page);
>		if (page_referenced(page, &over_rsslimit) && !over_rsslimit)
>			age_page_up_nolock(page, age);
>		pte_chain_unlock(page);
>	}
>	lru_unlock(zone);
>	return 0;
> }
>
> If the pages in the list are in the same order as in the ptes (which is
> very likely), then we have the following access pattern
>
> - set up kmap to point at pte
> - test_and_clear_bit(pte)
> - kunmap
>
> From kvm's point of view this looks like
>
> - several accesses to set up the kmap
> - if these accesses trigger flooding, we will have to tear down the
> shadow for this page, only to set it up again soon
> - an access to the pte (emulated)
> - if this access _doesn't_ trigger flooding, we will have 512 unneeded
> emulations. The pte is worthless anyway since the accessed bit is clear
> (so we can't set up a shadow pte for it)
> - this bug was fixed
> - an access to tear down the kmap
>
> [btw, am I reading this right? the entire list is scanned each time?
>
> if you have 1G of active HIGHMEM, that's a quarter of a million pages,
> which would take at least a second no matter what we do. VMware can
> probably special-case kmaps, but we can't]
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html