Andrea Arcangeli wrote:
On Thu, May 29, 2008 at 06:16:55PM +0300, Avi Kivity wrote:
Yes. We need a fault in order to set the guest accessed bit.
So what I'm missing now is how the spte corresponding to the user pte
that is under test_and_clear (to clear the accessed bit) will not be
zapped immediately. If we don't zap it immediately, how do we set the
accessed bit again on the user pte, when the user program resumes
running and uses that shadow pte to access the program data after the
kscand pass?
The spte is zapped unconditionally in kvm_mmu_pte_write(), and not
re-established in mmu_pte_write_new_pte() due to the missing accessed bit.
The question is whether to tear down the shadow page it is contained in,
or not.
Or am I missing something?
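The zap-then-maybe-reestablish logic above can be sketched as a tiny model. This is an illustration, not the actual kvm_mmu_pte_write()/mmu_pte_write_new_pte() code; the function name is made up, and only the present/accessed bit positions follow the x86 pte layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define PT_PRESENT_MASK  (1ULL << 0)   /* x86 pte bit 0: present  */
#define PT_ACCESSED_MASK (1ULL << 5)   /* x86 pte bit 5: accessed */

/*
 * Simplified model: after the old spte is zapped unconditionally on a
 * guest pte write, a new spte is only re-established if the written
 * guest pte already has the accessed bit set.  A pte whose accessed
 * bit was just cleared (e.g. by kscand's test_and_clear) stays
 * unshadowed until the next guest access faults it back in, and that
 * fault is where the accessed bit gets set again.
 */
static bool should_reestablish_spte(uint64_t guest_pte)
{
	if (!(guest_pte & PT_PRESENT_MASK))
		return false;
	return (guest_pte & PT_ACCESSED_MASK) != 0;
}
```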
Unshadowing a page is expensive, both in immediate cost, and in future cost
of reshadowing the page and taking faults. It's worthwhile to be sure the
guest really doesn't want it as a page table.
Ok that makes sense, but can we defer the unshadowing while still
emulating the accessed bit correctly on the user pte?
We do, unless there's a bad bug somewhere.
If the pages are not scanned linearly, then unshadowing may not help.
It should help the second time kscand runs: for the user ptes that
aren't shadowed anymore, the second pass won't require any emulation
for test_and_clear_bit, because the spte of the fixmap area will be
read-write. The bug that passes the number of anonymous pages instead
of the number of cache pages will lead to many more test_and_clear
calls than needed, and not all user ptes may be used in between two
different kscand passes.
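The per-pte operation under discussion, kscand clearing the accessed (young) bit in each user pte, boils down to a test-and-clear. A minimal model follows; it is not the kernel's ptep_test_and_clear_young() (whose pte accessors and atomicity are per-arch), just the semantics relevant here:

```c
#include <stdbool.h>
#include <stdint.h>

#define PT_ACCESSED_MASK (1ULL << 5)   /* x86 pte accessed bit */

/*
 * Minimal model of what kscand does per user pte: read the accessed
 * bit and clear it.  In the guest this write goes through the
 * kmap_atomic fixmap mapping of the highmem page table, which is what
 * forces the emulations discussed above whenever that mapping is
 * shadowed read-only.
 */
static bool pte_test_and_clear_accessed(uint64_t *pte)
{
	bool was_set = (*pte & PT_ACCESSED_MASK) != 0;

	*pte &= ~PT_ACCESSED_MASK;
	return was_set;
}
```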
We still need 3 emulations per pte to set the fixmap entry. Unshadowing
saves one emulation on the pte itself.
Let's see: 1G of highmem is 250,000 pages, mapped by 500 page tables.
There are likely 1500 ptes in highmem. (ram isn't the most important factor)
I use 'pte' in the Intel manual sense (page table entry), not the Linux
sense (page table).
I mentioned these numbers to see the worst case behavior.
Non-highmem:
- with unshadow: O(500) accesses to unshadow the page tables, then
native speed
- without unshadow: O(250000) accesses to modify the ptes
Highmem:
- with unshadow: O(250000) accesses to update the fixmap entry
- without unshadow: O(250000) accesses to update the fixmap entry and to
modify the ptes
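The back-of-the-envelope counts above can be checked with a few lines. This assumes 4 KiB pages and 512 ptes per page table (the PAE layout, which matches the ~500 page tables per 1G figure); the function names are illustrative, not kernel APIs.

```c
enum {
	PAGE_SIZE      = 4096,  /* assumed 4 KiB pages            */
	PTES_PER_TABLE = 512,   /* assumed PAE page-table geometry */
};

/* Number of 4 KiB pages covering a region of the given size. */
static long pages_in(long bytes)
{
	return bytes / PAGE_SIZE;
}

/* Number of page tables needed to map that many pages. */
static long page_tables_for(long pages)
{
	return pages / PTES_PER_TABLE;
}

/* Non-highmem, with unshadow: one emulated access per page table. */
static long cost_with_unshadow(long pages)
{
	return page_tables_for(pages);
}

/* Non-highmem, without unshadow: one emulated access per pte. */
static long cost_without_unshadow(long pages)
{
	return pages;
}
```

For 1G this gives 262144 pages and 512 page tables, i.e. the ~250,000 vs ~500 figures quoted above.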
Well, then after 4000 scans we ought to have unshadowed everything. So I
guess the per-page pte history is broken; I can't explain it otherwise.
Yes, we should have unshadowed all user ptes after 4000 scans and then
the test_and_clear shouldn't require any more emulation, there will be
only 3 emulations for each kmap_atomic/kunmap_atomic.
So we save 25%. It's still bad even if everything is working correctly.
I think it should be clear by now that we're trying to be
bug-compatible with the host here, and optimizing for 2.6 kmaps.
Don't understand.
I'm guessing esx gets its good performance by special-casing something.
For example, they can keep the fixmap page never shadowed, always
emulate accesses through the fixmap page, and recompile instructions
that go through fixmap to issue a hypercall.
--
Do not meddle in the internals of kernels, for they are subtle and quick to
panic.