:And don't forget that with VHPT you'll be getting nested TLB faults quite
:frequently in a sparsely-populated page table (think shared libraries).
:
:Efficiency-wise, Kevin has shown that you only need a fairly small VHPT, and
:it is global, so you amortise the cost across all running tasks. Further,
:you can easily share GPT or LPC-trie subtrees, at which point the whole
:memory-wastage argument goes completely out of the window (I'm currently
:writing a microkernel intended to demonstrate just that on UltraSPARC,
:which has an MMU vaguely resembling that of IA-64). Besides, doesn't Linux
:duplicate the structure anyway even when it uses a hardware-walked page table?
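(Before getting to the Linux question: a minimal sketch of what subtree
sharing buys.  The node layout, fanout, and refcounting below are made up
for illustration, not taken from any existing implementation; the point is
just that two address spaces mapping the same shared library can point at a
single copy of the subtree that translates it.)

    #include <stdint.h>

    #define PT_FANOUT 64                    /* children per node (assumed) */

    struct pt_node {
        uint32_t        refcnt;             /* > 1 when the subtree is shared */
        uint64_t        guard;              /* VA prefix guarding this subtree */
        uint8_t         guard_len;          /* number of guard bits */
        struct pt_node *child[PT_FANOUT];   /* NULL for unmapped slots */
    };

    /* Attach an already-built subtree (e.g. libc's mappings) to another
       task's table: one copy in memory, any number of users. */
    static void
    pt_share_subtree(struct pt_node *parent, int slot, struct pt_node *subtree)
    {
        subtree->refcnt++;
        parent->child[slot] = subtree;
    }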
Linux uses a 'machine-independent' two or three level page table which
is then 'translated' to a machine-dependent version. However, for IA32
and any architecture that supports it, they attempt to overlay the
machine-dependent and machine-independent versions so they wind up
having only one page table. This pretty much locks Linux into a
standard hierarchical page table design, at least insofar as minimizing
memory overhead goes. They can extend it to support other
architectures, but it doesn't allow them to get rid of the
machine-independent version of the page table.
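To make the overlay idea concrete, here is a minimal sketch of that
machine-independent walk.  The type names mirror Linux's pgd/pmd/pte, but
the shift values and 9-bit indices are placeholders, not the real IA32 or
IA64 layout.  On IA32 the levels are defined so this walk lands directly on
the hardware page directory and page tables, which is how they end up with
only one structure; on two-level MMUs the middle level is folded away.

    /*
     * Illustrative only: a simplified machine-independent walk in the
     * style of Linux's pgd/pmd/pte.  Shift values and fanout are made up.
     */
    #include <stdint.h>
    #include <stddef.h>

    typedef uint64_t pte_t;          /* leaf: the hardware-format entry */
    typedef pte_t   *pmd_t;          /* middle level: array of ptes */
    typedef pmd_t   *pgd_t;          /* top level: array of pmds */

    #define LEVEL_INDEX(va, shift)  (((va) >> (shift)) & 0x1ff)  /* 9 bits */
    #define PGD_SHIFT 30
    #define PMD_SHIFT 21
    #define PTE_SHIFT 12

    static pte_t *
    walk(pgd_t *pgd_base, uint64_t va)
    {
        pgd_t pgd = pgd_base[LEVEL_INDEX(va, PGD_SHIFT)];
        if (pgd == NULL)
            return NULL;
        pmd_t pmd = pgd[LEVEL_INDEX(va, PMD_SHIFT)];
        if (pmd == NULL)
            return NULL;
        return &pmd[LEVEL_INDEX(va, PTE_SHIFT)];
    }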
Linux also stores persistent information in their machine-independent
page tables. They aren't throw-away like FreeBSD's are. This will give
us a huge advantage when we do the IA64 port.
:> your proposal is - use VHPT as a large in memory TLB and use GPT as
:> operating system's primary page table.
:
:Precisely.
:
:> Doesn't that involve duplication of information in memory, especially if the
:> hash table is big ?
:
:No, not significantly, for two reasons: first, you don't need a huge VHPT --
:512KB is more than enough. Also, the VHPT becomes a cache for the actual page
:table. It's been empirically demonstrated that 64 bit (esp. sparse 64 bit)
:page tables really need such a cache (software TLB) anyway. And it's the main
In general I like the idea of using a VHPT as an STLB (are we having
fun with terminology yet?). It should be possible to do even better
by optimizing the TLB entries into variable-length pages. We would
have to rewrite the page allocation code to make it practical, but it
could be done. Many of the pages we are talking about here are from
shared libraries which generally wind up staying permanently resident
in memory anyway, which means that the overhead of making them
physically contiguous over time is low. This makes the optimization
possible.
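Something along these lines is what I have in mind for the STLB; the entry
layout, the hash, and the supported page sizes below are guesses for
illustration, not existing code.  The log2_size field is the payoff: a
shared library that is physically contiguous can be covered by a single
large entry instead of hundreds of 8K ones.

    /*
     * Sketch of a software TLB with variable-length pages.  Field names,
     * sizing, and the hash are illustrative assumptions.
     */
    #include <stdint.h>

    #define STLB_SLOTS 8192                 /* power of two, a few hundred KB */

    struct stlb_entry {
        uint64_t va_base;     /* virtual start of the translated run */
        uint64_t pa_base;     /* physical start (run must be contiguous) */
        uint32_t asid;        /* address-space id, avoids flush on switch */
        uint8_t  valid;
        uint8_t  log2_size;   /* 13 = 8K, 16 = 64K, 22 = 4M ... */
        uint8_t  prot;
    };

    static struct stlb_entry stlb[STLB_SLOTS];
    static const uint8_t stlb_sizes[] = { 13, 16, 22 };   /* 8K, 64K, 4M */

    static struct stlb_entry *
    stlb_lookup(uint32_t asid, uint64_t va)
    {
        /* Probe once per supported page size so a large entry is found
           no matter which page inside it faulted. */
        for (unsigned i = 0; i < sizeof(stlb_sizes); i++) {
            uint8_t sz = stlb_sizes[i];
            struct stlb_entry *e = &stlb[((va >> sz) ^ asid) & (STLB_SLOTS - 1)];

            if (e->valid && e->asid == asid && e->log2_size == sz &&
                (va >> sz) == (e->va_base >> sz))
                return e;
        }
        return NULL;
    }

Insertion hashes the same way on va_base, so a fault anywhere inside a
large entry re-derives the same slot.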
What I would truly love to do would be to get away with not using a GPT
at all and instead doing a vm_map_lookup_entry()/vm_page_lookup()
(essentially taking a vm_fault), then optimize the vm_map_entry
structural hierarchy to look more like a GPT rather than the linear
list it currently is. When coupled with an STLB, especially one that
can be optimized, I think performance would be extremely good.
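Very roughly, the miss path would look like the sketch below: no dedicated
page table at all, just a descent of a GPT-like vm_map_entry tree followed
by an STLB refill.  The types and helpers here are simplified stand-ins;
the real code would go through vm_map_lookup_entry() and vm_page_lookup(),
and restructuring the linear vm_map_entry list into the tree is the part
that doesn't exist yet.

    /*
     * Stand-in types: NOT the real FreeBSD VM structures, just enough to
     * show the shape of a page-table-less miss handler.
     */
    #include <stdbool.h>
    #include <stdint.h>

    struct vme {                      /* simplified vm_map_entry */
        uint64_t start, end;          /* VA range covered by this mapping */
        uint64_t pindex_base;         /* offset into the backing object */
        struct vme *left, *right;     /* the GPT-like hierarchy */
    };

    /* Hypothetical helpers, standing in for vm_page_lookup() and the STLB
       insert path from the previous sketch. */
    uint64_t object_page_pa(struct vme *e, uint64_t pindex);
    void     stlb_insert(uint32_t asid, uint64_t va, uint64_t pa, int log2_size);

    /* Roughly vm_map_lookup_entry(): descend a tree instead of walking
       a linear list. */
    static struct vme *
    map_lookup(struct vme *root, uint64_t va)
    {
        while (root != NULL) {
            if (va < root->start)
                root = root->left;
            else if (va >= root->end)
                root = root->right;
            else
                return root;
        }
        return NULL;
    }

    static bool
    stlb_miss(struct vme *root, uint32_t asid, uint64_t va)
    {
        struct vme *e = map_lookup(root, va);
        if (e == NULL)
            return false;                       /* unmapped: real fault */

        uint64_t pindex = e->pindex_base + ((va - e->start) >> 13);
        uint64_t pa = object_page_pa(e, pindex);
        if (pa == 0)
            return false;                       /* not resident: take vm_fault */

        stlb_insert(asid, va & ~0x1fffULL, pa, 13 /* 8K for now */);
        return true;
    }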
:way Intel planned the VHPT to be used in the first place. The performance
:improvement tends to be significant (look at Kevin's PhD that I've posted
:before.) Besides, the amount of space saved due to a smarter page table data
:structure more than compensates for the additional memory anyway.
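(For scale: assuming 32-byte long-format entries, a 512KB VHPT holds 16K
translations, which at 8K pages is 128MB of hot mappings serviced without
ever touching the real page table.)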
:
:> > the only reason why IA-64 walks VPHT in hardware *at all* is to minimize
:> > the impact on the pipeline and improve ILP:
:>
:> I think that's an important reason. A software only TLB miss handler
:> would be inferior to a VHPT based solution on IA-64, IMO.
:
:It's the only justification Rumi Zahir (head of the IA-64 team) gave me when I
:was complaining about it (as in: ``why bother? 64 bit page tables are an
:open problem and no other 64 bit platform I know of provides a hardware page
:table walk''). BTW, does anyone know if HP-PA and IBM 64-bit PPC implement a
:hardware PT walk?
:
:Pat.
-Matt
Matthew Dillon
<[EMAIL PROTECTED]>