Benjamin Herrenschmidt <b...@kernel.crashing.org> writes: > On Sun, 2013-06-16 at 13:37 +1000, Benjamin Herrenschmidt wrote: >> On Sun, 2013-06-16 at 12:00 +1000, Benjamin Herrenschmidt wrote: >> > So at this point, hash_page might *still* see the old pmd. Unless I >> > missed something, you did nothing that will prevent that (the only way >> > to lock against hash_page is really an IPI & wait or to take the PTE's >> > busy and make them !present or something). So as far as I can tell, >> > a concurrent hash_page can still sneak into the hash some "small" >> > entries after you have supposedly flushed them. >> >> Note that the _PAGE_PRESENT bit is removed eventually ... but much >> later, in __collapse_huge_page_copy() which will also flush the hash, so >> at least we will remove a stale hash entry that would have been added by >> the race above I suppose... but: >> >> - _PAGE_ACCESSED can still potentially be set after it was supposed to >> be stable >> >> - The clearing happens *after* copy_user_highpage(), ie, unless I >> missed something here, we potentially still have something writing to >> the 4k page while it's being copied, which is BAD. >> >> Now, let me know if I did miss something here :-) > > An additional issue is that this all collides a bit with Alexey's work > to support TCEs in real mode in KVM, which is necessary to have "usable" > PCI pass-through. > > Look at patches http://patchwork.ozlabs.org/patch/248920/ and followup, > he basically walks the page tables here in a slightly different way than > Paul does in H_ENTER. It's more like gup_fast. It will need to handle > concurrent split/collapse etc... as well which it doesn't right now. > > I'm considering merging Alexei stuff first (provided I don't find major > problems with it), then you can provide a new THP series on top of it. > > While at it, also fix: > > - Some of your patches are bug fixes (like the one about subpage > protection). They need to be either merged in the main patch or put > before the patch that enables THP.
We can move them before. One of the reason I would like to keep them as seperate patches is to document the changes better in commit messages. > > - I haven't completely yet considered the impact of the demotion of > segments, but neither do you :-) IE. Under some circumstances, we can > demote entire segments from 64K HW pages to 4K HW pages in the SLB. For > example if a driver (such as HCA) sets the 4K_PFN bit in a PTE, this > will happen at hashing time. I don't think your code deals with that at > all, am I correct ? It *might* be that the right approach with those is: > But will that by anonymous memory ? ie, will we find them suitable for THP allocation ? > * If you find a THP in hash_page and the segment size is 4k, fault > > * In do_page_fault, re-check for that condition (or maybe we can make > hash_page return a specific bit that gets ORed into the error_code into > do_page_fault ?) and split huge pages there. > > But that's just an idea off the top of my mind, there might be a better > way. Of course this needs to be tested. > > BTW. For the subpage protection, similarily, you need to make sure you > properly map the entire segment as "no THP", not just the range > passed-in by the user. Can you explain that more, why should the entire segment be marked no THP ? The segment can work with 4K base page size and we still be able to allocate a hugepage in that segment. -aneesh _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev