On Tue, Jul 10, 2007 at 11:20:43AM +0100, Mel Gorman wrote: > > > > Mel's page allocator work. Might merge this, but I'm still not hearing > > sufficiently convincing noises from a sufficient number of people over > > this. > > > > This is a long on-going story. It bounces between people who say it's not a > complete solution and everything should have the 100% ability to defragment > and the people on the other side that say it goes a long way to solving their > problem. I've cc'd some of the parties that have expressed any interest in > the last year.
And I guess some other people who want to see what prolbems there are and what can't be solved between order-0 allocations and reserve zones. > On a slightly more left of centre tact, these patches *may* help fsblock with > large blocks although I would like to hear Nick's confirming/denying this. > Currently if fsblock wants to work with large blocks, it uses a vmap to map > discontiguous pages so they are virtually contiguous for the filesystem. The > use of VMAP is never cheap, though how much of an overhead in this case is > unknown. If these patches were in place, fsblock could optimisically allocate > the higher-order page and use it without vmap if it succeeded. If it fails, > it would use vmap as a lower-performance-but-still-works fallback. This > may tie in better with what Christoph is doing with large blocks as well > as it may be a compromise solution between their proposals - I'm not 100% > sure so he's cc'd as well for comment. Yeah higher order allocations could definitely be helpful for this although I couldn't guess at the sort of impovements at this stage. And I mean if there was a simple choice between better (but still not perfect) support for higher order allocations or not, then of course you would take them. I am sure there are other places as well where they might makes life a bit easier or performance a bit better. But given the code involved, it is not just a simple choice, but a tradeoff. Perhaps I haven't seen or don't realise it, but I'm still not sure that this tradeoff is a good one. (just my opinion). > The patches have been reviewed heavily recently by Christoph and Andy has > looked through them as well. They've been tested for a long time in -mm so > I would expect they not regress functionality. I've maintained that having > the 100% ability to defragment will cost too much in terms of performance > and would be blocked by the fact that the device driver model would have to > be updated to never use physical addresses - a massive undertaking. I think > this approach is more pragmatic and working on making more types of memory > (like page tables) migratable is at least piecemeal as opposed to turning > everything on it's head. My comments about defragmentation of the kernel were not exactly what I believe is the right direction to go (it may be, but I'm rally not in a position to know without having seen or tried to implement it). But I do think that's what would really be needed in order to really support higher order allocations the same as order-0. I realise in your pragmatic approach, you are encouraging users to put fallbacks in place in case a higher order page cannot be allocated, but I don't think either higher order pagecache or higher order slubs have such fallbacks (fsblock or a combination of fsblock and higher order pagecache could have, but...). > > These are slub changes which are dependent on Mel's stuff, and I have a > > note > > here that there were reports of page allocation failures with these. > > What's > > up with that? > > > > These is where the > have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch and > only-check-absolute-watermarks-for-alloc_high-and-alloc_harder-allocations.patch > patches should be. There were page allocation failure reports without these > patches but Nick felt they were not the correct solution and I tend to agree > with him on this matter. I haven't put a massive amount of thought into it > yet because without grouping pages by mobility, the question is pointless. Yeah I think that was a hack. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/