On Wed, Jul 18, 2007 at 06:32:22AM -0700, William Lee Irwin III wrote: > Actually I'd worked on what was called MPSS (Multiple Page Size Support) > before I ever started on pgcl. Some large portion of the pgcl proposal > as I presented it internally was to reduce the order of large page > allocations and provide a promotion and demotion mechanism enabling > different processes to have different sized translations for the same > large page, and hence no out-of-context pagetable/TLB updates during > promotion and demotion, essentially by making the TLB translation to > page relation M:N. ISTR describing this in a KS presentation for which > IIRC you were present. But that's neither here nor there.
Well the whole difference between you back then and SGI now, is that your stuff wasn't being pushed to be merged very hard (it was proposed but IIRC more as research topic, like the large PAGE_SIZE also fallen into that same research area). See now the emails from SGI fs folks about variable order page size, they want it merged badly instead. My whole point is that the single moment the variable order page size isn't pure research anymore like MPSS, the CONFIG_PAGE_SHIFT isn't research anymore either, like the tail packing in pagecache with kmalloc also isn't research anymore. About the fs deciding the size of the pagecache granularity I totally dislike that design, there's no reason why the fs should control that, whatever clever algorithm deciding which pagecache granularity to use should be outside fs/xfs. I like the pagecache layer to be in charge of everything. The fs should stay a simple remapper between logical inode offset to physical disk offset. That can take into account raid, or other stuff, that's still a logical->raid->physical translation, but the highelevel "brainer" intellgigence of deciding which granularity the pagecache should use, would better be in the pagecache/vfs layer to benefit everyone. And anyway I prefer to keep the PAGE_SIZE big, and allocate fragments for small files with kmalloc down to 32 bytes granularity, and memcpy them away if you mmap the file. After the first time we move from kmalloc fragment to real PAGE_SIZE pagecache, we add a bitflag to the inode somewhere to be sure we never use the kmalloc fragment anymore later even if the page is evicted from pagecache (inodes may well live longer than pagecache so a bitflag is going to be worth it). - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/