On Fri, 02 Mar 2007 17:03:10 -0500 Rik van Riel <[EMAIL PROTECTED]> wrote:
> Andrew Morton wrote: > > On Fri, 02 Mar 2007 16:19:19 -0500 > > Rik van Riel <[EMAIL PROTECTED]> wrote: > >> Bill Irwin wrote: > >>> On Fri, Mar 02, 2007 at 01:23:28PM -0500, Rik van Riel wrote: > >>>> With 32 CPUs diving into the page reclaim simultaneously, > >>>> each trying to scan a fraction of memory, this is disastrous > >>>> for performance. A 256GB system should be even worse. > >>> Thundering herds of a sort pounding the LRU locks from direct reclaim > >>> have set off the NMI oopser for users here. > >> Ditto here. > > > > Opterons? > > It's happened on IA64, too. Probably on Intel x86-64 as well. Opterons seem to be particularly prone to lock starvation where a cacheline gets captured in a single package for ever. > >> The main reason they end up pounding the LRU locks is the > >> swappiness heuristic. They scan too much before deciding > >> that it would be a good idea to actually swap something > >> out, and with 32 CPUs doing such scanning simultaneously... > > > > What kernel version? > > Customers are on the 2.6.9 based RHEL4 kernel, but I believe > we have reproduced the problem on 2.6.18 too during stress > tests. The prev_priority fixes were post-2.6.18 > I have no reason to believe we should stick our heads in the > sand and pretend it no longer exists on 2.6.21. I have no reason to believe anything. All I see is handwaviness, speculation and grand plans to rewrite vast amounts of stuff without even a testcase to demonstrate that said rewrite improved anything. None of this is going anywhere, is it? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/