> I also wonder whether the pre-existing loop over cpus (in lpl order)
> in disp_getwork on systems with many cpus is going to access
> a large number of cpu_t and effectively flush the TLBs (as happened
> in the mutex_vector_enter perf fix). I guess this is a less frequent
> operation and the cpu is idle anyway.
Even though the CPU is idle, it's still a valid concern. Ideally, an idle CPU (or system) should eventually reach a state where all the memory accesses performed by the idle() loop only touch cache lines already present in the local cache in the shared state. Otherwise, polling idle() CPUs will generate unwanted bus/memory controller traffic. And when most of the system is idle (and most of the kernel structures live in physical memory managed by a single memory controller), it adds up. :)

At each level of locality, if disp_getwork() encounters another idle CPU, it breaks out of that level and goes up to the next one (the idea being that the idle CPU it found is covering the CPUs that follow it in the list at that level of locality). So on a completely idle system, a polling CPU should only be looking at N other CPUs, where N is the number of locality levels.

This code has some cleanup in store for it, so soon it will be easier to follow.

-Eric
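P.S. For a concrete picture of the walk described above, here's a toy C sketch. The toy_cpu/toy_lpl structures and toy_getwork() are invented for illustration; the real cpu_t/lpl_t and disp_getwork() in the Solaris sources are considerably more involved.

    /*
     * Toy model of the disp_getwork() locality walk described above.
     * All names and fields here are made up for the sketch.
     */
    #include <stdio.h>
    #include <stddef.h>

    struct toy_cpu {
            int id;
            int idle;       /* nonzero if this CPU is in its idle loop */
            int runq_len;   /* runnable threads waiting on this CPU    */
    };

    struct toy_lpl {                /* one locality level           */
            struct toy_cpu **cpus;  /* CPUs visible at this level   */
            size_t ncpus;
    };

    /*
     * Scan outward through the locality levels looking for a CPU
     * with stealable work.  Within a level, bail out as soon as
     * another idle CPU is seen: that CPU is assumed to be covering
     * the CPUs that follow it in the list, so we go up a level
     * instead of touching every cpu_t (and dragging in its cache
     * lines) on a large machine.  On a completely idle system this
     * inspects at most one remote CPU per locality level.
     */
    static struct toy_cpu *
    toy_getwork(struct toy_lpl *levels, size_t nlevels, int self_id)
    {
            for (size_t lvl = 0; lvl < nlevels; lvl++) {
                    for (size_t i = 0; i < levels[lvl].ncpus; i++) {
                            struct toy_cpu *cp = levels[lvl].cpus[i];

                            if (cp->id == self_id)
                                    continue;
                            if (cp->runq_len > 0)
                                    return (cp);  /* work to steal */
                            if (cp->idle)
                                    break;        /* covered; go up */
                    }
            }
            return (NULL);          /* whole system looks idle */
    }

    int
    main(void)
    {
            struct toy_cpu c0 = { 0, 1, 0 };  /* us, polling     */
            struct toy_cpu c1 = { 1, 1, 0 };  /* idle neighbor   */
            struct toy_cpu c2 = { 2, 0, 3 };  /* busy, far away  */
            struct toy_cpu *near[] = { &c0, &c1 };
            struct toy_cpu *far[]  = { &c0, &c2, &c1 };
            struct toy_lpl levels[] = {
                    { near, 2 },    /* local level: c1 idle -> go up */
                    { far,  3 },    /* wider level: c2 has work      */
            };

            struct toy_cpu *victim = toy_getwork(levels, 2, 0);
            printf("steal from cpu %d\n", victim ? victim->id : -1);
            return (0);
    }

Here the poller gives up on the local level after seeing a single idle peer (c1) and finds work one level up, which is the "N other CPUs for N levels" behavior in the fully idle case.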