> I also wonder whether the pre-existing loop over cpus (in lpl order)
> in disp_getwork on systems with many cpus is going to access a large
> number of cpu_t and effectively flush the TLBs (as happened in the
> mutex_vector_enter perf fix).  I guess this is a less frequent
> operation and the cpu is idle anyway.

Even though the CPU is idle, it's still a valid concern.
Ideally, an idle CPU/system should eventually reach a state where
all the memory accesses performed by the idle() loop touch only
cache lines that are present in the local cache in the shared state.
Otherwise, polling idle() CPUs generate unwanted bus/memory
controller traffic. And when most of the system is idle (and most of
the kernel structures live in physical memory managed by a single
memory controller), that traffic adds up. :)
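
To make that concrete, here's a minimal sketch of why a read-only
poll of a per-CPU flag is cheap. This is not the actual Solaris
idle() loop; the struct and the cp_runnable_hint field are
hypothetical, purely for illustration:

/*
 * Minimal sketch of a cache-friendly idle poll.  Because the idle
 * CPU only *reads* its own flag, the cache line ends up resident in
 * the local cache in the shared state, and no bus traffic occurs
 * until some remote CPU *writes* the flag to wake it up.
 */
struct my_cpu {
	volatile int cp_runnable_hint;	/* set remotely when work arrives */
	/* ... other per-CPU state ... */
};

static void
idle_poll(struct my_cpu *cp)
{
	for (;;) {
		/* Local read: hits the local cache once the line is shared. */
		if (cp->cp_runnable_hint) {
			cp->cp_runnable_hint = 0;
			break;		/* go look for the new work */
		}
		/* On real hardware one would also pause/halt here. */
	}
}

The remote write invalidates the line exactly once, which is the one
wake-up cost you actually want to pay.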

At each level of locality, if disp_getwork() encounters another idle
CPU, it breaks out and goes up to the next level (the idea being that
the idle CPU it found covers the remaining CPUs in the list at that
level of locality).

So on a completely idle system, a polling CPU should only be looking
at N other CPUs, where N is the number of locality levels (see the
sketch below).
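
Roughly like this simplified sketch. This is not the real
disp_getwork(); the lpl layout and the cpu_is_idle()/cpu_has_work()
predicates are hypothetical stand-ins modeled loosely on the
description above:

struct cpu;				/* opaque; stands in for cpu_t */
extern int cpu_has_work(struct cpu *);	/* hypothetical predicates */
extern int cpu_is_idle(struct cpu *);

typedef struct lpl {
	struct cpu	**lpl_cpus;	/* CPUs at this locality level */
	int		lpl_ncpu;
	struct lpl	*lpl_parent;	/* next (wider) locality level */
} lpl_t;

static struct cpu *
getwork_sketch(lpl_t *lpl)
{
	for (; lpl != NULL; lpl = lpl->lpl_parent) {
		for (int i = 0; i < lpl->lpl_ncpu; i++) {
			struct cpu *cp = lpl->lpl_cpus[i];

			if (cpu_has_work(cp))
				return (cp);	/* found work to steal */
			if (cpu_is_idle(cp))
				break;	/* it covers the rest; go up */
		}
	}
	return (NULL);		/* nothing runnable anywhere */
}

Because the inner loop bails on the first idle CPU at each level, a
fully idle system touches only one remote cpu_t per locality level,
which is where the N above comes from.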

This code has some cleanup in store, so it should soon be easier to follow.

-Eric
 
 