On 2013/06/13 20:01, Remy Nonnenmacher wrote:

On 06/13/13 13:32, Mark Felder wrote:
On Wed, 12 Jun 2013 17:58:49 -0500, David O'Brien <obr...@freebsd.org>
wrote:

We found FreeBSD 8.4 to perform better than FreeBSD 9.1, and Linux
considerably better than both on the same machine.

http://svnweb.freebsd.org/base?view=revision&revision=241246

The above link is likely why 8.4 is better than 9.1 on the same machine.

We've tried various things and haven't been able to explain why FreeBSD
isn't scaling on the new hardware.  Nor why it performs so much worse
than FreeBSD on the older "M2" machines.

The CPUs between those machines are quite different. I'm sure we're
looking at different cache sizes, different behavior for the
hyperthreading, etc. I'm sure others would be greatly interested in you
providing the same benchmark results for a recent snapshot of HEAD as
well.
_______________________________________________
freebsd-performance@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to
"freebsd-performance-unsubscr...@freebsd.org"

We had the same problem on 4x12-core (AMD) machines. After investigating
with hwpmc, it appeared that performance was being killed by a scheduler
function that tries to find the "least used CPU" and unfortunately works
on contended structures (i.e., lots of cores are fighting to get work). A
solution was found by using an artificially long queue of stuck processes
(steal_thresh bumped to over 8) and by crafting CPU affinity.
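
To illustrate why raising steal_thresh reduces the fighting over run queues, here is a rough Python model, not the kernel code. It assumes an idle CPU only steals from a CPU whose queue holds at least steal_thresh threads; the function name and the list-of-loads representation are made up for illustration.

```python
def pick_steal_victim(loads, idle_cpu, steal_thresh):
    """Return the CPU id an idle CPU would steal from, or None.

    loads: run-queue lengths indexed by CPU id (hypothetical model).
    A CPU is a candidate only if its queue holds at least
    steal_thresh threads.
    """
    candidates = [
        (load, cpu)
        for cpu, load in enumerate(loads)
        if cpu != idle_cpu and load >= steal_thresh
    ]
    if not candidates:
        # Nothing is loaded enough, so the idle CPU leaves every queue alone.
        return None
    # Steal from the most loaded CPU; ties go to the lowest CPU id.
    _, victim = max(candidates, key=lambda c: (c[0], -c[1]))
    return victim


if __name__ == "__main__":
    loads = [0, 3, 5, 2]
    print(pick_steal_victim(loads, idle_cpu=0, steal_thresh=2))
    print(pick_steal_victim(loads, idle_cpu=0, steal_thresh=9))
```

With a threshold above 8, the second call finds no victim, so in this model idle CPUs stop touching other CPUs' queues unless one is badly backed up.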

That was a year ago and is from memory. I guess you could give it a try
to see if it helps.

Disregard if a scheduler specialist contradicts this.

Thanks.


AMD's cache is very different from Intel's. AFAIK, before Bulldozer AMD's L3 was an exclusive cache; with Bulldozer, AMD describes the L3 cache as a “non-inclusive victim cache”. That is still different from Intel's, which is inclusive.

"- In sched_pickcpu() change general logic of CPU selection. First
look for idle CPU, sharing last level cache with previously used one,
skipping SMT CPU groups. If none found, search all CPUs for the least loaded
one, where the thread with its priority can run now. If none found, search
just for the least loaded CPU."
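
The three-step order in that commit message can be sketched in Python as below. This is a simplified model, not the actual sched_pickcpu() code. The Cpu fields and the pick_cpu name are assumptions for illustration, the SMT group handling is left out, and "can run now" is modelled as the thread's priority being numerically lower (more urgent) than anything already on that CPU.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    cpu_id: int
    llc_id: int   # which last-level cache this CPU sits under
    load: int     # number of runnable threads on this CPU
    lowpri: int   # most urgent (numerically lowest) priority on this CPU


def pick_cpu(cpus, prev_cpu_id, thread_pri):
    """Pick a CPU for a thread that last ran on prev_cpu_id."""
    prev = next(c for c in cpus if c.cpu_id == prev_cpu_id)

    # Step 1: an idle CPU sharing the last-level cache with the previous one.
    idle_near = [c for c in cpus if c.llc_id == prev.llc_id and c.load == 0]
    if idle_near:
        return min(idle_near, key=lambda c: c.cpu_id).cpu_id

    # Step 2: the least loaded CPU where the thread could run right away.
    runnable_now = [c for c in cpus if thread_pri < c.lowpri]
    if runnable_now:
        return min(runnable_now, key=lambda c: (c.load, c.cpu_id)).cpu_id

    # Step 3: just the least loaded CPU overall.
    return min(cpus, key=lambda c: (c.load, c.cpu_id)).cpu_id


if __name__ == "__main__":
    busy = [Cpu(0, 0, 2, 100), Cpu(1, 0, 3, 150),
            Cpu(2, 1, 1, 90), Cpu(3, 1, 4, 200)]
    print(pick_cpu(busy, prev_cpu_id=0, thread_pri=120))
    print(pick_cpu(busy, prev_cpu_id=0, thread_pri=250))
```

Note that steps 2 and 3 scan every CPU, which is the kind of system-wide search that gets expensive on machines with many cores.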

With an exclusive cache, the L3 holds second-hand (evicted) data, not hot data, so migrating a thread has a negative effect: its hot data is lost.
I'd prefer to search for an idle CPU sharing L2 first, then L3.
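
A minimal Python sketch of that L2-then-L3 preference, assuming each CPU is tagged with the ids of the L2 and L3 caches it sits under. The Core type and pick_idle_nearest name are hypothetical and not taken from the FreeBSD scheduler.

```python
from dataclasses import dataclass


@dataclass
class Core:
    cpu_id: int
    l2_id: int   # id of the L2 cache this CPU shares
    l3_id: int   # id of the L3 cache this CPU shares
    idle: bool


def pick_idle_nearest(cores, prev_cpu_id):
    """Return an idle CPU sharing L2 with the previous CPU, else one
    sharing L3, else None so the caller can fall back to a wider search."""
    prev = next(c for c in cores if c.cpu_id == prev_cpu_id)
    levels = (
        lambda c: c.l2_id == prev.l2_id,  # closest: same L2
        lambda c: c.l3_id == prev.l3_id,  # next: same L3
    )
    for shares_cache in levels:
        for core in cores:
            if core.idle and shares_cache(core):
                return core.cpu_id
    return None


if __name__ == "__main__":
    cores = [Core(0, 0, 0, False), Core(1, 0, 0, False),
             Core(2, 1, 0, True), Core(3, 2, 1, True)]
    print(pick_idle_nearest(cores, prev_cpu_id=0))
```

Searching the L2 group first keeps a migrated thread as close as possible to data it recently touched, which matters more when the L3 does not hold a copy of it.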

