Module Name:    src
Committed By:   maxv
Date:           Sat Feb 11 14:11:25 UTC 2017

Modified Files:
        src/sys/arch/x86/include: cpu.h pmap.h
        src/sys/arch/x86/x86: cpu.c pmap.c
        src/sys/arch/xen/x86: cpu.c

Log Message:
Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated even though common i386 machines are unlikely to have that many
cpus, and the base VA of these entries is not cache-line-aligned, which
all but guarantees cache-line thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportional to the number of CPUs
attached (which reduces memory consumption), and the base is properly
aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is halved
on average with this patch.

Discussed on tech-kern a little.


To generate a diff of this commit:
cvs rdiff -u -r1.67 -r1.68 src/sys/arch/x86/include/cpu.h
cvs rdiff -u -r1.61 -r1.62 src/sys/arch/x86/include/pmap.h
cvs rdiff -u -r1.122 -r1.123 src/sys/arch/x86/x86/cpu.c
cvs rdiff -u -r1.239 -r1.240 src/sys/arch/x86/x86/pmap.c
cvs rdiff -u -r1.108 -r1.109 src/sys/arch/xen/x86/cpu.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
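The cache-line argument in the log message can be made concrete with a small userland C sketch. It is not the NetBSD pmap code: it assumes a 64-byte cache line and a one-line-sized slot per CPU, neither of which is stated in the commit, and the names `lines_spanned`, `slots_share_line` and `struct percpu_tmpva` are hypothetical. It shows that when an array of per-CPU slots starts at a misaligned base, each slot straddles two lines and neighbouring CPUs end up sharing a line, whereas an aligned per-CPU structure gives each CPU a line of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines (typical on x86; not stated in the commit). */
#define CACHE_LINE 64u

/* Number of cache lines touched by the byte range [base, base + size). */
static unsigned lines_spanned(uintptr_t base, size_t size)
{
    assert(size > 0);
    uintptr_t first = base / CACHE_LINE;
    uintptr_t last = (base + size - 1) / CACHE_LINE;
    return (unsigned)(last - first + 1);
}

/*
 * Global-array layout: slot i of an array starting at `base` covers
 * [base + i * slot, base + (i + 1) * slot). Returns nonzero if the slots
 * of CPUs a and b share at least one cache line (false sharing).
 */
static int slots_share_line(uintptr_t base, size_t slot, unsigned a, unsigned b)
{
    assert(slot > 0);
    uintptr_t a_first = (base + a * slot) / CACHE_LINE;
    uintptr_t a_last = (base + (a + 1) * slot - 1) / CACHE_LINE;
    uintptr_t b_first = (base + b * slot) / CACHE_LINE;
    uintptr_t b_last = (base + (b + 1) * slot - 1) / CACHE_LINE;
    return a_first <= b_last && b_first <= a_last;
}

/*
 * Per-CPU layout (hypothetical names): the tmp VAs live in a structure
 * owned by one CPU, and _Alignas puts it on a cache-line boundary, so
 * one CPU's slot never shares a line with another CPU's.
 */
struct percpu_tmpva {
    _Alignas(CACHE_LINE) uintptr_t va[4];
};

_Static_assert(sizeof(struct percpu_tmpva) % CACHE_LINE == 0,
               "per-CPU slot must occupy whole cache lines");
```

For example, with a base offset of 8 bytes, `lines_spanned(8, 64)` is 2 and `slots_share_line(8, 64, 0, 1)` is true; with a base of 0 each slot fits in one line and no two CPUs share one. Embedding the slot in a per-CPU structure, as the commit does with cpu_info, gets the alignment from the structure itself and allocates one slot per attached CPU rather than maxcpus of them.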