Hey Mel,

> > Hi,
> >
> > Here is an attempt to pick a few interesting patches from tip/numa/core.
> > For the initial stuff, I have selected last_nidpid (which was the
> > last_cpupid patch + the "Make gcc to not reread the page tables" patch).
> >
> > Here are the performance results of running autonumabenchmark on an
> > 8-node, 64-core system. Each of these tests was run for 5 iterations.
> >
> > KernelVersion: v3.9
> >            Testcase:      Min      Max      Avg
> >              numa01:  1784.16  1864.15  1800.16
> > numa01_THREAD_ALLOC:   293.75   315.35   311.03
> >              numa02:    32.07    32.72    32.59
> >          numa02_SMT:    39.27    39.79    39.69
> >
> > KernelVersion: v3.9 + last_nidpid + gcc: no reread patches
> >            Testcase:      Min      Max      Avg  %Change
> >              numa01:  1774.66  1870.75  1851.53   -2.77%
> > numa01_THREAD_ALLOC:   275.18   279.47   276.04   12.68%
> >              numa02:    32.75    34.64    33.13   -1.63%
> >          numa02_SMT:    32.00    36.65    32.93   20.53%
> >
> > We do see some degradation in the numa01 and numa02 cases. The
> > degradation is mostly because of the last_nidpid patch. However,
> > last_nidpid helps the thread_alloc and smt cases and forms the basis
> > for a few more interesting ideas in tip/numa/core.
> >
>
> Unfortunately I did not have time to review the patches properly, but I
> ran some of the same tests that were used for numa balancing originally.
>

Okay.

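In case it is useful when you do get around to the review: the last_nidpid
change essentially remembers which node and task (low pid bits) last took a
NUMA hinting fault on a page, so that repeated faults from the same task can
be treated as private accesses worth migrating, while pages that bounce
between tasks are left alone. Very roughly along the lines of the sketch
below; the helper names and layout are made up for illustration and are not
the actual patch:

/*
 * Illustrative sketch only: names and layout are invented and do not
 * match the real last_nidpid patches.  Remember which (node, low pid
 * bits) last faulted on a page so the hinting-fault path can treat
 * repeated faults from the same task as private accesses and leave
 * pages that bounce between tasks alone.
 */
#include <stdbool.h>

#define LAST_PID_BITS   8
#define LAST_PID_MASK   ((1 << LAST_PID_BITS) - 1)

static inline int nidpid_encode(int nid, int pid)
{
        return (nid << LAST_PID_BITS) | (pid & LAST_PID_MASK);
}

static inline int nidpid_nid(int nidpid)
{
        return nidpid >> LAST_PID_BITS;
}

static inline int nidpid_pid(int nidpid)
{
        return nidpid & LAST_PID_MASK;
}

/*
 * Hypothetical hinting-fault helper: record the current faulting task
 * and node, and report whether the previous fault on this page came
 * from the same task on the same node (i.e. the page looks private).
 */
static inline bool nidpid_fault_is_private(int *last_nidpid, int nid, int pid)
{
        int prev = *last_nidpid;

        *last_nidpid = nidpid_encode(nid, pid);
        return nidpid_nid(prev) == nid &&
               nidpid_pid(prev) == (pid & LAST_PID_MASK);
}

The real thing of course has to live per page (in the page flags) and cope
with the pid bits being truncated, which the sketch glosses over.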

> One of the threads segfaulted when running specjbb in single JVM mode with
> the patches applied, so there is either a stability issue in there or it
> makes an existing problem with migration easier to hit by virtue of the
> fact that it's migrating more aggressively.
>

I tried to reproduce this: I ran 3 VMs, ran a single-JVM specjbb on each of
these 3 VMs, and had the host running the kernel with my patches. However,
I didn't hit this issue even after a couple of iterations. (I have tried
all the options, i.e. (no)ksm/(no)thp.)

Can you tell me how your setup differs? I had seen something similar to
what you pointed out when I was benchmarking last year.

> Specjbb in multi-JVM mode showed some performance improvements, with a 4%
> improvement at the peak, but the results for many thread instances were a
> lot more variable with the patches applied. System CPU time increased by
> 16% and the number of pages migrated increased by 18%.
>

Okay.

> NAS-MPI showed both performance gains and losses, but again the system
> CPU time was increased by 9.1% and 30% more pages were migrated with the
> patches applied.
>
> For autonuma, the system CPU time is reduced by 40% for numa01 *but* it
> increased by 70%, 34% and 9% for NUMA01_THEADLOCAL, NUMA02 and
> NUMA02_SMT respectively, and 45% more pages were migrated overall.
>
> So while there are some performance improvements, they are not universal,
> there is at least one stability issue, and I'm not keen on the large
> increase in system CPU cost and number of pages being migrated as a
> result of the patch when there is no co-operation with the scheduler to
> make processes a bit stickier on a node once memory has been migrated
> locally.
>

Okay, I will see if I can tweak the patches further to reduce the CPU
consumption and the number of page migrations.

> --
> Mel Gorman
> SUSE Labs

--
Thanks and Regards
Srikar Dronamraju