On 23 September 2015 16:05:02 GMT+10:00, Michael Neuling <mi...@neuling.org> wrote:
>The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to
>determine the loaded address of the VDSO. The current version of these
>attempt to use the special bcl variant which avoids pushing to the
>link stack.
>
>Unfortunately it uses bcl+8 rather than the required bcl+4. Hence the
>current code results in link stack corruption and the resulting
>performance degradation (due to branch mis-prediction).
>
>This patch moves us to bcl+4 by moving __kernel_datapage_offset
>out of __get_datapage().
>
>With this patch, running the below benchmark we get a bump in
>performance on POWER8 for gettimeofday() (which uses
>__get_datapage()).
>
>64bit gets ~4% improvement:
>  Without patch:
>    # ./tb
>    time = 0.180321
>  With patch:
>    # ./tb
>    time = 0.187408
>
>32bit gets ~9% improvement:
>  Without patch:
>    # ./tb
>    time = 0.276551
>  With patch:
>    # ./tb
>    time = 0.252767
>
>Testcase tb.c (stolen from Anton)
>  /* gcc -O2 tb.c -o tb */
>  #include <sys/time.h>
>  #include <stdio.h>
>
>  int main()
>  {
>      int i;
>
>      struct timeval tv_start, tv_end;
>
>      gettimeofday(&tv_start, NULL);
>
>      for(i = 0; i < 10000000; i++) {
>          gettimeofday(&tv_end, NULL);
>      }
>
>      printf("time = %.6f\n", tv_end.tv_sec - tv_start.tv_sec +
>(tv_end.tv_usec - tv_start.tv_usec) * 1e-6);
>
>      return 0;
>  }
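
For illustration only, here is a rough sketch of how the quoted benchmark
could be split into reusable pieces before it goes anywhere in the tree.
The helper names elapsed_seconds() and bench_gettimeofday() are made up
for this example, and nothing here assumes or uses the selftest harness:

```c
/* Sketch only: helper names are hypothetical, not from the patch. */
#include <sys/time.h>
#include <stddef.h>

/* Seconds elapsed between two gettimeofday() samples. */
double elapsed_seconds(const struct timeval *start,
		       const struct timeval *end)
{
	return (double)(end->tv_sec - start->tv_sec) +
	       (double)(end->tv_usec - start->tv_usec) * 1e-6;
}

/*
 * Time `iterations` back-to-back gettimeofday() calls, as in tb.c.
 * Returns the elapsed wall-clock seconds, or -1.0 if a call fails.
 */
double bench_gettimeofday(long iterations)
{
	struct timeval tv_start, tv_end;
	long i;

	if (gettimeofday(&tv_start, NULL) != 0)
		return -1.0;

	/* With zero iterations the end sample equals the start sample. */
	tv_end = tv_start;

	for (i = 0; i < iterations; i++) {
		if (gettimeofday(&tv_end, NULL) != 0)
			return -1.0;
	}

	return elapsed_seconds(&tv_start, &tv_end);
}
```

Calling bench_gettimeofday(10000000) from main() and printing the result
with "time = %.6f\n" would give output in the same format as tb.c.
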

You know where test cases are supposed to go.

I know it's not a pass/fail test, but it's still useful. If it's in the tree it
will get run as part of automated test runs and we will have a record of the
result over time.

cheers
-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev