Hi All, I discovered strange behavior in the SPEC CPU2006 436.cactusADM benchmark. Its performance depends on the length of the $LD_LIBRARY_PATH variable. The benchmark was compiled with "-O3 -funroll-loops -ffast-math -march=core-avx2 -mtune=core-avx2" using gcc version 4.8.0 20130218. I used Intel Software Development Emulator 5.38.0 to run the AVX2 code:

$ export LD_LIBRARY_PATH=''
$ /sde-bdw-external-5.38.0-2013-01-03-lin/sde -icount -- ./cactusADM benchADM.par > benchADM.out 2> benchADM.err
ICOUNT: 2593690591

$ export LD_LIBRARY_PATH='aaaaaaaaaaaaaaaaaaaa'
$ /sde-bdw-external-5.38.0-2013-01-03-lin/sde -icount -- ./cactusADM benchADM.par > benchADM.out 2> benchADM.err
ICOUNT: 2100709724

In the second case about 19% fewer instructions are executed (put differently, the first run executes about 23% more)! The difference is caused by code in the benchmark that checks the alignment of a pointer. It seems that the environment variables (including $LD_LIBRARY_PATH) are placed in the application's memory, so their total length changes the alignment, which in turn leads to different execution paths.

Is this a known issue? Is there anything GCC can do to make the alignment more consistent?

Thanks,
Ilya