lamb-j wrote:

I've been testing this patch with an array of OpenCL benchmarks over the past month. We did some high-level regression testing with the following benchmarks:
BlackMagic, Linpack_Dgemm, babelstream-Double, babelstream-Float, chimex, clfft, clmem, clpeak-Double-precision compute, clpeak-Global memory bandwidth, clpeak-Integer compute, clpeak-Single-precision compute, clpeak-Transfer bandwidth, computeApps, dgemm_linux, fahbench, flopscl, ge-workspace, ge_rdppenality, indigo-benchmark, lattice, luxmark, luxmark4, mixbench-ocl-ro, ocltst, shoc, silentarmy, viennacl

With these apps, we didn't see any significant regressions. I also did some in-depth testing with FAHBench and Chimex:

**FAHBench**

Current:
Final score: 216.8422, 218.2792, 218.3647
Scaled score: 216.8422 (23558 atoms)
App Runtime: 1m42.181s, 1m42.185s, 1m42.167s
Compilation time: 3226 ms

With this PR:
Final score: 222.3547, 219.8134, 223.3722
Scaled score: 222.3547 (23558 atoms)
App Runtime: 1m40.852s, 1m40.850s, 1m40.849s
Compilation time: 1822 ms

Between the two builds, the total runtime difference is ~1.3 seconds, and the compilation-time difference is ~1.4 seconds, so the results do suggest that this PR only removes overhead and doesn't introduce regressions. I also looked at the intermediate files: if we dump the two final .so files, they are nearly identical, with only a few lines differing (a sketch of such a comparison is at the end of this message).

**Chimex**

Current:
Correlation matrices computation time: 2.3876s on GPU
[Theoretical max: @13.9 TFLOPS, 1659.3 kHz; 83% efficiency]
[Algorithm max: @13.9 TFLOPS, 1634.6 kHz; 84% efficiency]
Compilation Time: 742 ms

With this PR:
Correlation matrices computation time: 1.9782s on GPU
[Theoretical max: @13.9 TFLOPS, 1659.3 kHz; 100% efficiency]
[Algorithm max: @13.9 TFLOPS, 1634.6 kHz; 101% efficiency]
Compilation Time: 551 ms

https://github.com/llvm/llvm-project/pull/85672
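For reference, here is a minimal sketch of how the .so comparison mentioned above could be reproduced. It assumes GNU binutils' objdump is on PATH and diffs the disassembly with the per-instruction address/byte columns stripped; the dump format, script name, and file paths are illustrative only, not necessarily what was used for the numbers above.

```python
#!/usr/bin/env python3
"""Sketch: compare the disassembly of two shared objects.

Assumes GNU binutils (objdump) is available; file names below are
placeholders, not the actual benchmark artifacts.
"""
import difflib
import subprocess
import sys


def dump(path: str) -> list[str]:
    """Disassemble a .so and strip the leading address column from each
    instruction line, so harmless layout shifts don't show up as diffs."""
    out = subprocess.run(
        ["objdump", "-d", "--no-show-raw-insn", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line.split("\t", 1)[-1] for line in out.splitlines()]


def main() -> None:
    old_so, new_so = sys.argv[1], sys.argv[2]
    for line in difflib.unified_diff(
        dump(old_so), dump(new_so), fromfile=old_so, tofile=new_so, lineterm=""
    ):
        print(line)


if __name__ == "__main__":
    main()
```

Usage would be along the lines of `python3 diff_so.py build-current/kernel.so build-pr/kernel.so` (placeholder paths); an empty diff, or only a handful of differing lines, indicates the two builds produced essentially the same final binary.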