gtbercea added a comment.

In https://reviews.llvm.org/D52434#1248975, @Hahnfeld wrote:
> In https://reviews.llvm.org/D52434#1248974, @gtbercea wrote:
>
> > One big problem your code has is that the trip count is incredibly small,
> > especially for STREAM and especially on GPUs. You need a much larger loop
> > size, otherwise the timings will be dominated by OpenMP setup costs.
>
> Sure, I'm not that dumb. The real code has larger loops; this was just for
> demonstration purposes. I don't expect the register count to change based on
> loop size - is that too optimistic?

I checked the different combinations of schedules, and the current default is the fastest. The old defaults are about 10x slower than the current set of defaults (dist_schedule(static, <num threads>) and schedule(static, 1)). The register allocation looks strange, but it is just a consequence of using different schedules.

You report a slowdown that I am not able to reproduce. Do you use any additional clauses not present in your previous post?

Repository:
  rC Clang

https://reviews.llvm.org/D52434

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
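
For reference, the two schedules named in the comment above can also be written out explicitly on a loop. The following is only a minimal sketch, not the code from the review: the function name triad, the STREAM-style loop body, and the value 128 standing in for <num threads> are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumed value standing in for <num threads>; not taken from the review.
constexpr int kThreadsPerTeam = 128;

// Hypothetical STREAM-style triad: a[i] = b[i] + scalar * c[i].
void triad(double *a, const double *b, const double *c, double scalar,
           std::size_t n) {
  assert(a != nullptr && b != nullptr && c != nullptr);
  // dist_schedule(static, kThreadsPerTeam): iterations are handed to teams
  // in chunks of kThreadsPerTeam consecutive iterations.
  // schedule(static, 1): inside a team, iterations are dealt out to the
  // threads one at a time in round-robin order.
#pragma omp target teams distribute parallel for \
    map(from: a[0:n]) map(to: b[0:n], c[0:n]) \
    thread_limit(kThreadsPerTeam) \
    dist_schedule(static, kThreadsPerTeam) schedule(static, 1)
  for (std::size_t i = 0; i < n; ++i)
    a[i] = b[i] + scalar * c[i];
}
```

Built without OpenMP support, the pragma is ignored and the loop runs serially on the host, which makes it easy to check the result before comparing timings with and without the explicit clauses.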