gtbercea added a comment.

In https://reviews.llvm.org/D52434#1248975, @Hahnfeld wrote:

> In https://reviews.llvm.org/D52434#1248974, @gtbercea wrote:
>
> > One big problem your code has is that the trip count is incredibly small, 
> > especially for STREAM and especially on GPUs. You need a much larger loop 
> > size; otherwise the timings will be dominated by OpenMP setup costs.
>
>
> Sure, I'm not that dumb. The real code has larger loops; this was just for 
> demonstration purposes. I don't expect the register count to change based on 
> loop size - is that too optimistic?


I checked the different combinations of schedules, and the current default is 
faster than the previous defaults. The old defaults are about 10x 
slower than the current set of defaults (dist_schedule(static, <num threads>) 
and schedule(static, 1)). The register allocation looks strange, but it's just a 
consequence of using different schedules.
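
For reference, here is a minimal sketch of what the current defaults look like when written out explicitly on a STREAM-style triad. It is an illustration only, not the code from this review: NUM_THREADS is a hypothetical stand-in for "<num threads>", and the function names, map clauses and the value 128 are assumptions. The helper thread_for_iteration shows the resulting mapping under the further assumption that every team runs exactly NUM_THREADS threads: neighbouring iterations go to neighbouring threads. Whether that access pattern is what accounts for the speedup is not something this sketch establishes.

```c
#include <stddef.h>

/* Hypothetical stand-in for "<num threads>"; the real value is chosen
 * by the compiler/runtime and is not stated in this thread. */
#define NUM_THREADS 128

/* STREAM-style triad: a[i] = b[i] + scalar * c[i].
 * The two schedule clauses spell out the defaults named above.
 * Without OpenMP enabled the pragma is ignored and the loop runs
 * sequentially on the host. */
void triad(double *a, const double *b, const double *c,
           double scalar, size_t n)
{
#pragma omp target teams distribute parallel for \
    map(from: a[0:n]) map(to: b[0:n], c[0:n]) \
    dist_schedule(static, NUM_THREADS) schedule(static, 1)
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] + scalar * c[i];
}

/* Thread (within its team) that executes iteration i, assuming each
 * team has exactly NUM_THREADS threads: every team receives chunks of
 * NUM_THREADS iterations, and schedule(static, 1) hands the iterations
 * of a chunk out round-robin, one per thread. */
size_t thread_for_iteration(size_t i)
{
    return i % NUM_THREADS;
}
```

With these assumptions, iterations 0..127 of a chunk land on threads 0..127 of one team, and iteration 129 lands on thread 1 of whichever team owns the second chunk.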

You report a slowdown that I am not able to reproduce. Do you use 
any additional clauses not present in your previous post?


Repository:
  rC Clang

https://reviews.llvm.org/D52434



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
