gtbercea added a comment.

In https://reviews.llvm.org/D52434#1249186, @Hahnfeld wrote:

> In https://reviews.llvm.org/D52434#1249102, @gtbercea wrote:
>
> > You report a slow down which I am not able to reproduce actually. Do you
> > use any additional clauses not present in your previous post?
>
>
> No, only `dist_schedule(static)` which is faster. Tested on a `Tesla P100`
> with today's trunk version:
>
> | `#pragma omp target teams distribute parallel for` (new defaults)         | 190 - 250 GB/s |
> | adding clauses for old defaults: `schedule(static) dist_schedule(static)` | 30 - 50 GB/s   |
> | same directive with only `dist_schedule(static)` added (fewer registers)  | 320 - 400 GB/s |
>

Which loop size are you using? What runtime does nvprof report for these kernels?


Repository:
  rC Clang

https://reviews.llvm.org/D52434

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits