https://llvm.org/bugs/show_bug.cgi?id=31333
Andrew Adams <andrew.b.ad...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Andrew Adams <andrew.b.ad...@gmail.com> ---
Your comments made me wonder if it's our pass setup that's wrong:

https://github.com/halide/Halide/blob/master/src/CodeGen_PTX_Dev.cpp#L267

I think that code was written in 2012. Leaving it as a size-64 alloca, but
changing the pass setup to this:

https://github.com/halide/Halide/blob/0e1662f2382e7134205abcdcd995a54f3441365a/src/CodeGen_PTX_Dev.cpp#L267

gives me the best timings I've seen. It's 15% faster than the 64 individual
allocas. There is no usage of local memory, and it seems to have decided to
make the loads non-cached.

So, PEBKAC I guess. Sorry about that. SROA probably wasn't even running before
the address-space casts appeared.

That's still using the legacy pass manager though. Is there some canonical
piece of code that shows the right way to set up the passes for PTX kernels?