On Tue, Jan 17, 2017 at 9:07 PM, Alexander Monakov <amona...@ispras.ru> wrote:
> Hello,
>
> This patch series addresses a correctness issue in how OpenMP SIMD regions are
> transformed for SIMT execution. On NVPTX, OpenMP target code runs with
> per-warp stacks outside of SIMD regions, and needs to transition to per-lane
> stacks on SIMD region boundaries. Originally the plan was to implement that
> by outlining the SIMD loop into a separate function, and switching stacks
> around the function call. I didn't like that approach due to how it would
> penalize even the simplest SIMD loops, and how it's not convenient to
> implement in GCC.
>
> These patches implement an alternative approach I didn't see until recently.
> Instead of outlining, collect variables that would need to be on per-lane
> stacks (that is, addressable private variables) to one struct, and allocate
> that struct with an alloca-like function.
>
> After OpenMP lowering, inlining might break this by inlining functions with
> address-taken locals into SIMD regions. For now, such inlining is disallowed
> (this penalizes only SIMT code), but eventually that can be handled by
> collecting those locals into an allocated struct in a similar manner.

Can you do the allocation fully after inlining instead?

Richard.

> Alexander
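
For readers following the thread, here is a rough host-side C sketch of the struct-collection idea described in the quoted text. It is only an illustration under assumptions: the names simt_lane_alloc, simt_lane_free, struct simd_privates and bump are hypothetical and do not come from the patch series, and the real transformation happens inside the compiler rather than in source code. The allocator is modelled with malloc purely so the example runs on a host.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the "alloca-like function" from the thread.
   Here they just wrap malloc/free so the sketch runs on a host. */
static void *simt_lane_alloc(size_t size) { return malloc(size); }
static void simt_lane_free(void *p) { free(p); }

/* Takes addresses of its arguments, which makes the caller's variables
   "addressable" in the sense used in the thread. */
static void bump(int *x, int *y) { *x += 1; *y += 2; }

/* Before: addressable private variables are ordinary stack locals.
   (Imagine the loop annotated with "#pragma omp simd".) */
static int sum_before(int n)
{
  int total = 0;
  for (int i = 0; i < n; i++) {
    int a = i, b = i;          /* private to the iteration, address taken */
    bump(&a, &b);
    total += a + b;
  }
  return total;
}

/* After: the same variables are gathered into one struct, and that struct
   is obtained from the allocator instead of the ordinary stack. */
struct simd_privates { int a; int b; };

static int sum_after(int n)
{
  int total = 0;
  struct simd_privates *priv = simt_lane_alloc(sizeof *priv);
  assert(priv != NULL);        /* sketch only: no real error handling */
  for (int i = 0; i < n; i++) {
    priv->a = i;
    priv->b = i;
    bump(&priv->a, &priv->b);
    total += priv->a + priv->b;
  }
  simt_lane_free(priv);
  return total;
}
```

Both functions compute the same result; the only difference is where the address-taken variables live, which is the property the per-lane stack transition depends on.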