Hello,

This patch series addresses a correctness issue in how OpenMP SIMD regions are
transformed for SIMT execution.  On NVPTX, OpenMP target code runs with
per-warp stacks outside of SIMD regions, and needs to transition to per-lane
stacks on SIMD region boundaries.  Originally the plan was to implement that
by outlining the SIMD loop into a separate function, and switching stacks
around the function call.  I didn't like that approach due to how it would
penalize even the simplest SIMD loops, and how inconvenient it is to implement
in GCC.

These patches implement an alternative approach I didn't see until recently.
Instead of outlining, collect the variables that would need to be on per-lane
stacks (that is, addressable private variables) into one struct, and allocate
that struct with an alloca-like function.
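
To make the idea concrete, here is a rough source-level sketch of the
transformation.  It is only an illustration: the actual patches work on GCC's
internal representation, and the names simt_lane_alloc, simt_lane_free and
struct simt_privates are hypothetical stand-ins that I made up for this
example (here they simply wrap malloc/free so the code runs on a host).
sum_after models what a single lane would execute.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the alloca-like per-lane allocator.
   On a host they just wrap malloc/free so the sketch is runnable.  */
static void *simt_lane_alloc(size_t size)
{
  void *p = malloc(size);
  if (!p)
    abort();
  return p;
}

static void simt_lane_free(void *p)
{
  free(p);
}

/* Takes the address of its argument, which forces the caller's
   variable to live in memory rather than in a register.  */
static void bump(int *p)
{
  *p += 1;
}

/* Before: 'tmp' is private to the SIMD loop and is address-taken,
   so on a SIMT target it would need per-lane stack storage.  */
int sum_before(const int *a, int n)
{
  int sum = 0;
#pragma omp simd reduction(+:sum)
  for (int i = 0; i < n; i++)
    {
      int tmp = a[i];
      bump(&tmp);
      sum += tmp;
    }
  return sum;
}

/* After (one lane's view): all addressable privates are collected
   into a single struct, allocated once around the region.  */
struct simt_privates
{
  int tmp;
};

int sum_after(const int *a, int n)
{
  int sum = 0;
  struct simt_privates *priv = simt_lane_alloc(sizeof *priv);
  for (int i = 0; i < n; i++)
    {
      priv->tmp = a[i];
      bump(&priv->tmp);
      sum += priv->tmp;
    }
  simt_lane_free(priv);
  return sum;
}
```

Variables whose address is never taken (such as 'sum' and 'i' above) stay out
of the struct, so simple loops pay nothing beyond the allocation itself.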

After OpenMP lowering, inlining might break this by inlining functions with
address-taken locals into SIMD regions.  For now, such inlining is disallowed
(this penalizes only SIMT code), but eventually that can be handled by
collecting those locals into an allocated struct in a similar manner.

Alexander
