On Wed, Oct 21, 2015 at 03:49:08PM -0400, Nathan Sidwell wrote:
> This patch is the device-specific half of the previous patch. It processes
> the partition head & tail markers and loop abstraction functions inserted
> during omp lowering.
>
> In the oacc_device_lower pass we scan the CFG reconstructing the set of
> nested loops demarked by IFN_UNIQUE (HEAD_MARK) & IFN_UNIQUE (TAIL_MARK)
> functions. The HEAD_MARK function provides the loop partition information
> provided by the user. Once constructed we can iterate over that structure
> checking partitioning consistency (for instance an inner loop must use a
> dimension 'inside' an outer loop). We also assign specific partitioning axes
> here. Partitioning updates the parameters of the IFN_LOOP and IFN_FORK/JOIN
> functions appropriately.
>
> Once partitioning has been determined, we iterate over the CFG scanning for
> the marker, fork/join and loop functions. The marker functions are deleted,
> the fork & join functions are conditionally deleted (using the target hook
> of patch 3), and the loop function is expanded into code calculating the
> loop parameters depending on how the loop has been partitioned. This uses
> the OACC_DIM_POS and OACC_DIM_SIZE builtins included in patch 7.
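
For anyone following along, the consistency check described in the quoted
summary can be sketched roughly as below. This is a standalone toy in C, not
code from the patch: the marker struct, the dimension enum and check_nesting
are all hypothetical names, the CFG walk is reduced to a linear stream of
markers, and it assumes the usual OpenACC ordering of gang outermost, then
worker, then vector.

```c
/* Hypothetical stand-ins; none of these names come from the patch.  */
enum dim { DIM_GANG = 0, DIM_WORKER = 1, DIM_VECTOR = 2 };
enum kind { HEAD_MARK, TAIL_MARK };

struct marker
{
  enum kind kind;
  enum dim dim;   /* Only consulted for HEAD_MARK entries.  */
};

#define MAX_DEPTH 16

/* Walk a linear stream of markers, tracking the open loops on a stack.
   Return 0 if the markers are balanced and every inner loop uses a
   dimension strictly 'inside' that of its enclosing loop, else -1.  */
int
check_nesting (const struct marker *m, int n)
{
  enum dim stack[MAX_DEPTH];
  int depth = 0;

  for (int i = 0; i < n; i++)
    {
      if (m[i].kind == HEAD_MARK)
        {
          if (depth == MAX_DEPTH)
            return -1;   /* Toy limit on nesting depth.  */
          if (depth > 0 && m[i].dim <= stack[depth - 1])
            return -1;   /* Inner loop is not inside the outer one.  */
          stack[depth++] = m[i].dim;
        }
      else
        {
          if (depth == 0)
            return -1;   /* TAIL_MARK without a matching HEAD_MARK.  */
          depth--;
        }
    }

  return depth == 0 ? 0 : -1;
}
```

With this sketch, a gang loop containing a vector loop is accepted, while a
vector loop containing a gang loop is rejected.
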

So, how do you expand the OACC loops on non-PTX devices (host, or say
XeonPhi)? Do you drop the IFNs and replace stuff with normal loops?
I don't see anything that would, e.g., set the various flags that OpenMP
#pragma omp simd or Cilk+ #pragma simd sets, like loop->safelen,
loop->force_vectorize, maybe loop->simduid, and promote some vars to simduid
arrays if that is relevant to OpenACC.

	Jakub