================
@@ -1614,6 +1650,50 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createParallel(
                                IfCondition, NumThreads, PrivTID, PrivTIDAddr,
                                ThreadID, ToBeDeletedVec);
     };
+
+    std::optional<omp::OMPTgtExecModeFlags> ExecMode =
+        getTargetKernelExecMode(*OuterFn);
+
+    // If OuterFn is not a Generic kernel, skip custom allocation. This causes
+    // the CodeExtractor to follow its default behavior. Otherwise, we need to
+    // use device shared memory to allocate argument structures.
+    if (ExecMode && *ExecMode & OMP_TGT_EXEC_MODE_GENERIC) {
+      OI.CustomArgAllocatorCB = [this,
+                                 EntryBB](BasicBlock *, BasicBlock::iterator,
+                                          Type *ArgTy, const Twine &Name) {
+        // Instead of using the insertion point provided by the CodeExtractor,
+        // here we need to use the block that eventually calls the outlined
+        // function for the `parallel` construct.
+        //
+        // The reason is that the explicit deallocation call will be inserted
+        // within the outlined function, whereas the alloca insertion point
+        // might actually be located somewhere else in the caller. This becomes
+        // a problem when e.g. `parallel` is inside of a `distribute` construct,
+        // because the deallocation would be executed multiple times and the
+        // allocation just once (outside of the loop).
+        //
+        // TODO: Ideally, we'd want to do the allocation and deallocation
+        // outside of the `parallel` outlined function, hence using here the
+        // insertion point provided by the CodeExtractor. We can't do this at
+        // the moment because there is currently no way of passing an eligible
+        // insertion point for the explicit deallocation to the CodeExtractor,
+        // as that block is created (at least when nested inside of
+        // `distribute`) sometime after createParallel() completed, so it can't
+        // be stored in the OutlineInfo structure here.
----------------
Meinersbur wrote:

This was meant as an open question, since I do not fully understand the problem. The idea with the temporary block was to create an unconnected BB and later connect or move it to the expected location, e.g. in the finalize() method or by the caller of `createParallel`, though I do not know how they would know where to insert it. That temporary BB could also be created by the caller and passed to `createParallel` (e.g. as `deallocIP`), which would make it the caller's responsibility to connect it. It sounds like you have about the same in mind. OK to defer it to some later point.

https://github.com/llvm/llvm-project/pull/150925
_______________________________________________
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
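
For readers following the thread, the "detached block, connected later by the caller" idea from the comment above can be sketched as a toy model. This is only an illustration under stated assumptions: `ToyBlock`, `ToyFunction`, `emitParallel`, `connectAfter`, `demoBlockOrder`, and the block names are all hypothetical stand-ins, not LLVM or OpenMPIRBuilder APIs, and the sketch says nothing about how the real patch would choose the final location.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are NOT LLVM's BasicBlock/Function.
struct ToyBlock {
  std::string Name;
  std::vector<std::string> Insts;
};

struct ToyFunction {
  std::list<std::unique_ptr<ToyBlock>> Blocks;
};

// Hypothetical analogue of createParallel() that receives a caller-owned,
// still detached block (the `deallocIP` idea). The allocation and the call
// go into the function; the deallocation goes into the detached block.
void emitParallel(ToyFunction &F, ToyBlock &DeallocBlock) {
  auto Entry = std::make_unique<ToyBlock>();
  Entry->Name = "parallel.entry";
  Entry->Insts = {"alloc_shared args", "call outlined(args)"};
  F.Blocks.push_back(std::move(Entry));
  DeallocBlock.Insts.push_back("free_shared args");
}

// The caller connects the detached block once the final location exists.
// Returns false if the anchor was not found (the block is then appended).
bool connectAfter(ToyFunction &F, const std::string &Anchor,
                  std::unique_ptr<ToyBlock> B) {
  for (auto It = F.Blocks.begin(); It != F.Blocks.end(); ++It) {
    if ((*It)->Name == Anchor) {
      F.Blocks.insert(std::next(It), std::move(B));
      return true;
    }
  }
  F.Blocks.push_back(std::move(B));
  return false;
}

// Walks through the flow and returns the final block order.
std::vector<std::string> demoBlockOrder() {
  ToyFunction F;
  auto Dealloc = std::make_unique<ToyBlock>();
  Dealloc->Name = "parallel.dealloc";
  emitParallel(F, *Dealloc); // the dealloc block is still detached here

  // The anchor block only comes into existence after emitParallel() returned.
  auto LoopExit = std::make_unique<ToyBlock>();
  LoopExit->Name = "distribute.exit";
  F.Blocks.push_back(std::move(LoopExit));

  connectAfter(F, "distribute.exit", std::move(Dealloc));

  std::vector<std::string> Order;
  for (const auto &B : F.Blocks)
    Order.push_back(B->Name);
  return Order;
}
```

In this toy flow the deallocation is emitted while its block is still unattached, and the caller splices it in after a block that did not exist when `emitParallel` ran, which is the ordering constraint the TODO describes. Calling `demoBlockOrder()` yields the blocks in the order entry, loop exit, dealloc.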