estewart08 added a comment.

In D99432#2726845 <https://reviews.llvm.org/D99432#2726845>, @ABataev wrote:
> In D99432#2726588 <https://reviews.llvm.org/D99432#2726588>, @estewart08 wrote:
>
>> In D99432#2726391 <https://reviews.llvm.org/D99432#2726391>, @ABataev wrote:
>>
>>> In D99432#2726337 <https://reviews.llvm.org/D99432#2726337>, @estewart08 wrote:
>>>
>>>> In D99432#2726060 <https://reviews.llvm.org/D99432#2726060>, @ABataev wrote:
>>>>
>>>>> In D99432#2726050 <https://reviews.llvm.org/D99432#2726050>, @estewart08 wrote:
>>>>>
>>>>>> In D99432#2726025 <https://reviews.llvm.org/D99432#2726025>, @ABataev wrote:
>>>>>>
>>>>>>> In D99432#2726019 <https://reviews.llvm.org/D99432#2726019>, @estewart08 wrote:
>>>>>>>
>>>>>>>> In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see how this helps SPMD mode with team privatization of declarations in-between target teams and parallel regions.
>>>>>>>
>>>>>>> Did you try the reproducer with the applied patch?
>>>>>>
>>>>>> Yes, I still saw the test fail, although it was not with the latest llvm-project. Are you saying the reproducer passes for you?
>>>>>
>>>>> I don't have CUDA installed, but from what I see in the LLVM IR it should pass. Do you have a debug log? Does it crash or produce incorrect results?
>>>>
>>>> This is on an AMDGPU, but I assume the behavior would be similar for NVPTX.
>>>>
>>>> It produces incorrect/incomplete results in the dist[0] index after a manual reduction, and in turn the final global gpu_results array is incorrect. When thread 0 does a reduction into dist[0], it has no knowledge of dist[1] having been updated by thread 1, which tells me the array is still thread private.
>>>> Adding some printfs, looking at one team's output:
>>>>
>>>> SPMD:
>>>>
>>>>   Thread 0: dist[0]: 1
>>>>   Thread 0: dist[1]: 0              // This should be 1
>>>>   After reduction into dist[0]: 1   // This should be 2
>>>>   gpu_results = [1,1]               // [2,2] expected
>>>>
>>>> Generic mode:
>>>>
>>>>   Thread 0: dist[0]: 1
>>>>   Thread 0: dist[1]: 1
>>>>   After reduction into dist[0]: 2
>>>>   gpu_results = [2,2]
>>>
>>> Hmm, I would expect a crash if the array was allocated in local memory. Could you add some more printfs (with the data and addresses of the array) to check the results? Maybe there is a data race somewhere in the code?
>>
>> As a reminder, each thread updates a unique index in the dist array and each team updates a unique index in gpu_results.
>>
>> SPMD - shows each thread has a unique address for the dist array:
>>
>>   Team 0 Thread 1: dist[0]: 0, 0x7f92e24a8bf8
>>   Team 0 Thread 1: dist[1]: 1, 0x7f92e24a8bfc
>>
>>   Team 0 Thread 0: dist[0]: 1, 0x7f92e24a8bf0
>>   Team 0 Thread 0: dist[1]: 0, 0x7f92e24a8bf4
>>
>>   Team 0 Thread 0: After reduction into dist[0]: 1
>>   Team 0 Thread 0: gpu_results address: 0x7f92a5000000
>>   --------------------------------------------------
>>   Team 1 Thread 1: dist[0]: 0, 0x7f92f9ec5188
>>   Team 1 Thread 1: dist[1]: 1, 0x7f92f9ec518c
>>
>>   Team 1 Thread 0: dist[0]: 1, 0x7f92f9ec5180
>>   Team 1 Thread 0: dist[1]: 0, 0x7f92f9ec5184
>>
>>   Team 1 Thread 0: After reduction into dist[0]: 1
>>   Team 1 Thread 0: gpu_results address: 0x7f92a5000000
>>
>>   gpu_results[0]: 1
>>   gpu_results[1]: 1
>>
>> Generic - shows each team shares the dist array address amongst its threads:
>>
>>   Team 0 Thread 1: dist[0]: 1, 0x7fac01938880
>>   Team 0 Thread 1: dist[1]: 1, 0x7fac01938884
>>
>>   Team 0 Thread 0: dist[0]: 1, 0x7fac01938880
>>   Team 0 Thread 0: dist[1]: 1, 0x7fac01938884
>>
>>   Team 0 Thread 0: After reduction into dist[0]: 2
>>   Team 0 Thread 0: gpu_results address: 0x7fabc5000000
>>   --------------------------------------------------
>>   Team 1 Thread 1: dist[0]: 1, 0x7fac19354e10
>>   Team 1 Thread 1: dist[1]: 1, 0x7fac19354e14
>>
>>   Team 1 Thread 0: dist[0]: 1, 0x7fac19354e10
>>   Team 1 Thread 0: dist[1]: 1, 0x7fac19354e14
>>
>>   Team 1 Thread 0: After reduction into dist[0]: 2
>>   Team 1 Thread 0: gpu_results address: 0x7fabc5000000
>
> Could you check if it works with the `-fno-openmp-cuda-parallel-target-regions` option?

Unfortunately that crashes:

  llvm-project/llvm/lib/IR/Instructions.cpp:495: void llvm::CallInst::init(llvm::FunctionType*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, llvm::ArrayRef<llvm::OperandBundleDefT<llvm::Value*> >, const llvm::Twine&): Assertion `(i >= FTy->getNumParams() || FTy->getParamType(i) == Args[i]->getType()) && "Calling a function with a bad signature!"' failed.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits