[PATCH] D133956: git push Cuda Check for ignored errors after calling a CUDA kernel

Bartłomiej Cieślar via Phabricator via cfe-commits Thu, 15 Sep 2022 14:59:33 -0700

barcisz added a comment.

In D133956#3793022 <https://reviews.llvm.org/D133956#3793022>, @tra wrote:


> I think ultimately the checker should be generalized to flag all unchecked 
> CUDA runtime calls. The problem is that that is going to be exceedingly noisy 
> in practice as a lot of real code does not bother to check for the errors 
> consistently. Limiting the checks to kernel launches may be a reasonable 
> starting point as it would give us the ability to zero in on the culprit 
> kernel by running the app with "CUDA_LAUNCH_BLOCKING".

By that do you mean that the way the check is now it is acceptable or that it 
should be improved to handle intra-procedural analysis? The intention with this 
check is to work a lot in tandem with the one in D133804 
<https://reviews.llvm.org/D133804>, which therefore prevents most such cases. 
The practice that the check checks for is also commonly used in ML frameworks 
which heavily rely on CUDA, so not catching such cases might still be helpful 
for them just for the sake of preserving code consistency and catching such 
errors (since if there is an error then that means that some part of the code 
was broken anyways). Thus, the check is optimized for lowering false positives 
during static checking and for a practice lowering the number of false 
negatives within the CUDA code.



================
Comment at: 
clang-tools-extra/test/clang-tidy/checkers/cuda/unsafe-kernel-call-macro-handler.cu:53
+  }
+  auto err = cudaGetLastError();
+
----------------
tra wrote:
> Just curious -- is it sufficient to just call `cudaGetLastError();` ? Or does 
> the checker require using its result, too? I.e. in practice this particular 
> check will not really do anything useful. The tests below look somewhat 
> inconsistent. 
Technically it does not require the user to actually use the value of 
`cudaGetLastError()`, but


  # If they are calling it then they most likely did not place this call there 
randomly and are using it to check for the error returned by the kernel
  # the check being introduced in D133804 can be used to check if the return 
value has been used, so checking it here as well would have been a duplication




================
Comment at: 
clang-tools-extra/test/clang-tidy/checkers/cuda/unsafe-kernel-call-macro-handler.cu:75
+  // CHECK-MESSAGES: :[[@LINE-1]]:{{[0-9]+}}: warning: Possible unchecked 
error after a kernel launch.
+  // Calling an error-checking function after a kernel is not considered safe.
+  errorCheck();
----------------
tra wrote:
> WDYM by "is not considered safe" here? How is that different from calling 
> `cudaGetLastError()` and checking its value?
As in the check does not do inter-procedural analysis short of adding the 
handler to AcceptedHandlers, so the check will flag such occurences


================
Comment at: 
clang-tools-extra/test/clang-tidy/checkers/cuda/unsafe-kernel-call-macro-handler.cu:80-82
+  b<<<1, 2>>>();; /* The semicolons are here because the
+    detection of the macro is done with a lexer */ ;
+  CUDA_CHECK_KERNEL();
----------------
tra wrote:
> What would happen with a single `;` as would be seen in the normal user code?
Nothing, it would work just fine; it's rather that all other kernel calls in 
this test use a single `;` so I want to check this case here


================
Comment at: 
clang-tools-extra/test/clang-tidy/checkers/cuda/unsafe-kernel-call-macro-handler.cu:112
+  do {
+    auto err2 = cudaGetLastError();
+  } while(0);
----------------
tra wrote:
> Why does this case produce no warning, while a very similar case above does? 
> In both cases result of `cudaGetLastError()` is assigned to an unused 
> variable within the loop body.
> 
> ```
>   b<<<1, 2>>>();
>   // CHECK-MESSAGES: :[[@LINE-1]]:{{[0-9]+}}: warning: Possible unchecked 
> error after a kernel launch.
>   for(;;)
>     auto err2 = cudaGetLastError(); // Brackets omitted purposefully, since 
> they create an additional AST node
> 
> ```
Because often a macro will wrap its error handling code in a do {...} while(0) 
loop and that's why we check this case and simmilar ones with CFG analysis


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133956/new/

https://reviews.llvm.org/D133956

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D133956: git push Cuda Check for ignored errors after calling a CUDA kernel

Reply via email to