yaxunl marked 3 inline comments as done. yaxunl added inline comments.
================ Comment at: clang/lib/Sema/Sema.cpp:1508 void checkFunc(SourceLocation Loc, FunctionDecl *FD) { + auto DiagsCountIt = DiagsCount.find(FD); FunctionDecl *Caller = UseStack.empty() ? nullptr : UseStack.back(); ---------------- rjmccall wrote: > yaxunl wrote: > > rjmccall wrote: > > > yaxunl wrote: > > > > rjmccall wrote: > > > > > yaxunl wrote: > > > > > > rjmccall wrote: > > > > > > > yaxunl wrote: > > > > > > > > rjmccall wrote: > > > > > > > > > It makes me a little uncomfortable to be holding an iterator > > > > > > > > > this long while calling a fair amount of other stuff in the > > > > > > > > > meantime. > > > > > > > > > > > > > > > > > > Your use of DiagsCount is subtle enough that it really needs > > > > > > > > > to be explained in some comments. You're doing stuff > > > > > > > > > conditionally based on both whether the entry exists but also > > > > > > > > > whether it's non-zero. > > > > > > > > added comments > > > > > > > Okay, thanks for that. Two points, then. First, it looks like > > > > > > > the count is really just a boolean for whether the function > > > > > > > recursively triggers any diagnostics. And second, can't it just > > > > > > > be as simple as whether we've visited that function at all in a > > > > > > > context that's forcing diagnostics to be emitted? The logic > > > > > > > seems to be to try to emit the diagnostics for each use-path, but > > > > > > > why would we want that? > > > > > > For the second comment, we need to visit a function again for each > > > > > > use-path because we need to report each use-path that triggers a > > > > > > diagnostic, otherwise users will see a new error after they fix one > > > > > > error, instead of seeing all the errors at once. > > > > > > > > > > > > For the first comment, I will change the count to two flags: one > > > > > > for the case where the function is not in device context, the other > > > > > > is for the case where the function is in device context. This will > > > > > > allow us to avoid redundant visits whether or not we are in a > > > > > > device context. > > > > > > For the second comment, we need to visit a function again for each > > > > > > use-path because we need to report each use-path that triggers a > > > > > > diagnostic, otherwise users will see a new error after they fix one > > > > > > error, instead of seeing all the errors at once. > > > > > > > > > > This is not what we do in analogous cases where errors are triggered > > > > > by a use, like in template instantiation. The bug might be that the > > > > > device program is using a function that it shouldn't be using, or the > > > > > bug might be that a function that's supposed to be usable from the > > > > > device is written incorrectly. In the former case, yes, not > > > > > reporting the errors for each use-path may force the programmer to > > > > > build multiple times to find all the problematic uses. However, in > > > > > the latter case you can easily end up emitting a massive number of > > > > > errors that completely drowns out everything else. It's also > > > > > non-linear: the number of different use-paths of a particular > > > > > function can be combinatoric. > > > > The deferred diagnostics fall into the first case mostly. > > > > > > > > Not all diagnostic messages happen in device host functions are > > > > deferred. Most of diagnostic messages are emitted immediately, > > > > disregarding whether the function is emitted or not. > > > > > > > > Only a few special types of diagnostic messages e.g. inline assembly > > > > errors, exceptions, varags are deferred. This is because clang has to > > > > use pragmas to make all functions device and host in some system > > > > headers to be able to use them in device compilation, whereas some of > > > > these functions will cause error only if they are emitted in device > > > > compilation. Since we cannot change these headers, we have to defer > > > > such diagnostics to the point where they are actually triggered. > > > > Therefore we are mostly interested in "who is calling these functions" > > > > instead of the diagnostics themselves. > > > > > > > > For normal programs containing no diagnostics, the current approach > > > > should sufficiently avoid all redundant visits and all functions will > > > > be visited once. > > > > > > > > The functions may be visited multiple times when there are deferred > > > > diagnostics, however this should be limited by the maximum number of > > > > errors. > > > Okay. I understand the point now about being mostly concerned about > > > using functions in headers. > > > > > > My concern about visiting things a combinatorial number of times is less > > > about generating an excessive number of errors and more about taking an > > > excessive amount of time to finish compilation. The compiler really > > > should not be using any exponential algorithms; a simple visited set will > > > ensure that, but I think what you're doing does not. Can you make a case > > > for why that's not true? > > The current approach is linear for the number of visits to functions. > > > > Let's assume the number of functions is N. > > > > The current approach already restricted revisiting a node that's already in > > the current use-path to prevent infinite recursion, therefore a use-path > > has maximum length N. > > > > The functions can be classified as two categories: > > > > Those who trigger deferred diags directly or indirectly, which we call > > trigger nodes. > > > > Those who do not trigger deferred diags directly or indirectly, which we > > call non-trigger nodes. > > > > Non-trigger nodes can be identified at most with two visits. All other > > visits are skipped. Therefore the total number of visits of non-trigger > > nodes is less than 2*N. > > > > We can also classify the use-paths as trigger paths and non-trigger paths. > > Each trigger path causes at least one diagnostic emitted. Each non-trigger > > path does not cause any diagnostics emitted. > > > > Due to limit of maximum diagnostics allowed, the algorithm will stop after > > a finite number of use-paths visited. Let's say it is M. (In reality, M > > could be much smaller than the maximum diagnostic allowed.) Then the number > > of trigger paths is at most M. > > > > Let's list all the use-paths visited and count all the nodes visited in > > these use-paths. > > > > First we consider all the use-paths that containing only non-trigger nodes, > > since each non-trigger nodes can only be visited twice. The total number of > > visits of nodes is less than 2*N. > > > > Then we consider all the trigger paths. Since the maximum path length is N, > > the total number of nodes visited is less than N*M. > > > > Now we only need to consider the non-trigger paths that starts with trigger > > nodes and end with a non-trigger node. For example, ABCD, where A, B, C, D > > denote nodes (functions) visited in the path. We can prove that A, B, C > > must be trigger nodes, since if any of A, B, C is a non-trigger node, the > > path will ends earlier, not reaching D. We can also prove that the sub-path > > ABC must be shared with a trigger path e.g. ABCEF, since otherwise C will > > be a non-trigger node. Then we do not need to count visit of A, B, and C, > > since they have already been counted when we count the visited nodes by > > trigger paths. > > > > Therefore, the total number of visits for functions is less than (2+M)*N, > > where M is actual number of diagnostics emitted. > > > > Due to limit of maximum diagnostics allowed, the algorithm will stop after > > a finite number of use-paths visited. > > Why do you think this is true? I don't see anything in your actual code > which asks whether the error limit has been reached; it just does all the > work and trusts the diagnostics engine to stop emitting diagnostics > eventually. > > Try turning the error limit off (which will tell you how much work you're > doing, since you're not short-circuiting when it's reached) and seeing how > many diagnostics you get if you have a chain of functions like this: > > ``` > // All of these functions should be emitted on demand. > void hasInvalid() { /* this function should trigger a deferred diagnostic of > some sort */ } > void use0() { hasInvalid(); hasInvalid(); } > void use1() { use0(); use0(); } > void use2() { use1(); use1(); } > void use3() { use2(); use2(); } > void use4() { use3(); use3(); } > // The final function should be a kernel or something like that. > ``` > > Also, in general, since the error limit can be disabled, I don't think it's a > good idea for it to be the only thing stopping you from doing an exponential > amount of work. It seems not practical to emit diagnostic for each triggering use-path since this will cause exponential time and also exponential diagnostics. I will make change so that each function is visited at most twice. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D77028/new/ https://reviews.llvm.org/D77028 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits