aaron.ballman added a comment.

In D135551#3847962 <https://reviews.llvm.org/D135551#3847962>, @dblaikie wrote:

> In D135551#3847607 <https://reviews.llvm.org/D135551#3847607>, @aaron.ballman 
> wrote:
>
>> In D135551#3847444 <https://reviews.llvm.org/D135551#3847444>, @dblaikie 
>> wrote:
>>
>>> I thought this was settled quite a while ago and enshrined in the style 
>>> guide: https://llvm.org/docs/CodingStandards.html#assert-liberally
>>>
>>> `assert(0)` should not be used if something is reachable. We shouldn't have 
>>> a "this violates an invariant, but if you don't have asserts enabled you do 
>>> get some maybe-acceptable behavior".
>>>
>>> I feel fairly strongly that any cases of "reachable asserts" should be 
>>> changed to valid error handling or `llvm_report_error` and remaining 
>>> `assert(0)` should be transformed to `llvm_unreachable`. (also, ideally, 
>>> don't use branch-to-`unreachable` where possible, instead assert the 
>>> condition - in cases where the `if` has side effects, sometimes that's the 
>>> nicest way to write it, but might be clearer, if more verbose to use a 
>>> local variable for the condition, then assert that the variable is true 
>>> (and have the requisite "variable might be unused" cast))
>>
>> I would be okay with that, but that's not what this patch was doing -- it 
>> was changing `assert(0)` into an `llvm_unreachable` more mechanically, and I 
>> don't think that's a valid transformation. The key, to me, is not losing the 
>> distinction between "reaching here is a programming mistake that you'd make 
>> during active development" vs "we never expect to reach this patch and want 
>> to optimize accordingly."
>
> I don't really think those are different things, though. Violating invariants 
> is UB and there's no discussion to be had about how the program (in a 
> non-asserts build) behaves when those invariants are violated - all bets are 
> off, whether it's assert or unreachable.

I think they're the same thing in terms of runtime behavior, but I feel (rather 
strongly) that they're different in terms of documentation when reading the 
source. This code pattern exists and keeps coming up year after year, which is 
sufficient to inform me that the community thinks there is *some* distinction 
to be made there. Also, the fact that we have report_fatal_error *and* 
unreachable APIs signals that we already understand there's a distinction 
between "reaching here will never happen; optimize accordingly" and "reaching 
here is a surprising mistake".

>> `__builtin_unreachable` changes the debugging landscape far too much for me 
>> to want to see folks using it for "reaching here is a programming mistake" 
>> situations, *especially* in RelWithDebInfo builds where optimizations are 
>> enabled and may result in surprising call stacks and time travel debugging.
>
> Generally LLVM's pretty hard to fathom in a non-asserts build anyway, right? 
> (that's the first thing any of us do is reproduce with an assertions build 
> that may fail miles away from where a crash occurred because an invariant was 
> violated much earlier) - that `cast` won't crash/will continue on happily in 
> a non-asserts build seems like a much larger hole to debuggability of a 
> non-asserts build than any unreachable?

This might be true -- personally, I tend to only use debug builds with MSVC 
because RelWithDebInfo isn't sufficient for my daily needs. However, I've 
definitely heard of folks who use RelWithDebInfo for their daily work 
(RelWithDebInfo + Asserts specifically, IIRC) because of the improved build 
times and runtime performance; we should be sure we're not disrupting that 
workflow too much.

>>>> Historically, we've usually used llvm_unreachable for situations where 
>>>> we're saying "this code cannot be reached; if it can, something else has 
>>>> gone seriously wrong." For example, in code like: int foo(SomeEnum E) { 
>>>> switch (E) { case One: return 1; default: return 2; } 
>>>> llvm_unreachable("getting here would be mysterious"); } and we've used 
>>>> assert(0) for situations where we're saying "this code is possible to 
>>>> reach only when there were mistakes elsewhere which broke invariants we 
>>>> were relying on."
>>>
>>> I don't think those are different things though - violating invariants is 
>>> ~= something going seriously wrong.
>>>
>>> (sorry, I guess I should debate this on the thread instead of here - but I 
>>> think most of the folks on that thread did agree with the LLVM style 
>>> guide/the direction here)
>>>
>>> This, I think was also discussed about a decade ago in the LLVM community 
>>> and resulted in r166821/2962d9599e463265edae599285bbc6351f1cc0ef which 
>>> specifically "Suggests llvm_unreachable over assert(0)" and is the policy 
>>> of LLVM - this change is consistent with that policy.
>>
>> I don't have the context for those changes in my email either, but 
>> regardless of what we thought ten years ago, we have code in the code base 
>> today that assumes a difference in severity between kinds of unreachable 
>> statements so we need to be careful when correcting mistakes. I think we're 
>> still in agreement that `llvm_unreachable` should be preferred over 
>> `assert(0)` in situations where the code is expected to be impossible to 
>> reach. I think we're also still in agreement that "correct error reporting" 
>> is preferred over `assert(0)`. Where we might still have daylight are the 
>> occasional situations where `assert(0)` gives a better experience -- when 
>> the code is possible to reach but reaching it signifies a developer (not 
>> user) mistake that is plausible to make when doing new development, such as 
>> when misusing the C interface (where there isn't always an error that should 
>> be reported via the API). I think these uses should generally be rare, but I 
>> don't think it's a "never do this" kind of situation.
>
> That still seems like something that should be caught by an asserts build and 
> is UB otherwise. If you're developing against LLVM's APIs in a non-assertions 
> build, there's a lot of other invariants that won't be checked (`cast` being 
> a pretty core example) and will make debugging really difficult.
>
>> `llvm_unreachable` asserts in debug mode, so it has the nice property we 
>> need when doing new development. But it doesn't convey the distinction in 
>> semantics. Maybe another approach is: `#define developer_bugcheck(msg) 
>> llvm_unreachable(msg)` (or something along those lines). We still get the 
>> assertion in debug mode, we then get the better optimization in release 
>> mode, but we don't lose the semantic information in the code base. It 
>> doesn't really help the RelWithDebInfo case though (where call stacks may be 
>> unrecognizable as a result of time travel optimizations) but maybe that's a 
>> tradeoff worth making?
>
> I don't really see the distinction, though - it's a violated invariant, which 
> is a developer bug, whether it's from an assert on unreachable. They're both 
> expressing invariants - not of different strength. "Valid code should never 
> reach this branch" and "valid code should never produce a false result here".

There are different kinds of developer bugs and it's reasonable for people to 
want to express them differently, IMO. Concretely with a contrived example:

  enum E { Zero, One, Two };
  
  int func(E Val) {
    switch (Val) {
    case Zero: return 12;
    case One: return 200;
    case Two: return -1;
    }
    llvm_unreachable("never get here");
  }

This is a case where the code is technically reachable (someone can call 
`func((E)3)`) but the programmer's intent it "nobody will ever do that" because 
it's an internal API, not exposed to C, other safeguards like diagnostics 
ensure it, or whatever. Contrast this with:

  enum E { Zero, One, Two };
  
  extern "C" int func(unsigned Val) {
    switch (Val) {
    case Zero: return 12;
    case One: return 200;
    case Two: return -1;
    }
    assert(0 && "never get here");
  }

Same general situation, but now the safeguards are no longer in place. The 
function is externally callable by anyone who wants to use it, the interface 
has lost the connection to the enumeration, etc. So the code is also 
technically reachable, but the author wants to signal the difference in 
plausibility of reaching the erroneous state.

I definitely agree that the end results of reaching the unreachable bits are 
the same in terms of what happens at runtime. But I'm worried about maintaining 
the code -- losing the distinction between the two situations makes the person 
debugging the failure have to figure out whether someone missed adding a 
safeguard elsewhere vs something more difficult to solve such as a miscompile 
from the host implementation, stack stomping, etc.

I don't care whether we spell it `assert(0)` or `report_fatal_error()` or 
something else. I care that we don't have a policy requiring folks to not make 
a distinction between the two scenarios aside from leaving comments in the code 
as I think that's a step backwards.

> I'd really like to avoid what I see as cleanup - replacing one construct 
> (assert(false)) with another (llvm_unreachable) of identical (to me) 
> contractual strength - being slowed because of this distinction. Replacing 
> reachable-unreachables, or reachable-false-assertions with llvm_report_error, 
> to me, is orthogonal to this cleanup.

Understood. I'm pushing back on this being a cleanup in all cases.

> (& for the C API - I think assertions/unreachable are the right thing, not 
> llvm_report_error - I'm sure there's many more ways to violate the C API 
> contract that would result in inexplicable/confusing behavior and that will 
> only ever be protected by assertions, than we might cover by a few 
> llvm_report_errors on the interface & that should be the direction we 
> encourage people to go down - develop with assertions enabled)
>
> Basically llvm_report_error ends up/should be a "we couldn't find a better 
> way to do error handling here, but it is really a reachable error we have to 
> do something with" - each one is a bug in the library-ness of LLVM. (or it's 
> used up in a tool implementation (above the library layer) and then it's 
> fine/just a convenient way to error/exit, though probably still isn't so good 
> for diagnostic quality from such tools). If it's not reachable by a correct 
> usage (be it from a command line tool or API) then it should be an assertion, 
> not an llvm_report_error.
>
> But I know this issue comes up from time to time and I've yet to figure out a 
> way to come to shared understandings on it :/ so I don't expect my rather 
> narrow and firm definition to necessarily carry the day on this.

Concrete guidance around reachability that I would be happy with is:

- Prefer a user-facing diagnostic in any circumstance that's under the user's 
direct control (through command line options, source code, etc); users should 
not get failed assertions/crashes as a form of error reporting.
- Prefer `assert(condition)` in any circumstance under which there is a 
condition to be checked to validate the state of the system. The only use of a 
literal in such an assertion should be a string literal to use as a message. 
e.g., no `assert(0);`
- Prefer `llvm_report_error()` in any circumstance under which a code path is 
functionally possible to reach, but only in erroneous executions that signify a 
mistake on the part of the LLVM developer elsewhere in the program.
- Prefer `llvm_unreachable()` in any circumstance under which a code path is 
believed to be functionally impossible to reach (even if technically possible 
to reach). The API is now self-documenting to mean "this code really should be 
totally unreachable".

Something is functionally possible to reach when there are missing safeguards 
elsewhere that should have prevented calling the function in that state. 
Something is functionally impossible to reach when miscompiles, undefined 
behavior, or malicious users modifying memory with a debugger (etc) are the 
realistic ways to reach that state. When in doubt as to whether something is 
functionally possible to reach or not, use your best judgement but prefer 
`llvm_report_error` on the assumption that most reachability mistakes are LLVM 
developer errors rather than "extenuating circumstance" kind of situations.

That said, I have no idea if I'm being too pedantic here. I'm basing this off 
my own experiences with the code base as well as internal discussions with 
other folks at Intel working on the project and what their expectations are 
(both as new-to-the-code-base folks and long-time contributors), but this is a 
pretty small selection of people.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135551/new/

https://reviews.llvm.org/D135551

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to