softworkz . (HE12025-04-22):
> At the implementation level, I sensed that "you" ("FFmpeg") are
> following some principles which are somewhat contradictive to those
> that I'm usually adhering to (e.g. "parameter validation being a
> responsibility of the call site, crashing otherwise being
> acceptable"). Nonetheless, I'm the one who has to adapt, and I'm not
> going to question that.
I am feeling didactic, so I will write what I have to say on the topic in full detail, so that it can also serve as future reference.

Crashing immediately when some unexpected condition happens is not a quirk of a few FFmpeg developers, it is very often the only way to do things, and often better than any available alternative. Knowing what to do when it is only one option among several is not always obvious, but this is FFmpeg, not some management software project developed by engineers fresh from an average school.

Consider the simplest mathematical operation that can fail, a division:

    a = b / c;

What happens if c is 0? It depends: on the language, the CPU, the build options, possibly the phases of the Moon. But mostly, what happens lies between two extremes:

- it crashes immediately, sometimes even with a hardware exception generated by the CPU;

- it returns some kind of cheesy not-a-number value that will contaminate all computations done with a, leading to silently corrupted output.

Intermediate cases can be: return a not-a-number value but have something test it later and crash; or return a not-a-number value that is later conditionally discarded, producing corrupt output only once in a while. For the sake of simplicity, let us focus on the extreme cases. There are two things to say here:

- both are very bad;

- silently corrupting the output is the worst.

(There are languages where division by zero has a precise and mathematically consistent semantic that a very careful programmer could make use of. That does not invalidate my point.)

Both are bad: that means that whenever we write a division, we must be sure that the divisor is not zero. Sometimes, it is very easy:

    unsigned n = 0;
    do {
        n++;
    } while (*(p++));
    a = b / n;

Loops of the do…while kind are always executed at least once, therefore we know that n ≠ 0 at the end. (Did you notice the flaw in that reasoning? I will come back to it later.)

Sometimes, it is just a matter of testing:

    if (nb_notes == 0)
        die("Why are you asking me to compute an average when you have not yet entered any note, you idiot?");

And sometimes it relies on a 10-page mathematical proof that the algorithm we are implementing is correct. But hard or easy, it is always the responsibility of the programmer to know that the divisor is not zero before coding a division.

This is not unique to divisions: we must know a pointer points to a valid object before dereferencing it; we must know a pointer points to writable memory before writing to it; we must know an index is within the boundaries of an array before accessing it.

And this is not limited to C and the other languages FFmpeg is fond of: write into a cell beyond the end of an array in C, and it will let you modify something else entirely, which might let the spooks or the cryptolockers in; do it in Java or OCaml, and it will trigger an exception that we probably neglected to catch, because if we had thought of catching it we would probably have written the code more elegantly in the first place.

The conclusion of this section is this: any programmer worth their salt must know a lot of invariants about each line of their code, invariants that are not written into the code but that they would be able to express and prove if asked to. "At this point, this number is not zero"; "at this point, this pointer comes from malloc() or realloc()"; "at this point, there are no other references to this object"; "at this point, this block of memory has a 0 before the end"; and so on.
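As an illustration, here is a minimal sketch of what those unwritten invariants look like once spelled out (not from FFmpeg; average() and its contract are made up for the example):

    #include <assert.h>
    #include <stddef.h>

    /* Invariants required by the code below, normally kept only in the
     * programmer's head:
     * - notes points to at least nb_notes readable ints;
     * - nb_notes != 0, otherwise the division is meaningless. */
    static double average(const int *notes, size_t nb_notes)
    {
        long long sum = 0;

        assert(nb_notes != 0); /* a violation is a programming error */
        for (size_t i = 0; i < nb_notes; i++)
            sum += notes[i];
        return (double)sum / nb_notes; /* safe: nb_notes != 0 here */
    }

The caller is the one who establishes the invariant, for example with the nb_notes == 0 test above; the assert merely makes a violation visible early instead of letting the division misbehave.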
Failure to account for the invariants required by the constructs used in the program will result in the program misbehaving in more or less harmful ways.

Now about function calls and APIs. As I said, constructs in the program have invariant requirements: if we divide, it must not be by zero. These requirements propagate backwards along the flow of the program: if l must not be zero, then the string it is the length of must not be empty, therefore the list from which it is built must not itself be empty, and so on. A program behaves correctly if, at all places in the code flow, the invariants guaranteed by the earlier parts are at least as strict as the invariant requirements imposed by the later parts.

In a complete program, functions are just roundabouts in the program flow. As such, like any place in the program, they have invariant requirements. But if we think of functions as separating different parts of a program that are under the responsibility of different people (or the same person at different times), these invariant requirements must be documented.

Sometimes, it is very subtle: "The strlen function computes the length of the string pointed to by s": where does it say s must not be a NULL pointer? Not explicitly, but it is implied by "string pointed to", because NULL does not point to anything, let alone a string. What happens if we call strlen() on a NULL pointer? That depends. On Unix systems, it usually results in a SIGSEGV; something similar will happen with other memory-protected OSes. But on a microcontroller, it will just compute how many non-0 octets there are at that place in memory. Whatever the effect, it misbehaves.

What about mistakes? Even excellent programmers, programmers worth Himalayan rose salt, can make mistakes. For example, in the code I gave above as an example where it is easy to be sure that n is not zero, what happens on x86_64 if p points to four giga-octets of non-zero values and then a 0? Oops. (The 32-bit unsigned counter wraps around, so the "executed at least once" guarantee no longer keeps n away from 0.)

In the ideal world, that would be impossible. In the ideal world, the compiler would refuse to compile a program unless it can prove the required invariants are met, with the help of annotations helpfully provided by the programmer. This is what Rust tried to do and lamentably failed at. (That would leave out programs used to explore unprovable mathematical properties. I am perfectly fine letting them use some kind of "ignore-invariant" pragma, as long as it is not used in a program that might crash a space probe, explode a nuclear plant or, worse, make a game unwinnable.)

In the real world, we are not there yet, by far. Yet we must decide what to do about all these invariants. In most places, we do nothing, because the road is straight and, if we are speeding, the speed bump a hundred meters from now will trash us. But it is still useful to add our own speed bumps and traffic lights at strategic places.

The worst thing we could do is pump the brakes before the speed bump. It is called "defensive programming": "at this point, this value should not be negative, let us check if it is and make it 0 if necessary, the show must go on!" Well, it is not a show, it is a program producing a result, and if it goes on after breaking one of its invariant requirements it will almost certainly produce an invalid result. That is evil.

But do we add traffic lights or a speed bump? My traffic analogy breaks down, I will stop using it. Do we do nothing, do we add an assert, or do we return an error? But first, a short point about asserts.
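For concreteness, FFmpeg carries its own assert macros in libavutil/avassert.h. A minimal sketch of how they are typically used (scale() is a hypothetical function, not an existing API):

    #include "libavutil/avassert.h"

    /* av_assert0() is always compiled in; av_assert1() and av_assert2()
     * compile to nothing unless ASSERT_LEVEL is at least 1 or 2, so the
     * costlier checks can be kept out of release builds. */
    static int scale(int num, int den)
    {
        av_assert2(den != 0); /* invariant: callers must never pass den == 0 */
        return num / den;
    }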
Logically, an assert and doing nothing are the same, because asserts are often removed at build time (by the way, I hope all FFmpeg developers remember to always work with the maximum assert level), and because "crashing right now" is one case of "anything can happen". But asserts, when they are enabled, make mistakes easier to debug, because they make their consequences visible earlier.

So: do nothing / assert, or return an error? This is the point where there is no longer an easy answer. It requires weighing the pros and the cons. The only thing I can do is list a few considerations that affect that weighing. Mostly, I will talk about code wrapped inside a function, with the caller possibly under the responsibility of a different programmer.

One of the most important considerations is: is there already a channel to return an error? If the caller is already expected to check for an error return due to conditions outside the control of any programmer (the hard drive died!), then using that channel to signal a programming error is an easy option. On the other hand, if the function has been of type void for ten years, changing it to return an error now is almost certainly not a good idea, because a lot of code already calls it without checking for an error. It is even worse if the function used to return a positive value that was not supposed to be an error, and we now want to return errors as a distinct set of values (negative AVERROR codes): we are in the "cheesy not-a-number value" case and risk contaminating the output.

Another important one is: how easy is it to ensure the requirement is met? "Pointer must not be NULL" is one of the easiest to ensure, and as a consequence it is one that is very often completely neglected, relying on the OS sending a SIGSEGV. "Flags must be a binary OR of the following constants" is another very easy one: just do not do any fancy arithmetic with the value that will end up in the flags. If we only use the designated constants and binary arithmetic, we are guaranteed to be good; if not, what are we doing exactly? On the other hand, "expr must point to a string with a syntactically valid expression": I cannot check the syntax of expr without duplicating a large part of the code I am about to call, and sometimes I might want to use an expression that comes from the outside.

A useful rule of thumb: what would the C language do? When writing a new function, we can try to imagine how it would work if it were a standard C function, and for that look at similar cases in the libc. For example, many, many C functions say "undefined behavior" if NULL is passed where NULL is not expected. Whenever the answer to "what would the C language do" contains the words "undefined behavior", the best choice is to do nothing, possibly with an assert. System calls, on the other hand, are not a very good rule of thumb: they are heavily skewed towards "everything returns an error code, it is a grave mistake to ignore it"; they are heavily burdened by the need for compatibility with existing code and other systems; and the consequences of doing nothing and just letting the dice fall would be catastrophic.

To summarize: deciding what to do when invalid values are met requires properly understanding the invariant requirements of the program and doing a cost analysis of the options, including whether the rest of the code can easily enforce those invariants and whether it can do something graceful with an error.
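To put that summary into code, here is a hedged sketch of how those considerations typically play out; the function and its flags are hypothetical, only av_assert1() and the AVERROR machinery are real FFmpeg facilities:

    #include "libavutil/avassert.h"
    #include "libavutil/error.h"

    #define HYPO_FLAG_FAST  0x1
    #define HYPO_FLAG_EXACT 0x2

    /* Hypothetical API. Its documentation would say:
     * - ctx must be a valid context: passing NULL is undefined behavior,
     *   as in most of the C library, so at most an assert;
     * - flags must be a binary OR of HYPO_FLAG_*: trivial for the caller
     *   to guarantee, but since the function already returns AVERROR
     *   codes for runtime failures, that existing channel can also carry
     *   the programming mistake;
     * - the return value is 0 on success or a negative AVERROR code. */
    static int hypo_process(void *ctx, unsigned flags)
    {
        av_assert1(ctx); /* documented requirement, nothing graceful to do */
        if (flags & ~(HYPO_FLAG_FAST | HYPO_FLAG_EXACT))
            return AVERROR(EINVAL); /* existing error channel */
        /* ... actual work, which may fail with AVERROR(ENOMEM), etc. ... */
        return 0;
    }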
Advice addressed at more beginner programmers tends to be more one-size-fits-all, and as such inefficient. And often more dogmatic.

Regards,

--
  Nicolas George