On Fri, 03 Jun 2011 11:17:17 +1200, Gregory Ewing wrote:

> Steven D'Aprano wrote:
>
>>     def kronecker(x, y):
>>         if x == y: return 1
>>         return 0
>>
>> This will correctly consume NAN arguments. If either x or y is a NAN,
>> it will return 0.
>
> I'm far from convinced that this result is "correct". For one thing, the
> Kronecker delta is defined on integers, not reals, so expecting it to
> deal with NaNs at all is nonsensical.
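For anyone who wants to check the quoted claim, here is the kronecker function from the quote in runnable form, with a few probes. It is a minimal sketch assuming ordinary Python floats, and it relies only on the IEEE 754 rule that a NaN compares unequal to everything, itself included.

```python
import math

def kronecker(x, y):
    """Kronecker delta extended to floats: 1 if x == y, else 0.

    Because a NaN compares unequal to every value (itself included),
    any NaN argument makes the result 0.
    """
    return 1 if x == y else 0

nan = math.nan

print(kronecker(3, 3))        # 1
print(kronecker(3, 4))        # 0
print(kronecker(nan, 1.0))    # 0
print(kronecker(nan, nan))    # 0, since nan != nan
```

So kronecker(nan, nan) returning 0 is a direct consequence of IEEE comparison semantics, not an extra rule bolted onto the function.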
Fair point. Call it an extension of the Kronecker Delta to the reals then.

> For another, this function as
> written is numerically suspect, since it relies on comparing floats for
> exact equality.

Well, it is a throw-away function demonstrating a principle, not battle-hardened production code. But it's hard to say exactly what alternative there is, if you're going to accept floats. Should you compare them using an absolute error? If so, you're going to run into trouble if your floats get large. It is very amusing when people feel all virtuous for avoiding equality and then inadvertently do something like this:

    y = 2.1e12
    if abs(x - y) <= 1e-9:
        # x is equal to y, within exact tolerance
        ...

Apart from being slower and harder to read, how is this different from the simpler, more readable x == y? At that magnitude, neighbouring floats are 2**-12 (about 0.00024) apart, so the only float within 1e-9 of y is y itself.

What about a relative error? Then you'll get into trouble when the floats are very small. And how much error should you accept? What's good for your application may not be good for mine.

Even if you define your equality function to accept some limited error measured in Units in Last Place (ULP), "equal to within 2 ULP" (or any other fixed tolerance) is no better, or safer, than exact equality, and very likely worse.

In practice, either the function needs some sort of "how to decide equality" parameter, so the caller can decide what counts as equal in their application, or you use exact floating point equality and leave it up to the caller to make sure the arguments are correctly rounded so that values which should compare equal do compare equal.

> But the most serious problem is, given that
>
>> NAN is a sentinel for an invalid operation. NAN + NAN returns a NAN
>> because it is an invalid operation,
>
> if kronecker(NaN, x) or kronecker(x, NaN) returns anything other than
> NaN or some other sentinel value, then you've *lost* the information
> that an invalid operation occurred somewhere earlier in the computation.
If that's the most serious problem, then I'm laughing, because of course I haven't lost anything.

    x = result_of_some_computation(a, b, c)  # may return NAN
    y = kronecker(x, 42)

How have I lost anything? I still have the result of the computation in x. If I throw that value away, it is because I no longer need it. If I do need it, it is right there, where it always was.

You seem to have fallen for the myth that NANs, once they appear, may never disappear. This is a common, but erroneous, misapprehension, e.g.:

"NaN is like a trap door that once you have fallen in you cannot come back out. Otherwise, the possibility exists that a calculation will have gone off course undetectably."

http://www.rhinocerus.net/forum/lang-fortran/94839-fortran-ieee-754-maxval-inf-nan-2.html#post530923

Certainly if you, the function writer, have any reasonable doubt about the validity of a NAN input, you should return a NAN. But that doesn't mean that NANs are "trap doors". It is fine for them to disappear *if they don't matter* to the final result of the calculation. I quote:

"The key result of these rules is that once you get a NaN during a computation, the NaN has a STRONG TENDENCY [emphasis added] to propagate itself throughout the rest of the computation..."

http://www.savrola.com/resources/NaN.html

Another couple of good examples:

- from William Kahan, and the C99 standard: hypot(INF, x) is always INF regardless of the value of x, hence hypot(INF, NAN) returns INF.

- since pow(x, 0) is always 1 regardless of the value of x, pow(NAN, 0) is also 1.

In the case of the real-valued Kronecker delta, I argue that the NAN doesn't matter, and it is reasonable to allow it to disappear.

Another standard example where NANs get thrown away is the max and min functions. The latest revision of IEEE-754 (2008) allows for max and min to ignore NANs.

> You can't get a valid result from data produced by an invalid
> computation. Garbage in, garbage out.

Of course you can.
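The examples above can be checked against Python's math module. This is a sketch assuming CPython, whose math.hypot follows the C99 rule that an infinite argument wins over a NaN, and whose math.pow documents that pow(x, 0.0) returns 1.0 even for a NaN x. It also shows that kronecker discarding a NaN does not erase it from x. result_of_some_computation and max_num are hypothetical helpers written for illustration; max_num is modelled loosely on the IEEE 754-2008 maxNum operation for quiet NaNs, not taken from any library.

```python
import math

def kronecker(x, y):
    # Real-valued Kronecker delta: 1 if x == y, else 0 (a NaN is never equal).
    return 1 if x == y else 0

def result_of_some_computation(a, b, c):
    # Hypothetical stand-in: with a=inf, b=1.0, c=inf this computes inf - inf,
    # which is NaN under IEEE 754 (Python returns nan rather than raising).
    return a * b - c

def max_num(a, b):
    # Hypothetical helper in the spirit of IEEE 754-2008 maxNum: if exactly
    # one operand is NaN, return the other; if both are NaN, return NaN.
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a >= b else b

x = result_of_some_computation(math.inf, 1.0, math.inf)
y = kronecker(x, 42)
print(y, math.isnan(x))                   # 0 True  (the NaN is still in x)

print(math.hypot(math.inf, math.nan))     # inf
print(math.pow(math.nan, 0))              # 1.0
print(max_num(math.nan, 3.0), max_num(3.0, math.nan))  # 3.0 3.0
```

Python's built-in max() is not a maxNum: its result with a NaN argument depends on argument order, which is why the sketch spells the rule out explicitly.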
Here's a trivial example:

    def f(x):
        return 1

It doesn't matter what value x takes; the result of f(x) should be 1. What advantage is there in having f(NAN) return NAN?

>> not because NANs are magical goop that spoil everything they touch.
>
> But that's exactly how they *have* to behave if they truly indicate an
> invalid operation.
>
> SQL has been mentioned in relation to all this. It's worth noting that
> the result of comparing something to NULL in SQL is *not* true or false
> -- it's NULL!

I'm sure they have their reasons for that. Whether they are good reasons or not, I don't know. I do know that the 1999 SQL standard defined *four* results for boolean comparisons, true/false/unknown/null, but allowed implementations to treat unknown and null as the same.

--
Steven

--
http://mail.python.org/mailman/listinfo/python-list