[issue4296] Python assumes identity implies equivalence; contradicts NaN

2008-11-10 Thread Michael B Curtis

New submission from Michael B Curtis <[EMAIL PROTECTED]>:

Found in Python 2.4; not sure what other versions may be affected.

I noticed a contradiction with regard to equivalence when experimenting
with NaN, which has the special property that it "is" itself, but it
doesn't "==" itself:

>>> a = float('nan')
>>> a is a
True
>>> a == a
False
>>> b = [float('nan')]
>>> b is b
True
>>> b == b
True


I am not at all familiar with Python internals, but the issue appears to
be in PyObject_RichCompareBool of python/trunk/Objects/object.c

This function "Guarantees that identity implies equality".  However, it
doesn't "Guarantee" this fact, but instead "Assumes" it, because it is
not something that is always True.  NaN is identical to itself, but not
equivalent to itself.
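
The behavior described above can be modeled in pure Python.  This is only
a rough sketch of the identity shortcut, not the C source of
PyObject_RichCompareBool; the helper names rich_compare_bool_eq and
list_eq are hypothetical and exist only for illustration.

```python
def rich_compare_bool_eq(v, w):
    """Rough model of an equality check that takes the identity shortcut."""
    if v is w:
        # Identity is assumed to imply equality, so == is never consulted.
        return True
    return bool(v == w)


def list_eq(a, b):
    """Model of element-wise list equality built on the shortcut above."""
    if len(a) != len(b):
        return False
    return all(rich_compare_bool_eq(x, y) for x, y in zip(a, b))


nan = float('nan')
print(nan == nan)              # False: NaN never equals itself
print(list_eq([nan], [nan]))   # True: the same object is found in both lists
```

Note that two distinct NaN objects still compare unequal under this model,
because the shortcut only applies when both operands are the same object.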

At a minimum, the contradiction introduced by this assumption should be
documented.  However, it may be possible to do better, by fixing it.
The assumption appears to be that identity should imply equivalence in
the common case.  Would it therefore be possible, instead of having
containers such as lists perform this optimization and make this
assumption, to have the base object types implement it?  That is, for
regular objects, when we evaluate equivalence, we return True if the
objects are identical.  The optimization can then be removed from
containers such as list, so that when they check the equivalence of each
element, the optimization is performed by the element itself.  NaN can
then override the default behavior, so that it always returns False in
equivalence comparisons.
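
As an illustration of this proposal, here is a minimal sketch under stated
assumptions: Base, NanLike and strict_list_eq are hypothetical names, not
anything in the interpreter.  The identity check lives in the base type's
__eq__, the container comparison takes no shortcut of its own, and a
NaN-like type overrides the default.

```python
class Base(object):
    """Hypothetical base type: equality defaults to identity."""

    def __eq__(self, other):
        return self is other

    def __ne__(self, other):
        return not self.__eq__(other)


class NanLike(Base):
    """Hypothetical NaN-like type: never equal to anything, itself included."""

    def __eq__(self, other):
        return False


def strict_list_eq(a, b):
    """Element-wise comparison that always defers to each element's ==."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if not (x == y):
            return False
    return True


ordinary = Base()
odd = NanLike()
print(strict_list_eq([ordinary], [ordinary]))  # True
print(strict_list_eq([odd], [odd]))            # False
```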

--
components: Interpreter Core
messages: 75716
nosy: mikecurtis
severity: normal
status: open
title: Python assumes identity implies equivalence; contradicts NaN
type: behavior
versions: Python 2.4

___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue4296>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4296] Python assumes identity implies equivalence; contradicts NaN

2008-11-11 Thread Michael B Curtis

Michael B Curtis <[EMAIL PROTECTED]> added the comment:

All,

Thank you for your rigorous analysis of this bug.  To answer the
question of the impact of this bug: the real issue that caused problems
for our application was Python deciding to silently cast NaN values to
0L, as discussed here:

http://mail.python.org/pipermail/python-dev/2008-January/075865.html

This would cause us to erroneously recognize 0s in our dataset when our
input was invalid, which caused various issues.  Per that thread, it
sounds like there is no intention to fix this for versions prior to 3.0,
so I decided to detect NaN values early on with the following:


def IsNan(x):
  return (x is x) and (x != x)


This is not the most rigorous check, but since our inputs are expected
to be restricted to N-dimensional lists of numeric and/or string values,
this was sufficient for our purposes.

However, I wanted to be clear as to what would happen if this were
handed a vector or matrix containing a NaN, so I did a quick check,
which led me to this bug.  My workaround is to manually avoid the
optimization, with the following code:


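def IsNan(x):
  # Walk containers element by element so that the identity shortcut
  # used in container comparisons cannot hide a NaN.
  if isinstance(x, list) or isinstance(x, tuple) or isinstance(x, set):
    for i in x:
      if IsNan(i):
        return True
    return False
  else:
    # A NaN is unequal to itself.
    return (x is x) and (x != x)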


This isn't particularly pretty, but since our inputs are relatively
constrained, and since this isn't performance-critical code, it suffices
for our purposes.  For anyone working with large datasets, this would be
suboptimal.  (As an aside, if someone has a better solution for a
general-case NaN-checker, which I'm sure someone does, feel free to let
me know what it is).
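
One possible general-case checker, as a sketch only: it assumes Python 2.6
or later, where math.isnan is available (it is not in 2.4), and it assumes
inputs are nested lists, tuples or sets of numbers and strings as described
above.  The name contains_nan is hypothetical.

```python
import math


def contains_nan(x):
    """Return True if x is a float NaN or a nested container holding one."""
    if isinstance(x, float):
        return math.isnan(x)
    if isinstance(x, (list, tuple, set, frozenset)):
        # Recurse instead of comparing containers, so no identity
        # shortcut is involved.
        return any(contains_nan(item) for item in x)
    # Strings, ints and anything else are treated as not-NaN.
    return False


print(contains_nan([1, 'a', [2.0, float('nan')]]))  # True
print(contains_nan(['nan', 1, 2.0]))                # False
```

Restricting the test to float instances avoids relying on x != x, which a
user-defined type could also satisfy without being a NaN.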

Additionally, while I believe that it is most correct to say that a list
containing NaN is not equal to itself, I would hesitate to claim that it
is even what most applications would desire.  I could easily imagine
individuals who would only wish for the list to be considered NaN-like
if all of its values are NaN.  Of course, that wouldn't be solved by any
changes that might be made here.  Once one gets into that level of
detail, I think the programmer needs to implement the check manually to
guarantee any particular expected outcome.
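
The two policies mentioned above ("any element is NaN" versus "every
element is NaN") could be written out explicitly along these lines.  This
is a sketch for flat sequences only, assuming Python 2.6 or later for
math.isnan; any_nan and all_nan are hypothetical names.

```python
import math


def _is_nan(value):
    """True only for float NaN values."""
    return isinstance(value, float) and math.isnan(value)


def any_nan(values):
    """NaN-like if at least one element is NaN."""
    return any(_is_nan(v) for v in values)


def all_nan(values):
    """NaN-like only if the sequence is non-empty and every element is NaN."""
    values = list(values)
    return bool(values) and all(_is_nan(v) for v in values)


nan = float('nan')
print(any_nan([1.0, nan]))  # True
print(all_nan([1.0, nan]))  # False
print(all_nan([nan, nan]))  # True
```

Treating an empty sequence as not NaN-like in all_nan is a design choice
made here, not something the thread specifies.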

Returning to the matter at hand: while I cringe to know that there is
this inconsistency in the language, as a realist I completely agree that
it would be unreasonable to remove the optimization to preserve this
very odd corner case.  For this reason, the minimal solution I proposed
here is that this oddity merely be documented better.

Thanks again for your thoughts.
