On Wed, Jul 1, 2020 at 9:46 AM Antoine Pitrou <anto...@python.org> wrote:
>
>
> Hello,
>
> Recent changes to PyArrow seem to have taken the stance that comparing
> null values should return null.
This is actually how the previous versions work:
 https://github.com/apache/arrow/blob/master/python/pyarrow/scalar.pxi#L51

So my main motivation was to keep backward compatibility.
I totally agree with your points and I'd also prefer returning boolean values
from __eq__.

> The problem is that it breaks the
> expectation that comparisons should return booleans, and percolates into
> crazy behaviour in other places.  Here is an example of such
> misbehaviour in the scalar refactor PR:
>
> >>> import pyarrow as pa
> >>> na = pa.scalar(None)
> >>> na == na
> <pyarrow.NullScalar: None>
> >>> na == 5
> <pyarrow.NullScalar: None>
> >>> bool(na == 5)
> True
> >>> if na == 5: print("yo!")
> yo!
> >>> na in [5]
> True
>
> But you can see it also with arrays containing null values:
>
> >>> pa.array([1, None]) in [pa.scalar(42)]
> True
>
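The mechanism behind these results can be reproduced without PyArrow. Below is a minimal pure-Python sketch; `FakeNull` is a hypothetical stand-in and not a PyArrow class. Its `__eq__` returns a non-boolean object, and because that object defines neither `__bool__` nor `__len__`, Python treats it as truthy, so both `if` and `in` take the "equal" branch.

```python
class FakeNull:
    """Hypothetical stand-in for a null scalar; not part of PyArrow."""

    def __eq__(self, other):
        # Null-propagating comparison: the result is another null object,
        # not a bool.
        return FakeNull()

    # Defining __eq__ sets __hash__ to None; restore the default so
    # instances stay hashable.
    __hash__ = object.__hash__


na = FakeNull()
result = (na == 5)

is_bool = isinstance(result, bool)  # the comparison result is not a bool
truthy = bool(result)               # no __bool__/__len__, so truth-testing succeeds
contained = na in [5]               # `in` truth-tests the result of the comparison
```

The same truth-testing applies whenever an `==` result is used in `if`, `in`, `list.index()` or `list.count()`.
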
> I think that Python equality operators should behave in a
> Python-sensible way (return True or False).  Have people call another
> method if they like the fancy (or noxious, depending on the POV)
> semantics of returning null when comparing null with anything.
>
> (note that Numpy doesn't have null scalars, so it can be less
> conservative in its customization of equality methods)
>
> Regards
>
> Antoine.
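
One possible shape for this, as a rough sketch only: `Scalar` and `null_aware_eq` below are hypothetical names, not PyArrow's API, and the choice that two nulls compare equal under `==` is an assumption made for illustration. `__eq__` always yields a plain bool, while the null-propagating semantics move to an explicit, opt-in method.

```python
class Scalar:
    """Hypothetical scalar wrapper; value None represents null."""

    def __init__(self, value=None):
        self.value = value

    def __eq__(self, other):
        # Python-level equality: always a plain bool (or NotImplemented,
        # which Python resolves to False for unrelated types).
        if not isinstance(other, Scalar):
            return NotImplemented
        # Assumption: two nulls compare equal here.
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)

    def null_aware_eq(self, other):
        # Opt-in three-valued comparison between two Scalar objects:
        # None means "unknown" when either side is null.
        if self.value is None or other.value is None:
            return None
        return self.value == other.value


na = Scalar()
plain = (na == 5)                        # plain bool, no truthy null object
member = na in [5]                       # membership now behaves as expected
unknown = na.null_aware_eq(Scalar(5))    # explicit null-propagating result
```

With this split, `if`, `in` and container lookups get ordinary Python behaviour, and callers who want the null-returning comparison have to ask for it by name.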
