On Wed, Jul 1, 2020 at 9:46 AM Antoine Pitrou <anto...@python.org> wrote:
>
> Hello,
>
> Recent changes to PyArrow seem to have taken the stance that comparing
> null values should return null.

This is actually how the previous versions work:
https://github.com/apache/arrow/blob/master/python/pyarrow/scalar.pxi#L51
So my main motivation was to keep backward compatibility. I totally agree
with your points and I'd also prefer returning boolean values from __eq__.

> The problem is that it breaks the
> expectation that comparisons should return booleans, and percolates into
> crazy behaviour in other places. Here is an example of such
> misbehaviour in the scalar refactor PR:
>
> >>> import pyarrow as pa
> >>> na = pa.scalar(None)
> >>> na == na
> <pyarrow.NullScalar: None>
> >>> na == 5
> <pyarrow.NullScalar: None>
> >>> bool(na == 5)
> True
> >>> if na == 5: print("yo!")
> yo!
> >>> na in [5]
> True
>
> But you can see it also with arrays containing null values:
>
> >>> pa.array([1, None]) in [pa.scalar(42)]
> True
>
> I think that Python equality operators should behave in a
> Python-sensible way (return True or False). Have people call another
> method if they like the fancy (or noxious, depending on the POV)
> semantics of returning null when comparing null with anything.
>
> (note that NumPy doesn't have null scalars, so it can be less
> conservative in its customization of equality methods)
>
> Regards
>
> Antoine.
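
To make the trade-off concrete, here is a minimal pure-Python sketch of the two conventions. The classes `NullPropagatingScalar` and `BooleanScalar` and the method `equals_or_null` are hypothetical stand-ins invented for illustration, not PyArrow's actual classes or API. The sketch assumes only standard Python semantics: an object with no `__bool__` or `__len__` is truthy, and `in` on a list takes the truth value of whatever `==` returns.

```python
# Hypothetical stand-ins, NOT PyArrow classes: they only mimic the two
# __eq__ conventions discussed in this thread.


def _unwrap(obj):
    """Return the wrapped Python value of a scalar, or obj itself."""
    return getattr(obj, "value", obj)


class NullPropagatingScalar:
    """__eq__ returns a null scalar whenever either operand is null."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        other_value = _unwrap(other)
        if self.value is None or other_value is None:
            # The result means "unknown", but it is an ordinary object
            # without __bool__, so Python treats it as truthy.
            return NullPropagatingScalar(None)
        return self.value == other_value


class BooleanScalar:
    """__eq__ always returns a bool; null semantics are opt-in."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # Plain Python equality on the wrapped values: None == 5 is False.
        return self.value == _unwrap(other)

    def equals_or_null(self, other):
        """Null-aware comparison: None stands for 'unknown'."""
        other_value = _unwrap(other)
        if self.value is None or other_value is None:
            return None
        return self.value == other_value


na = NullPropagatingScalar(None)
print(bool(na == 5), na in [5])    # both truthy: the surprising behaviour

nb = BooleanScalar(None)
print(nb == 5, nb in [5])          # both False
print(nb.equals_or_null(5))        # None, only when explicitly requested
```

With the second convention, `if`, `in`, and `list.index` behave the way Python code expects, and callers who want the null-propagating result have to ask for it by name.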