I think that if __eq__ does not return True/False exclusively, __bool__
should raise an exception to avoid these unexpected truthies. Python users
are used to that due to Numpy.


Op wo 1 jul. 2020 om 15:40 schreef Joris Van den Bossche <
jorisvandenboss...@gmail.com>:

> On Wed, 1 Jul 2020 at 09:46, Antoine Pitrou <anto...@python.org> wrote:
>
> >
> > Hello,
> >
> > Recent changes to PyArrow seem to have taken the stance that comparing
> > null values should return null.
>
>
> Small note that it is not a *very* recent change (
> https://github.com/apache/arrow/pull/5330, ARROW-6488
> <https://issues.apache.org/jira/browse/ARROW-6488>), at least for scalars
> (I am supposing you are talking about scalars in specific? Or also about
> array equality?).
>
>
> > The problem is that it breaks the
> > expectation that comparisons should return booleans, and perculates into
> > crazy behaviour in other places.
>
>
> It's certainly true that not returning a boolean from __eq__ gives a whole
> set of surprising/strange behaviour, but doing so would also introduce an
> inconsistency with array equality (where nulls propagate in comparisons).
> I think this might boil down to different "expectations" about what __eq__
> should do or is used for:
>
> 1) Be the scalar equivalent of element-wise array equality (Array.__eq__).
> Since right now nulls propagate in comparisons, this would lead to
> Scalar.__eq__ also to return null (and not True/False) if one of the
> operands is null. The null propagation of array comparisons can also be
> discussed of course.
> 2) Be a general equality check of the two objects (matching the
> ".equals(..)" method). For example, Array.equals considers nulls at the
> same location in the array as equal.
> (however, for this second case it's actually not fully clear if nulls
> evaluate to True or False ..)
>
>  Some other inline comments about the the examples:
>
> Here is an example of such
> > misbehaviour in the scalar refactor PR:
> >
> > >>> import pyarrow as pa
> > >>> na = pa.scalar(None)
> > >>> na == na
> > <pyarrow.NullScalar: None>
> >
> > >>> na == 5
> > <pyarrow.NullScalar: None>
> >
> > >>> bool(na == 5)
> > True
> >
>
> This could also be changed to raise an error instead. And that way, the
> next example would also raise (requiring the user to be specific on how to
> handle a scalar null: is it truthy or falsey?)
>
>
> > >>> if na == 5: print("yo!")
> > yo!
> >
> > >>> na in [5]
> > True
> >
> > But you can see it also with arrays containing null values:
> >
> > >>> pa.array([1, None]) in [pa.scalar(42)]
> > True
> >
> > Note that this one doesn't follow from the scalar behaviour (the same can
> be seen on master), but from the Array equality returning a boolean array
> instead of a single boolean value (I changed this recently in
> https://github.com/apache/arrow/pull/7273, also up for discussion of
> course).
> This is because we let any array evaluate to True:
>
> >>> bool(pa.array([False]))
> True
>
> which we might want to change to raising an error similarly as numpy does
> if we keep the element-wise __eq__.
>
> I think that Python equality operators should behave in a
> > Python-sensible way (return True or False).  Have people call another
> > method if they like the fancy (or noxious, depending on the POV)
> > semantics of returning null when comparing null with anything.
> >
>
> It would certainly be an option to reserve __eq__ to general object
> equality (always True/False), and have users call an explicit compute
> function for one of the element-wise comparison operations (eg
> pyarrow.compute.equals).
>
>
> >
> > (note that Numpy doesn't have null scalars, so it can be less
> > conservative in its customization of equality methods)
> >
> > Regards
> >
> > Antoine.
> >
>

Reply via email to