[Steven D'Aprano <[email protected]>]
> Sorting doesn't require a total order. Sorting only requires a weak
> order where the only operator required is the "comes before" operator,
> or less than. That's precisely how sorting in Python is implemented.
Let's be careful here. Python's list.sort() guarantees that _if_ the
elements are totally orderable, it will deliver the unique stable
permutation consistent with that ordering. Otherwise, the only thing
it guarantees is that the output will be _some_ permutation of the
input.
There was no attempt to do something "useful" in the absence of a
total order - whatever happens then is purely down to implementation
accidents.
The reason is sticks to __lt__ is pragmatic, not theoretical: at the
time it was designed, rich comparisons weren't yet implemented, and
all comparisons were implemented via __cmp__ methods. But rich
comparisons were scheduled for a future release, and sticking to
__lt__ was my deliberate nod to making it as easy as possible then for
classes that wanted to define their own order to participate in
sorting. Implement __lt__, and you're good to go.
I don't think it's documented, but the guts of the bisect and heapq
modules were reworked (I think by Raymond) to stick to __lt__ too, for
the same reason.
But, e.g.,
>>> sorted([{1, 2}, {3}, {1}])
[{1, 2}, {3}, {1}]
despite that {1} < {1, 2}
As far as sorted() is concerned, that's "garbage in garbage out".
What happens today (and since the current list.sort() was written -
but may not be so tomorrow):
First it asks whether {3} < {1, 2}. It gets back False, so concludes
the first two elements are already in order. Then it asks whether {1}
< {3}. Again False, so it concludes those two are also in order.
And that's it. It's done. It never compares {1, 2} to {1} at all.
By comparing only adjacent pairs of elements, it concludes that the
entire input consists of one non-descending run. For totally ordered
input elements, that means "it's already sorted".
> Here is an interesting discussion of a practical use-case of sorting
> data with a partial order:
>
> https://blog.thecybershadow.net/2018/11/18/d-compilation-is-too-slow-and-i-am-forking-the-compiler/
list.sort() is of no use for finding a linear ordering that respects a
collection of."A must precede B" constraints But Python 3.9
introduces a capable new functools.TopologicalSorter class that is
good for that. Although, unlike the approach in that blog, it doesn't
cater to cycles - if there's a cycle in the input, it raises
CycleError.
>> but we really need to use a
>> type that has these exceptional values. Imagine that sort/median was
>> defined to type check its parameter,
> No need to imagine it, sort already type-checks its arguments:
>
> py> sorted([1, 3, 5, "Hello", 2])
> TypeError: '<' not supported between instances of 'str' and 'int'
Yes and no ;-) list.sort() makes no attempt to check element types
for the sake of checking them. In the example, it's just passing on
an exception raised by asking whether "Hello" < 5 (as in the example
above, it first compares adjacent pairs to identify the longest
initial run). If, e.g., we sorted a list like [a, b, c], where
(b < a) == (c < b) == False
it would stop then, even if comparing `a` with `c` would raise an exception.
> [snipped stuff about NaNs]
_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/[email protected]/message/IY6C4E3M5472IDPDG24KYNLK2AZNW3DI/
Code of Conduct: http://python.org/psf/codeofconduct/