[Python-ideas] Re: Magnitude and ProtoMagnitude ABCs — primarily for argument validation

Andrew Barnert via Python-ideas Thu, 05 Mar 2020 10:41:24 -0800

On Mar 5, 2020, at 05:24, Richard Damon <[email protected]> wrote:
> 
> On 3/4/20 11:07 PM, Andrew Barnert via Python-ideas wrote:
>>> On Mar 4, 2020, at 19:12, Richard Damon <[email protected]> wrote:
>>> Yes, because of the NaN issue, you sort of need an 'Almost Total Order' and 
>>> 'Really Truly a Total Order', the first allowing the small exception of a 
>>> very limited (maybe only one) special value that breaks the true definition 
>>> of Total Order so that we could call Floats to be an Almost Total Order.
>> The obvious answer is to have PartialOrder < NaNOrder < TotalOrder, where a 
>> type is NaN-ordered if it’s partially ordered, and its subset of all 
>> self-equal objects is totally ordered.
>> 
>> I can’t imagine when a generic AlmostTotalOrder that didn’t specify how it 
>> fails could be useful; a NaNOrder is useful all over the place. A median 
>> that ignores NaN values requires NaNOrdered input. Even a median that does 
>> something stupid with NaN—see below.
>> 
>> Of course there might be other kinds of almost total orders that might be 
>> useful. If anyone runs into one, they can write their own ABC with the 
>> proper relationships to the stdlib ones. Or even propose it for the stdlib. 
>> But they couldn’t do anything useful with a general AlmostTotalOrder if it 
>> had to handle both NaN and whatever their different case was.
> I think your NaNOrder is my Almost Total Order, except that the name NaNOrder 
> implies that it is only NaN is the special case, and thus we are dealing with 
> numbers, but there may well be some other user type that has a similar 
> 'exceptional' value with properties like NaN, but since the items aren't 
> numbers, the Not-A-Number tag isn't really appropriate.


The important thing is that it implies that there’s a subset of values that are 
totally ordered, and that the only values outside that subset are not 
self-equal. This means, for example, that you can test whether set is actually 
orderable in linear time with all(x==x for x in xs). So you can write a median 
that skips NaN values, or raises on them, just by checking each value as it 
comes up. But your almost total order only implies that there are some values 
that break the total order in some way. So you can only test whether a set is 
actually orderable in quadratic time with all(not(x<y and y<x) for x in xs for 
y in xs)). The only way to raise in bad values is that quadratic check. There 
is no way to skip bad values, because even when the check fails, you can’t tell 
whether it’s x or y that’s bad. So it’s a useless notion.

The other important thing is that NaN order comes up all the time when sorting 
float, Decimal, Pandas datetime (its NaT has NaN semantics), etc. Of course you 
could imagine other almost total orders, even if nobody ever comes up with one 
when asked. Here’s one: the projective reals (real values extended with a 
single infinity that’s both negative and positive) are almost totally ordered, 
except that inf is not comparable to anything but itself (which it’s equal to). 
You could easily write a sort or median or whatever algorithm that does 
something useful with this order. But only if you know it’s this order. You 
can’t write something that’s useful with both this order and NaN order. So 
almost total still doesn’t help with anything; you need to write a new ABC 
that’s also a subclass of Partial and superclass of Total but has no relation 
to NaN.

>>> The presence of NaN in the float system does add a significant complexity 
>>> to dealing with floating point numbers. Sometimes I wonder if since Python 
>>> supports dynamic typing of results, might not do better by removing the NaN 
>>> value from Floats and Decimals, and make the operations that generate the 
>>> NaN generate an object of a special NaN type.
>> The thing is, in most cases you’d be duck typing and get no benefit, because 
>> float and floatnan both have the same methods, etc. Sure, when you’re 
>> type-checking with isinstance you could choose whether to check 
>> float|floatnan or just float, but how often do you write such type checks? 
>> And if you really want that, you could write a Float ABC that does the same 
>> thing, although IIRC ABCs only have a subclasshook rather than an 
>> instancehook so you need a new metaclass:
>> 
>>     class FloatMeta(ABCMeta):
>>         def __instancecheck__(self, instance):
>>             return isinstance(instance, float) and not math.isnan(instance)
>> 
>>     class Float(metaclass=FloatMeta):
>>         pass
>> 
>> People have experimented a lot with similar things and beyond in static type 
>> systems: you can define a new type which is just the non-NaN floats, or just 
>> the finite floats, or just the floats from -1 to 1, or just the three floats 
>> -1 or 0 or 1, and the compiler can check that you’re always using it safely. 
>> (If you call a function that takes one of those non-NaN floats with a float, 
>> unless the compiler can see an if not isnan(x) or a pattern match that 
>> excludes NaN or something else that guarantees correctness, it’s a 
>> TypeError.) Fitting this into the type algebra (& and | and -) is pretty 
>> easy. I’m not aware of anyone translating that idea to dynamic type systems, 
>> but it could be interesting.
> 
> The problem with trying to make NaN a special type in a static typed language 
> is that suddenly you can't define any of your variables to be of type 'float' 
> if they might have a NaN value, as that would have the wrong type to store it.

That’s not a problem in a static typed language unless the language’s type 
system isn’t powerful enough to help. If you think of C and C++ (without 
template metaprogramming) as exemplars of static typing, you really should go 
learn TypeScript or Scala, or at least Swift or C# or Rust. It will open your 
eyes about what type systems can do (and some of those insights are 
transferable to implicit dynamic type systems like Python’s).

Let’s break down float like this (for simplicity, pretend that there’s only a 
single NaN value):

    floatnan = NaN
    floatnum = float - floatnan
    floatinf = -inf | inf
    floatzero = 0
    floatfinite = floatnum - floatinf - floatzero

Now, you know that floatfinite + floatfinite gives you a floatnum, and so does 
floatnum + floatfinite, and so on, but floatnum + floatnum only gives you 
float. You write up all those rules, and now the compiler can verify that all 
of your variables have the right type. And it can also verify this:

    def spam(x: floatfinite) -> floatfinite:
        ...

    def eggs(x: float) -> auto:
        return 1

    x: float
    y: auto
    ...
    if isfinite(x):
        y = spam(x)
    else:
        y = eggs(x)

    z: floatfinite = spam(y)

And so on.

You don’t need any runtime type checks. You don’t even need the types to exist 
at runtime—all of these values can be stored as just a 64-bit C-style double. 
You also don’t need value checks at every operation—in this example, we only 
needed a single value check (which was needed anyway for the algorithm to be 
correct) for the compiler to verify that our code is correct. If we screwed up 
and called spam(x) at the top level, the compiler would give us a TypeError 
telling us that we’re calling a function that requires a floatfinite with a 
float that can’t be proven to be a floatfinite.

And this isn’t a fantasy. There are real experimental languages that do this. 
There are real actually-used-in-production-code languages like TypeScript and 
Haskell that do the equivalent for smaller cases like small integer ranges (and 
proposals for ways to do it for floats by adding support for continuous ranges 
or huge ranges or subtraction types or … but as far as I know none of them have 
panned out yet). The fact that C++ can’t do it isn’t a limitation of static 
typing, it’s a limitation of the C++ type system.

In fact, even C++ can technically do it, but it requires a huge mess of hideous 
template metaprogramming code. There’s a proof of concept out there for 
checking bytes for overflow that requires thousands of lines of unreadable 
generated code and takes minutes to compile, but it works. The equivalent trick 
for doubles would be technically valid C++ code, but it would take 
O(2**(2**64)) compile time memory.

> Your basic fundamental float type suddenly needs to use the dynamic typing 
> system within the static type system that handles type hierarchies.

No it doesn’t. At runtime, all of those float subset types compile to the same 
64-bit IEEE double with no extra information, and all operations on them work 
the same as in C (in fact, even simpler and faster than in C, because you can 
eliminate a few value checks that C requires), because the whole point of the 
type system is that it proved that your code’s assumptions are correct and the 
value checks you wrote are sufficient so nothing needs to be checked at runtime.

> That would be a BIG efficiency hit in a language like C++.

Sure, but again, only in a language like C++, not in a language with a better 
type system.

Also, even if that efficiency hit were ubiquitous, it would still be faster 
than, say, Python (which has to do the same boxing and unboxing and dynamic 
checking, but also has to look up operations by name and interpret bytecode), 
which is hardly an unusable language.

So, the interesting question is: could any of this be leveraged by a powerful 
dynamic type system? Ideally you could make a runtime that can skip the 
checking and/or dispatch costs in some cases. If not, maybe you can at least 
borrow some of the same ideas for defining single-value, range-value, union, 
intersection, and subtraction types for clarity, even if they can’t help the 
compiler optimize anything.

In particular, would being able to check isinstance(x, floatnum) or floatnan 
instead of just float (which is floatnum|floatnan) be useful? Then write the 
ABCs and use them. The fact that they’re not quite ABCs, so you need a custom 
metaclass to do it in Python, is a bit annoying, but only a bit; it’s still 
doable.

_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/BQZGYZZA44RD7XQA7E7QM6WUPYA6Z2IL/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Re: Magnitude and ProtoMagnitude ABCs — primarily for argument validation

Reply via email to