Re: [PHP-DEV] Revisiting Userland Operator Overloads

Larry Garfield Sat, 07 Aug 2021 15:29:16 -0700

On Sat, Aug 7, 2021, at 3:07 PM, Jordan LeDoux wrote:
> > a) Treating operators as arbitrary symbols, which can be assigned any
> operation which makes sense in a particular domain.
> > b) Treating operators as having a fixed meaning, and allowing custom
> types to implement them with that meaning.
> 
> I think this is the core design choice that will affect how the
> implementation is approached, and having some good discussion around it
> before I got into the implementation was the goal of this thread. :) Jan's
> proposal for 8.0 fell more into the a) category with each symbol being
> given an independent, unrelated, and unopinionated override. That RFC very
> nearly passed, the vote was 38 for and 28 against.
> 
> My one hesitation in pushing for a b) type implementation right now (which
> I favor slightly personally) is that the basic math operators do have very
> different meanings between arithmetic, matrix/vector math, and complex
> numbers, all of which are in the same domain of "math". Granted, only
> objects which represent a number valid for arithmetic could also be used
> with other math functions in PHP (such as the sqrt() or cos() functions).
> However, they are definitely use cases that are well treaded in userspace
> code and libraries.
> 
> Complex numbers, for example, couldn't implement a __compare() function at
> all, as they don't have any consistent and sensical definition of "greater
> than" or "less than". This means that if an object represented a complex
> number, the following code would be perhaps unexpected to some:
> 
> if (10 < $complex) {
>     // Never gets here
> }
> 
> if (10 > $complex) {
>     // Never gets here
> }
> 
> if (10 == $complex) {
>     // Never gets here (!!)
> }
> 
> $comparison = 10 <=> $complex; // Nonsensical, should throw an exception
> 
> So while I tend to lean more towards a b) type implementation myself, even
> within that I understood there to be some non-trivial considerations.
> "Numbers" in PHP are obviously real numbers, instead of matrices or
> complex, so all previous semantics of operators and math functions would
> reflect that. To me, an ideal implementation of operator overloading would
> be both:
> 
> 1. Flexible about the contextual meaning of a given operator.
> 2. Somewhat opinionated about the semantical meaning of an operator.
> 
> This is obviously challenging to accomplish, which is why I'm leaving
> myself nearly a whole year for discussion and implementation. I don't want
> to do this quickly and end up with something that gets accepted because we
> want some form of operator overloading, or something that gets rejected
> again despite putting in a great deal of work.
> 
> Jordan


Side note: Please remember to bottom-post.

I think Rowan's breakdown is a bit too pessimistic and binary.  There are 
definitely different possible ways to interpret operator overloading, but IMO 
there is a reasonable middle-ground.

At one end is the most restrictive, which would be clustering all "related" 
overloads together.  That would be something like this:

interface Arithmetic {
  public function __add($arg);
  public function __subtract($arg);
  public function __multiply($arg);
  public function __divide($arg);
}

The intent of clustering like that would be to "force" developers to use it 
only on number-like things.  However, I believe that has a number of problems.

1) What is a number-like thing?  How number-ish does it have to be?

As an example here, time units.  Adding two hour:minute time tuples together to 
get a new time (wrapping at the 24 hour mark) is an entirely reasonable thing 
to do.  But multiplication and division on time doesn't make any sense at all.  
Or, maybe it does but only with ints (2:30 * 3 = 7:30?), kind of, but certainly 
not on the same type.  I'm sure we could come up with an infinite number of 
cases where one or more arithmetic operations are entirely reasonable and 
well-defined, but others are not.
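
To make the time example concrete, here is a minimal sketch.  TimeOfDay is a 
made-up class, and the assumption that $a + $b would dispatch to __add() is 
just the convention used in this mail; today it's an ordinary method that has 
to be called by name.  Note there is deliberately no __multiply() or 
__divide(), because they would have nothing sensible to do.

```php
<?php
declare(strict_types=1);

// Hypothetical value object: an hour:minute time of day.
final class TimeOfDay
{
    private const MINUTES_PER_DAY = 24 * 60;

    public function __construct(
        public readonly int $hour,
        public readonly int $minute,
    ) {}

    // Assumption: under an overloading proposal, `$a + $b` would call this.
    // In current PHP it is a plain method and must be called explicitly.
    public function __add(TimeOfDay $other): static
    {
        $total = ($this->hour + $other->hour) * 60 + $this->minute + $other->minute;
        $total %= self::MINUTES_PER_DAY; // wrap at the 24 hour mark

        return new static(intdiv($total, 60), $total % 60);
    }
}

// 22:45 plus 2:30 wraps past midnight.
$sum = (new TimeOfDay(22, 45))->__add(new TimeOfDay(2, 30));
printf("%02d:%02d\n", $sum->hour, $sum->minute);
```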

2) We know from experience that it doesn't work.

PHP already has ArrayAccess, which has four methods.  It's extremely common for 
people to implement ArrayAccess and stub out some of the methods with 
exceptions because they don't make sense in context.  I've seen it a bunch, and 
I've done it a bunch myself.  ArrayAccess is, basically, operator overloading 
for four different operators: [], [$key], isset(), and unset().  But plenty of 
use cases exist for wanting to do only some of those (eg, a read-only map, so 
stub out offsetUnset() and offsetSet()), and generally speaking, developers have 
responded to that conundrum by saying "screw it, Exceptions for everybody!"

If we went with a combined interface, I am 100% certain we would see people 
implementing Arithmetic and throwing exceptions from __multiply() and 
__divide().
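
For anyone who hasn't run into the ArrayAccess pattern I'm describing, it 
usually looks something like the sketch below.  ReadOnlyMap is a made-up 
class for illustration; the method signatures are the ones ArrayAccess 
declares in PHP 8.

```php
<?php
declare(strict_types=1);

// Hypothetical read-only map: only half of ArrayAccess makes sense here,
// so the other half is stubbed out with exceptions.
final class ReadOnlyMap implements ArrayAccess
{
    public function __construct(private array $items) {}

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->items[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->items[$offset] ?? null;
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new LogicException('ReadOnlyMap cannot be modified.');
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new LogicException('ReadOnlyMap cannot be modified.');
    }
}

$map = new ReadOnlyMap(['a' => 1]);
$value = $map['a'];      // reads work

$caught = false;
try {
    $map['b'] = 2;       // writes compile fine but blow up at runtime
} catch (LogicException $e) {
    $caught = true;
}
```

The type system can't tell a caller that two of the four operations are 
landmines, which is the same thing that would happen with a stubbed-out 
__multiply().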

At the other extreme is arbitrary operator definition a la C++.  That would 
look something vaguely like:

class Foo {
  public function __override(+)($arg);
}

That would give the most flexibility to the developer.  On the one hand, this 
appeals to me greatly as within 30 seconds of it passing I would personally 
release an interface like this:

interface Monad {
  public function __override(>>=)(callable $arg): static;
}

And a few more along similar lines.  The downside is that 30 seconds after 
that, 15 other libraries would do the same in subtly incompatible ways, and 
then both Laravel and Symfony would release their own that are incompatible 
with each other, and it would just be a total mess because you would have NFI 
what any given operator is going to do.  Then FIG would try to define a few to 
standardize the madness, would take about 10-12 months to do so, but both 
Symfony and Laravel would go on using their own instead because they're big 
enough that they can do that, and we'll have a mess basically forever.  That is 
what my crystal ball tells me would happen.

So while this approach appeals to me personally, I think in the long run it's 
probably a bad idea.  My understanding is that many people consider C++'s 
adoption of this approach a mistake, although I'm not a C++ developer so cannot 
speak from first-hand experience.

The middle-ground is to give each overridable operator a dedicated named method:

interface Addable {
  public function __add($arg);
}

That way, people can opt-in to whatever meaning of "add" they want, but it 
still means that + always must mean "a method called add()".  That provides 
some guidelines as to what you should do with an operator (if you implement 
__add() and have it return an object that contains less of something, there's a 
very strong argument that you're just being stupid and your code is bad), and 
precludes competing custom operators like >>, >>=, etc.  (Much as I would love 
to make use of them.)

This approach also has precedent in PHP, with, I would argue, far greater 
success than mega-interfaces.  Countable and Traversable are very often 
implemented together.  However, they do not have to be.  Sometimes you have 
something iterable that is uncountable (infinite list, lazy list, etc.), or 
something countable that it doesn't make sense to foreach() over.  So you 
opt-in to whichever bits make sense.  You could also separately opt-in to 
ArrayAccess, which sometimes also makes sense and sometimes not.
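
A quick sketch of the "iterable but uncountable" case, using a made-up 
Naturals class.  It opts in to IteratorAggregate only, because count() has no 
honest answer for an infinite list.

```php
<?php
declare(strict_types=1);

// Hypothetical infinite sequence 1, 2, 3, ...
// Traversable (via IteratorAggregate) but deliberately not Countable.
final class Naturals implements IteratorAggregate
{
    public function getIterator(): Generator
    {
        for ($i = 1; ; $i++) {
            yield $i;
        }
    }
}

$firstThree = [];
foreach (new Naturals() as $n) {
    $firstThree[] = $n;
    if ($n === 3) {
        break; // the caller decides when to stop
    }
}
```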

I would argue that the micro-interface approach has a far better success rate 
in PHP, especially when it comes to "magic" behavior/engine hooks.  If we're 
going to adopt operator overloading, that is the safest middle-ground to take.

(Similarly, there's nothing that forces someone to return an actual count from 
Countable::count().  It has to be an int, but the language would happily let 
you return random_int() if you wanted.  But the vast majority of the time people 
use it responsibly and return an int that makes logical sense in context.)

We also have the advantage now of both union types and intersection types.  
That means if you want to allow your object to add itself, or some other type, 
and behave differently, you can easily do so by defining __add(Foo|string 
$other) and tossing a match expression into your method body.  (Side note: 
Pattern matching would make that even better.)  Anything you don't explicitly 
allow just type errors for you already.
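
Roughly what I have in mind, as a sketch only.  Duration is a made-up class, 
I've used Duration|int rather than Foo|string so the arithmetic is obvious, 
and __add() is called by name because the + dispatch is the hypothetical part.

```php
<?php
declare(strict_types=1);

// Hypothetical value object measuring a span of time in seconds.
final class Duration
{
    public function __construct(public readonly int $seconds) {}

    // Assumption: `$duration + $other` would dispatch here under the proposal.
    public function __add(Duration|int $other): static
    {
        $extra = match (true) {
            $other instanceof Duration => $other->seconds,
            is_int($other) => $other, // a bare int is treated as seconds
        };

        return new static($this->seconds + $extra);
    }
}

$a = (new Duration(60))->__add(new Duration(30));
$b = (new Duration(60))->__add(15);

// Anything outside the union is rejected by the existing type checks.
$typeError = false;
try {
    (new Duration(60))->__add('soon');
} catch (TypeError $e) {
    $typeError = true;
}
```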

Conversely, if you want to accept an object that is addable, subtractable, and 
comparable, you can type it exactly like that:

function foo(Addable&Subtractable&Comparable $var) {}

So the updated type system makes one-off interfaces a lot easier and more 
practical to work with than in the past.
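
Here's a slightly fuller sketch of that, trimmed to two interfaces to keep it 
short.  Addable and Subtractable are the hypothetical interfaces from this 
mail, Vec2 and roundTrip() are invented for the example, and it needs PHP 8.1 
for intersection types.

```php
<?php
declare(strict_types=1);

// Hypothetical one-method interfaces, as proposed above.
interface Addable
{
    public function __add($arg);
}

interface Subtractable
{
    public function __subtract($arg);
}

final class Vec2 implements Addable, Subtractable
{
    public function __construct(
        public readonly int $x,
        public readonly int $y,
    ) {}

    public function __add($arg): static
    {
        return new static($this->x + $arg->x, $this->y + $arg->y);
    }

    public function __subtract($arg): static
    {
        return new static($this->x - $arg->x, $this->y - $arg->y);
    }
}

// Only accepts values that opted in to BOTH operations.
function roundTrip(Addable&Subtractable $value, mixed $delta): mixed
{
    return $value->__add($delta)->__subtract($delta);
}

$result = roundTrip(new Vec2(1, 2), new Vec2(10, 10));
```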

To be fair, this approach would not prevent weirdos like me from implementing 
__add() and using it as a Monadic bind operator or something silly like that.  
However, I believe experience with ArrayAccess has shown that a combined 
Arithmetic interface wouldn't stop me from doing silly things either.

That leaves four remaining questions, which apply in any of the above cases:

1) What operators do we build in overloading for?  I think there are six to 
start with: The 4 arithmetic operators, concat, and compare.  compare should be 
essentially an internalized version of the custom sort function passed to 
usort() and friends.  The others are reasonably self-explanatory.
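
By "internalized usort() callback" I mean something like the following 
sketch.  Version and the __compare() name are assumptions for illustration; 
the contract is the usual one of returning a negative int, zero, or a positive 
int.

```php
<?php
declare(strict_types=1);

// Hypothetical comparable value object.
final class Version
{
    public function __construct(
        public readonly int $major,
        public readonly int $minor,
    ) {}

    // Assumption: `$a <=> $b` would dispatch here under the proposal.
    // Same contract as a usort() callback: <0, 0, or >0.
    public function __compare(Version $other): int
    {
        return [$this->major, $this->minor] <=> [$other->major, $other->minor];
    }
}

$versions = [new Version(8, 1), new Version(7, 4), new Version(8, 0)];

// Today the comparison has to be handed to usort() explicitly.
usort($versions, fn (Version $a, Version $b): int => $a->__compare($b));

$sorted = array_map(
    fn (Version $v): string => "{$v->major}.{$v->minor}",
    $versions,
);
```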

An interesting possibility I just realized as I was writing this is using 
bitwise operator overloading in combination with Enums.  Would that be "good 
enough" for enum sets?

enum FileAccess: int implements Andable, Orable {
  case Execute = 0b1;
  case Read = 0b10;
  case Write = 0b100;
  case ReadExecute = 0b11;
  case WriteExecute = 0b101;
  case ReadWrite = 0b110;
  case All = 0b111;

  public function __and(FileAccess $other): FileAccess {
    return self::from($this->value & $other->value);
  }

  public function __or(FileAccess $other): FileAccess {
    return self::from($this->value | $other->value);
  }
}

I don't know if I like that or not, but it's an interesting thought.  I'm not 
sure if negation makes sense to overload.  Once the basic pattern is 
established we could likely add new operators individually fairly easily.
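
One wrinkle with that enum idea, sketched below with the methods called by 
name since the & and | dispatch is hypothetical.  I've cut the case list down 
for brevity.  from() throws a ValueError for any value without a matching 
case, so Read & Write (which is 0) fails unless the enum also defines an empty 
case.

```php
<?php
declare(strict_types=1);

// Trimmed-down version of the enum above; the interfaces are omitted
// because Andable and Orable do not exist in PHP today.
enum FileAccess: int
{
    case Execute = 0b1;
    case Read = 0b10;
    case Write = 0b100;
    case ReadWrite = 0b110;
    case All = 0b111;

    public function __and(FileAccess $other): FileAccess
    {
        return self::from($this->value & $other->value);
    }

    public function __or(FileAccess $other): FileAccess
    {
        return self::from($this->value | $other->value);
    }
}

$rw = FileAccess::Read->__or(FileAccess::Write);

// 0b010 & 0b100 is 0, and there is no case for 0.
$noCase = false;
try {
    FileAccess::Read->__and(FileAccess::Write);
} catch (ValueError $e) {
    $noCase = true;
}
```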

2) None of these approaches resolves the commutativity problem.  There is no 
guarantee that $a + $b === $b + $a, if $a or $b are objects that implement 
Addable.  I suspect that problem is fundamentally intractable, and if we want 
operator overloading we'll just have to suck it up and accept that we cannot 
guarantee that is always the case.  For some that may be a fatal problem, which 
is fair.  It's not a fatal problem for me, personally.
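
A small sketch of how easily that happens, using a made-up Path class where 
"adding" means joining segments.  Nothing here is malicious; the operation 
just isn't symmetric.

```php
<?php
declare(strict_types=1);

// Hypothetical filesystem path where "+" would mean "join".
final class Path
{
    public function __construct(public readonly string $value) {}

    // Assumption: `$a + $b` would dispatch to the left operand's __add().
    public function __add(Path $other): static
    {
        return new static(
            rtrim($this->value, '/') . '/' . ltrim($other->value, '/')
        );
    }
}

$a = new Path('/var');
$b = new Path('log');

$ab = $a->__add($b)->value; // "/var/log"
$ba = $b->__add($a)->value; // "log/var"
```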

3) Should the methods in question be dynamic or static?  In my mind, the only 
argument for static is that it makes it more likely that they'll be implemented 
in an immutable way, viz, you'll return a new instance of the object rather 
than modifying either $this or $other.  However, there is no guarantee of that 
at all.  A static method has just as much access to private variables of its 
own class as a normal method does, so nothing would prevent a static method 
from modifying one or both of its operands even if we say not to.  That's the 
same as for a normal method.  I think the best we can do in either case is to 
document "please please don't modify the object in place" and move on.  For 
that reason I would favor a normal method, as a static method just makes things 
more complicated.
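
To show that static buys no immutability guarantee, here's a sketch with a 
made-up Counter class.  The two-operand static signature is an assumption 
about what a static variant might look like.

```php
<?php
declare(strict_types=1);

final class Counter
{
    public function __construct(private int $count) {}

    public function count(): int
    {
        return $this->count;
    }

    // Hypothetical static form of the hook. Being static does not stop it
    // from reaching into the private state of its operands.
    public static function __add(Counter $left, Counter $right): Counter
    {
        $left->count += $right->count; // modifies the left operand in place
        return $left;
    }
}

$a = new Counter(1);
$b = new Counter(2);
$sum = Counter::__add($a, $b);
```

After that call $sum and $a are the same object and $a has silently changed, 
which is exactly the behavior we'd be asking people not to write.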

4) What, if any, type enforcement should the language impose?  Eg, should __add() 
be required to return static, or do we leave that up to the implementer, as 
there are likely use cases we're not thinking of?  If the engine can handle it 
I would favor following the pattern of __invoke(): Let the implementer do 
whatever it wants for both params and return, but an interface can mandate 
__invoke() (or __add()) with certain parameter and return types if it wants.
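
The __invoke() pattern I'm referring to works like this today.  The class 
and interface names are made up for the example; the behavior is current PHP.

```php
<?php
declare(strict_types=1);

// The engine imposes no signature on __invoke(): this is fine...
final class Greeter
{
    public function __invoke(string $name, string $greeting = 'Hello'): string
    {
        return "$greeting, $name!";
    }
}

// ...but an interface can mandate a specific shape when it wants one.
interface IntTransformer
{
    public function __invoke(int $x): int;
}

final class Doubler implements IntTransformer
{
    public function __invoke(int $x): int
    {
        return $x * 2;
    }
}

function applyTwice(IntTransformer $f, int $x): int
{
    return $f($f($x));
}

$greeting = (new Greeter())('Jordan');
$result = applyTwice(new Doubler(), 3);
```

An __add() hook following the same rule would be unconstrained by default, 
with interfaces like Addable free to pin down parameter and return types.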

--Larry Garfield

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
