Re: [PHP-DEV] Zephir, and other tangents

Hammed Ajao Wed, 11 Sep 2024 13:44:54 -0700

On Wed, Sep 11, 2024 at 1:13 PM Mike Schinkel <m...@newclarity.net> wrote:

> Hi Rowan,
>
> > On Sep 11, 2024, at 2:55 AM, Rowan Tommins [IMSoP] <imsop....@rwec.co.uk>
> wrote:
> > Perhaps you're unaware that classes in core already can, and do, provide
> operator overloading. GMP is the "poster child" for it, overloading a bunch
> of mathematical operators, but the mechanism it uses to do so is reasonably
> straightforward and available to any extension.
>
> I was making an (evidently) uninformed assumption that it was non-trivial to
> add operator overloading at the C level. If it is easy, then my comments
> were moot.
>
> That said, writing extensions in C and deploying them is non-trivial
> compared to writing code in PHP, so there is that. ¯\_(ツ)_/¯
>
> > I've never liked that approach, because it means users can't write
> polyfills, or even stub objects, that have these special behaviours. It
> feels weird for the language to define behaviour that isn't expressible in
> the language.
>
> Understood. In _general_ I don't like it either, but I will use as an
> analogy a prior discussion regarding __toArray, and I quote[1]:
>
> "For the "convertible to array" case, I think __toArray, or an interface
> specifying just that one method, would make more sense than combining it
> with the existing interfaces. I'm sceptical of that concept, though,
> because most objects could be converted to many different arrays in
> different circumstances, each of which should be given a different and
> descriptive name."
>
> I am of course quoting you.
>
> Similarly, operators could mean different things, e.g. it is possible to
> have different meanings of equal, and even different meanings of plus. Or
> worse, be applied in ways that are nonsensical to anybody but the developer
> who implements them (that would be the same kind of developer who names
> their variables after Game of Thrones characters.)
>
> That is why I am not a fan of operator overloading, just as you were not a
> fan of __toArray, which to me is less problematic than overloaded operators
> because it has a much smaller scope and is actually quite useful for a common
> set of use-cases regardless of the potential for confusion. But I digress.
>
> > It also risks conflicting with a future language feature that overlaps,
> as happened with all native functions marked as accepting string
> automatically coercing nulls, but all userland ones rejecting it.
> Deprecating that difference has caused a lot of friction.
>
> That is a little different in that it was a behavior that occurred in both
> core and userland, whereas only allowing operator overloading in core would
> mean there would be no userland differences that could conflict.
>
> Whatever the case, if there are only two options: 1.) no operator
> overloading, and 2.) userland operator overloading I would far prefer the
> former.
>
> > This is the tricky part for me: some of the things people want to do in
> extensions are explicitly the kinds of thing a shared host would not want
> them to, such as interface to system libraries, perform manual memory
> management, interact with other processes on the host.
> >
> > If WASM can provide some kind of sandbox, while still allowing a good
> portion of the features people actually want to write in extensions, I can
> imagine that being useful. But how exactly that would work I have no idea,
> so can't really comment further.
>
> WebAssembly has a deny-by-default design so could be something to
> seriously consider for extensibility in PHP. Implementations start with a
> full sandbox[2] and only add what they need to avoid those kinds of
> concerns.
>
> Also, all memory manipulations are sandboxed, though there are still potential
> vulnerabilities within the sandbox so the project that incorporates WASM
> needs to be careful.  WASM written in C/C++ can have memory issues just
> like in regular C/C++, for example.  One option would be to allow only
> AssemblyScript source for WASM. Another would be a config option that a
> web-host could set to only allow signed modules, but that admittedly would
> open another can of worms.  But the memory issues cannot leak out of the
> module or affect other modules nor the system, if implemented with total
> memory constraints.
>
> That said, web hosts can't stop PHP developers from creating infinite
> loops so the memory issues with WASM don't feel like too much bigger of a
> concern given their sandboxed nature.  I've copied numerous other links for
> reference: [4][5][6]
>
>
> >>> The overall trend is to have only what's absolutely necessary in an
> extension.
> >>
> >> Not sure what you mean here.
> >
> > I mean, like Phalcon plans to, ship both a binary extension and a PHP
> library, putting only certain essential functionality in the extension.
> It's how MongoDB ships their PHP bindings, for instance - the extension
> provides low-level protocol support which is not intended for every day
> use; the library is then free to evolve the user-facing parts more freely.
>
> Gotcha.
>
> I think that actually supports what I was saying; people would gravitate
> to only doing in an extension what they cannot do in PHP itself, and over
> time if PHP itself improves there is reason to migrate more code to PHP.
>
> But there can still be reasons to not allow some things in userland. Some
> things like __toArray.
>
> -Mike
>
> [1] https://www.mail-archive.com/internals@lists.php.net/msg100001.html
> [2]
> https://thenewstack.io/how-webassembly-offers-secure-development-through-sandboxing/
> [3] https://radu-matei.com/blog/practical-guide-to-wasm-memory/
> [4]
> https://www.cs.cmu.edu/~csd-phd-blog/2023/provably-safe-sandboxing-wasm/
> [5] https://chatgpt.com/share/b890aede-1c82-412a-89a9-deae99da506e
> [6] https://www.assemblyscript.org/

Using WebAssembly (Wasm) for PHP doesn't make much sense. PHP already runs
on its own virtual machine server-side, so adding another VM (Wasm) would
just introduce unnecessary complexity and overhead. Additionally, which Wasm
runtime would this be: one built on an LLVM backend or on Cranelift?

For extensions, Wasm would perform even worse than current implementations,
no matter how it's integrated. Presently, I define zif_handler function
pointers that operate on the current execution frame and return value,
triggered when the engine detects an internal function (fbc). This approach
is as direct as it gets.

Suggesting AssemblyScript, especially in this context, seems illogical.
Have you actually worked with WebAssembly and considered performance
implications, or is this based on theoretical knowledge?

Your point about operator overloading doesn't seem valid either. Consider
the following:

```php
class X {
    public function plus(X $that) {}
    public function equals(X $that) {}
}
```

In this case, `plus` could represent any behavior, as could `equals`. If I
wanted to, I could implement `plus` to perform what `equals` does and vice
versa. Should we consider methods broken just because their names can be
arbitrary?

PHP already distinguishes between comparison operators for objects:

```php
<?php
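$obj1 = $obj2 = new stdClass;
assert($obj1 === $obj2); // identical: both variables hold the same instance
assert($obj1 == $obj2);  // equal: same class, same property values
$obj1 = new stdClass;    // a second, separate instance
assert($obj1 !== $obj2); // no longer the same instance
assert($obj1 == $obj2);  // still equal: same class, no properties on either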
```

`===` compares object identity (whether both variables refer to the same
instance), while `==` compares class and property values. Beyond this,
there's little reason to apply an operator to an object directly. Why
would you need to call `$user1 + $user2` or similar operations on an
object? What scenario would allowing operator overloads break?

However, consider a case where comparing just one property of an object
(like a hash) is enough to determine equality. Wouldn't it be great if,
without changing any of the calling code, the engine compared `$this->hash
=== $that->hash` when `$this == $that` is invoked, instead of all
properties? Without operator overloading, I'd have to define an `equals`
method and replace every `$obj == $x` call with `$obj->equals($x)`.
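
The workaround described above can be sketched roughly as follows. This is
only an illustration: the `Document` class, its `hash` and `accessLog`
properties, and the choice of SHA-256 are all made up for the example and
are not taken from any library.

```php
<?php
// Hypothetical class for illustration only.
final class Document
{
    public string $hash;
    public array $accessLog = []; // incidental state, irrelevant to equality

    public function __construct(string $body)
    {
        $this->hash = hash('sha256', $body);
    }

    // Explicit equality: only the hash matters.
    public function equals(Document $that): bool
    {
        return $this->hash === $that->hash;
    }
}

$a = new Document('hello');
$b = new Document('hello');
$b->accessLog[] = 'read';

var_dump($a == $b);       // bool(false): == compares every property
var_dump($a->equals($b)); // bool(true): only the hash is compared
```

Every call site has to opt in to `equals()`; any code that still uses `==`
keeps comparing all properties, including the incidental ones.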

Moreover, operator overloading unlocks new possibilities for algorithm
design. For example, you could define complex mathematical operations on
custom objects, enabling you to express algorithms more concisely and
naturally. Imagine implementing vector addition, matrix multiplication, or
symbolic computation directly in PHP. Instead of verbose method calls like
`$vec1->add($vec2)` or `$matrix1->multiply($matrix2)`, you could use simple
and intuitive syntax like `$vec1 + $vec2` or `$matrix1 * $matrix2`. This is
particularly useful for domain-specific algorithms where overloading
enhances readability.
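
For comparison, here is a minimal sketch of the method-call style that
userland code is limited to today. `Vec2`, `add()` and `scale()` are
hypothetical names chosen for this example, and the `try` block assumes
PHP 8, where arithmetic on a plain object throws a `TypeError`.

```php
<?php
// Hypothetical 2D vector for illustration only.
final class Vec2
{
    public float $x;
    public float $y;

    public function __construct(float $x, float $y)
    {
        $this->x = $x;
        $this->y = $y;
    }

    public function add(Vec2 $that): Vec2
    {
        return new Vec2($this->x + $that->x, $this->y + $that->y);
    }

    public function scale(float $factor): Vec2
    {
        return new Vec2($this->x * $factor, $this->y * $factor);
    }
}

$a = new Vec2(1.0, 2.0);
$b = new Vec2(3.0, 4.0);

// (a + b) * 2, spelled out with method calls.
$result = $a->add($b)->scale(2.0);
printf("(%.1f, %.1f)\n", $result->x, $result->y); // (8.0, 12.0)

// With userland overloading this could read ($a + $b) * 2.
// Assumption: on PHP 8 the engine rejects the expression with a TypeError.
try {
    $sum = $a + $b;
} catch (TypeError $e) {
    echo get_class($e), "\n";
}
```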

Operator overloading isn't just about convenience. It opens the door to
more expressive algorithms, improves readability, and reduces boilerplate
code, all while maintaining backward compatibility with existing PHP
behavior.

Cheers,
Hammed.
