Re: [PHP-DEV] [RFC[ Property accessor hooks, take 2

Stephen Reay Fri, 23 Feb 2024 00:35:53 -0800


> On 23 Feb 2024, at 06:56, Larry Garfield <la...@garfieldtech.com> wrote:
> 
> On Wed, Feb 21, 2024, at 11:02 PM, Matthew Weier O'Phinney wrote:
>> On Wed, Feb 21, 2024 at 12:57 PM Larry Garfield <la...@garfieldtech.com> 
>> wrote:
>>> After much on-again/off-again work, Ilija and I are back with a more 
>>> polished property access hooks/interface properties RFC.  It’s 99% 
>>> unchanged from last summer; the PR is now essentially complete and more 
>>> robust, and we were able to squish the last remaining edge cases.
>>> 
>>> Baring any major changes, we plan to bring this to a vote in mid-March.
>>> 
>>> https://wiki.php.net/rfc/property-hooks
>>> 
>>> It’s long, but that’s because we’re handling every edge case we could think 
>>> of.  Properties involve dealing with both references and inheritance, both 
>>> of which have complex implications.  We believe we’ve identified the most 
>>> logical handling for all cases, though.
>> 
>> Once again in reading the proposal, the first thing I'm struck by are 
>> the magic "$field" and "$value" variables inside accessors. The first 
>> time they are used, they're used without explanation, and they're 
>> jarring.
>> 
>> Additionally, once you start defining the behavior of accessors... you 
>> don't start with the basics, but instead jump into some of the more 
>> esoteric usage, which does nothing to help with the questions I have.
>> 
>> So, first:
>> 
>> - Start with the most basic, most expected usage for each of reading 
>> and writing properties.
>> - I need a better argument for why the $field and $value variables 
>> exist. Saying they're macros doesn't help those not deep into 
>> internals. As a user, why do they exist?
> 
> For $field, it's not a requirement.  It's mostly for copy-paste convenience.  
> A number of people have struggled on this point so if the consensus is to 
> leave out $field and just use $this->propName directly, we can accept that.  
> They can be re-added if reusable hook packages are added in the future (as 
> noted in Future Scope).
> 
> For $value, it's to avoid boilerplate.  For the majority case, you'll be just 
> operating on an individual value trivially.  Checking it's range, or 
> uppercasing it, or whatever.  Requiring the developer to provide a name 
> explicitly is just extra work; it's much the same as how PHP doesn't require 
> you to pass $this as the first argument to a method explicitly, the way 
> Python and Rust do.  It's just understood that $this exists, and once you 
> learn that it's obvious.
> 
> On the occasions where you do want to specify an alternate name for some 
> reason, or more likely you're providing a wider type, you can.  But in the 
> typical case it would just be one more thing for the dev to have to type out. 
>  This is especially true in what I expect to be a common case, which is 
> promoted constructor arguments with an extra validator set hook on them.
> 
> It also introduces some ambiguity.  If I specify only the name, does that 
> mean I'm widening the type to mixed?  Or just that I'm omitting the name?  If 
> specifying the name is rare, that's not really a big deal.  If it's required 
> in every case, it's a confusion point  in every case.
> 
> In the interest of transparency. for comparison:
> 
> * Kotlin appears to require an argument name, but by convention recommends 
> using "value".
> * Swift makes it optional, with a default name of "newValue".  (Same logic as 
> in the RFC.)
> * C# ... as far as I can tell, doesn't support a custom name at all; it's 
> always called "value", implicitly.
> 
>> Second: you don't have examples of defining BOTH get and set OTHER than 
>> when using expressions for both accessors or a mix. I'm actually 
>> unclear what the syntax is when both are defined. Is there supposed to 
>> be a `;` terminating each? Or  a `,`? Or just an empty line? Again, 
>> this is one of the more common scenarios. It needs to be covered early, 
>> and clearly.
> 
> ... huh.  I thought we had one in there somewhere.  I will add one, thanks.  
> Though to clarify, there's no separator character.
> 
> public string $foo {
>    get {
>        // ...
>    }
>    set {
>        // ...
>    }
> }
> 
>> Third: the caveats around usage with arrays... give me pause. While I'm 
>> personally trying to not use arrays as much as possible, a lot of code 
>> I see or contribute to still does, and the fact that an array property 
>> that uses a write accessor doesn't allow the same level of access as a 
>> normal array property is something I see leading to a lot of confusion 
>> and errors. I don't have a solution, but I worry that this one thing 
>> alone could be enough to prevent the passage of the RFC.
> 
> We completely agree that it's a suboptimal situation.  But as explained, it 
> is the way it is because it's not possible (as far as we can tell) to fully 
> support hooks on array properties.  If you can think of one, please share, 
> because we'd love to make this part better.  I don't like it either, but we 
> haven't found a way around it.  And that caveat doesn't seem like a good 
> enough reason to not support hooks everywhere we actually can.
> 
> 
>> Fourth: the syntax around inheritance is not intuitive, as it does not 
>> work in the same way as the rest of the language. I'm talking about 
>> this:
>> 
>>    public int $x {
>>        get: 2 * parent::$x::get()
>>    }
>> 
>> Why do we need to use the accessors here? Why wouldn't it just be 
>> `parent::$x`?
> 
> Almost.  Ilija spent some time looking into this, and it's possible with 
> caveats.
> 
> First, there's then no way to differentiate between "access parent hook" and 
> "read the static property $x on the parent".  Arguably that's not a common 
> case, but it is a point of confusion.
> 
> The larger issue is that parent::$x can't be just a stand-in for the backing 
> value in all cases.  While supporting that for the = operator is 
> straightforward enough, it wouldn't give us access to ++, --, <=, and the 
> dozen or so other operators that could conceivably apply.  In theory those 
> could all be implemented manually, but Ilija described that as "hundreds to 
> thousands of lines of code" to do, which... is not time or code well spent. 
> :-)  Especially as this is a very edge-case situation to begin with.  (In 
> practice, I expect auto-generated ORM proxy code to be the primary user of 
> accessing a parent property from a child hook.  I can think of very few other 
> cases where I'd want to use it.)
> 
> So we have the choice between making $a = parent::$prop and parent::$prop = 
> $a work, but *nothing else*, inexplicably (creating confusion) or the 
> slightly longer syntax that wouldn't support those other operations anyway so 
> there's no confusion.
> 
> We feel the current approach is the better trade off, but if the consensus 
> generally is for the shorter-but-inconsistent syntax, that can be changed.
> 
>> I want to be clear: I really like the idea behind this feature, and 
>> overall, I appreciate the design. From a user perspective, though, the 
>> above are things that I found jarring as they vary quite a bit from our 
>> current language design.
>>> 
>>> Note the FAQ question at the end, which explains some design choices.
>>> 
>>> There’s one outstanding question, which is slightly painful to ask: 
>>> Originally, this RFC was called “property accessors,” which is the 
>>> terminology used by most languages.  During early development, when we had 
>>> 4 accessors like Swift, we changed the name to “hooks” to better indicate 
>>> that one was “hooking into” the property lifecycle.  However, later 
>>> refinement brought it back down to 2 operations, get and set.  That makes 
>>> the “hooks” name less applicable, and inconsistent with what other 
>>> languages call it.
>>> 
>>> However, changing it back at this point would be a non-small amount of 
>>> grunt work. There would be no functional changes from doing so, but it’s 
>>> lots of renaming things both in the PR and the RFC. We are willing to do so 
>>> if the consensus is that it would be beneficial, but want to ask before 
>>> putting in the effort.
>> 
>> I personally would go with just "accessors". 
> 
> On Thu, Feb 22, 2024, at 12:06 AM, Deleu wrote:
> 
>> This is a long, huge and comprehensive work, congratz to the authors.
>> 
>> It clearly shows that so much thought and work has been put into it 
>> that it makes me cautious to even ask for further clarification. 
>> 
>>> Javascript have similar features via a different syntax, although that 
>>> syntax would not be viable for PHP
>> 
>> Why not?
> 
>> It feels quite a natural syntax for PHP and from someone oblivious to 
>> the internal work, it appears to be a slight marginal change to the 
>> existing RFC. Given the extensive work of this RFC, it seems pretty 
>> obvious that this syntax will not work, I just don't know why.
> 
> I've added an FAQ section explaining why the Python/JS approach wouldn't 
> really work.  To be clear, Ilija and I spent 100+ hours doing research and 
> design before we started implementation (back in mid-late 2022).  We did 
> seriously consider the JS-style syntax, but in the end we found it created 
> more problems than it solves.  For the type of language PHP is (explicit 
> typed properties), doing it on the property itself is a much cleaner approach.
> 
> On Thu, Feb 22, 2024, at 1:14 AM, Robert Landers wrote:
> 
> 
>> I apologize if this has already been covered:
>> 
>>> There are two shorthand notations supported, beyond the optional argument 
>>> to set.
>>> First, if a hook's body is a single expression, then the { } and return 
>>> statement may be omitted and replaced with =>, just like with arrow 
>>> functions.
>> 
>> Does => do any special auto-capturing of variables like arrow
>> functions or is it just a shorthand? 
> 
> No, there is nothing to capture.  Inside the hook body you have $this the 
> same as any other method.
> 
>> Also, is this a meaningful
>> shorthand to the example a little further down:
>> 
>> public string $phone {
>>    set = $this->sanitizePhone(...);
>> }
>> 
>> or do we always have to write it out?
>> 
>> public string $phone {
>>    set => $field = $this->sanitizePhone($value);
>> }
> 
> Currently no, the set is always void; you have to write the value yourself.  
> See the FAQ section on the topic for a detailed explanation.
> 
> However, I just had a long discussion with Ilija and there is one possibility 
> we could consider: Use the return value only on the shorthand 
> (arrow-function-like) syntax.
> 
> So you could do either of these, which would be equivalent:
> 
> set {
>  $this->phone = $this->santizePhone($value);
> }
> 
> set => $this->santizePhone($value);
> 
> This would have the advantage of offering a return-to-set mechanism, as well 
> as even shorter syntax in the simple case (and no question of $field vs 
> $this->propName).  But it would have the disadvantage of being potentially 
> inconsistent between the short and long version.  It also would mean the 
> short version is incompatible with virtual properties; using a short-set 
> would create a backing value, so it's non-virtual.  But since "simple 
> validation, maybe in a promoted constructor property" is likely to be one of 
> the main use cases, it would simplify that case.
> 
> Not sure if that's a trade up or not.  Thoughts from the list?
> 
>> Would __PROPERTY__ be set inside sanitizePhone() as well?
> 
> No.  Like __CLASS__, it is materialized at compile time and has no meaning 
> outside of its intended scope.
> 
>> You mention several ways values are displayed (whether or not they use
>> the get-hook), but what does the default implementation of
>> `__debugInfo()` look like now (or is that out of scope or a silly
>> question?)
> 
> var_dump() shows the underlying backing value, bypassing get, since it's 
> debugging the object state.  If you implement __debugInfo(), that's a method 
> like any other and you can do what you like, though you'll be reading through 
> get hooks (just like using __serialize()).
> 
>> For attributes, it would be nice to be able to target hooks
>> specifically with attributes instead of also all methods (e.g., a
>> Attribute::TARGET_GET_HOOK const). For example, if I were writing a
>> serialization library, I may want to specify #[UseRawValue] only on
>> getters to ensure that only the raw value is serialized instead of the
>> getter (which may be specific to the application logic, or
>> #[GetFromMethod] to tell the serialization library to get the value
>> from a completely different method. It wouldn't make sense to target
>> just any method with that attribute.
> 
> This feels very niche, honestly.  Those would naturally have to be sub-cases 
> of TARGET_METHOD anyway, so method-targeted attributes would need to be 
> supported regardless.  That makes hook-specific-targeting an easy and 
> non-breaking add-on for a future RFC if it turns out to be useful in practice.
> 
> --Larry Garfield


Hi Larry,

It's good to see this idea still progressing.


I have to agree with the other comment(s) that the implicit `$field`/`$value` 
variables seem odd to me. I understand the desire for brevity and the potential 
future scope of reused hooks, but this concept seems to fly in the face of many 
years of PHP reducing "magic" like this.

To give one answer to your question about ambiguity if the `$value` parameter 
is required - I don't believe this is actually ambiguous, in the context of PHP:
- method parameters in child classes don't implicitly 'inherit' the parent 
method parameter's type if they don't define one (they widen to mixed);
- method return types have no implicit inheritance, they must declare a 
compatible return type;
- typed class properties don't implicitly inherit the parent type when the type 
left off a child property - they must declare the same type.

AFAIK there is no existing behaviour in PHP where omitting a type would mean 
"the type is implicitly inherited from X", it either means the same as mixed, 
or it's an error.


The use of implicit variables like this also opens up a new potential for bugs, 
particularly in the `set` block, where the difference between setting a new 
local variable inside the method body and writing to the backing store is 
indistinguishable without knowing intent. Explicitly writing to 
`$this->propertyName` (or `$this->{__PROPERTY__}` in a hypothetical 
re-used/generated hook) is certainly longer, but it's also much less likely to 
suffer from a typo that is then missed by a test, especially given that dynamic 
properties are now deprecated, so writing to `$this->porpertyName` from a hook 
on `$propertyName` would likely trigger an error, while writing to `$feild` 
from a hook would produce no error.

Ultimately so long as the syntax *allows* for being explicit (which it seems 
to, and it provides the magic constant allowing for future dynamic use in 
reused hooks) I'm generally positive about this.

Given that these are both essentially deemed as "shortcuts" for convenience 
rather than actual differences in functionality, perhaps whether to include the 
implicit/magic `$field`/`$value` variables could be a secondary question(s) on 
the RFC? 

Also, a small nitpick: The link to your attributeutils repo in the examples 
page, is broken, and it would be nice to see a few examples showing the 
explicit version of the hooks.


Cheers

Stephen

Re: [PHP-DEV] [RFC[ Property accessor hooks, take 2

Reply via email to