On 25 Apr 2022, at 22:07, Rowan Tommins wrote: > On 25/04/2022 10:33, Craig Francis wrote: > >>> The fact that internal functions have parameter parsing behaviour that is >>> almost impossible to implement in userland, and often not even consistent >>> between functions, is a wart of engine internals, not a design decision. >> Bit of a tangent, but do you have some examples? would be nice to clean some >> of these up, or at least have them in mind as we discuss this RFC. > > > Fundamentally, the internal parameter handling system (ZPP) is completely > separate from the way function signatures work in userland, and evolved based > on a different set of requirements. The emphasis of ZPP is on unwrapping zval > structs to values that can be manipulated directly in C; so, for instance, it > has always had support for integer parameters. Since 7.0, userland signatures > have evolved an essentially parallel set of features with an emphasis on > designing a consistent and useful dynamic typing system. > > Increasingly, ZPP is being aligned with the userland language, which also > allows reflection information to be generated based on PHP stubs. For > instance: > > * Making rejected parameters throw TypeError rather than raise a Warning and > return null > * Giving optional parameters an explicit default in the signature rather than > inspecting the argument count > * Using union types, rather than ad hoc if/switch on zval type > > The currently proposed change to how internal functions handle nulls in 9.0 > is just another part of that process - the userland behaviour is > well-established, and we're making the ZPP behaviour match. > > Off the top of my head, I don't know what other inconsistencies remain, but > my point was that in every case so far, internal functions have been adapted > to match userland, not vice versa.
Thank you Rowan, that's really helpful. >> So I'll spend 1 more... I think it's fair to say that developers using >> `strict_types=1` are more likely to be using static analysis; and if >> `strict_types=1` is going to eventually disappear, those developers won't >> lose any functionality with the stricter checking being done by static >> analysis, which can check all possible variable types (more reliable than >> runtime), and (with the appropriate level of strictness) static analysis can >> do things like rejecting the string '5' being passed to an integer parameter >> and null being passed to a non-nullable parameter. > > > There's an unhelpful implication here, and in your discussion of testing, > that PHP users can be divided into two camps: those who check program > correctness with static analysis tools, unit tests, etc; and those who don't > care about program correctness. > > Instead, how about we think about those who are writing new code and want PHP > to tell them early when they do something silly; and those who are > maintaining large code bases and have to deal with compatibility problems. > Neither of these groups is helped enough by static analysers - as you've > rightly pointed out elsewhere, static checks are *not* reliable in a dynamic > language, and are not likely to be built-in any time soon. > > I'm by no means the strongest advocate of strictness in PHP - I think there > is a risk of throwing out good features with the bad. But I would love to see > strict_types=1 become unnecessary - not because "everyone's running static > analysers anyway, so who cares" but because the default behaviour provides a > good balance of safety and usability. > > That makes me very hesitant to use the strict_types modes as a crutch for > this compatibility break - it only puts off the question of what we think the > sensible behaviour actually is. Yep, I agree, and now I know what George is planning, with the gradual removal of `strict_types=1`, you're right that should not be how the two groups are defined. I would prefer PHP itself to be strict enough to identify and complain about actual mistakes/problems, but be tolerant and allow reasonable forms of type coercion (e.g. your example with a string holding an integer being coerced to an actual integer as needed). That's where I see static analysis providing strict checking for those who want it (e.g. they can choose to not allow any coercion). The only issue I have is when NULL is passed to functions like urlencode(), I do not see that as a problem, and I think NULL coercion should continue to also work in that context (I really don't see why it justifies a fatal Type Error for everyone, as not everyone treats NULL as an invalid value). I've made some tweaks to my RFC in an attempt to better reflect that. >> Thank you; and you're right, if you write new code today, you could do that, >> but that assumes you don't need to tell the difference between an empty >> value vs a missing value > > > As I've said multiple times now, as soon as you pass it to a function that > doesn't have specific handling for nulls, you lose that distinction anyway. > There is literally zero difference in behaviour between "$foo = > htmlspecialchars($_GET['foo'] ?? null)" and "$foo = > htmlspecialchars($_GET['foo'] ?? '')". You're right that changing the NULL coalesce at that point would be fine, because the value is only going to htmlspecialchars() - it's why I think automated tools will have much more luck taking this approach (even if it seems redundant) But we're usually looking at variables for user input (often nullable), that are set earlier in the script, then that nullable variable is used multiple times later. Forgive this primitive example, but this shows `$name` being used in three different ways, where an automated tool cannot simply change line 1 so it doesn't return a NULL, because it would break the Register link. ``` $name = $request->get('name'); if (trim($name) === '') { // Contains non-whitespace characters; so not "", " ", NULL, etc $where_sql[] = 'name LIKE ?'; $where_val[] = $name; } echo ' <form action="./" method="get"> <label> Search <input type="search" name="name" value="' . htmlspecialchars($name) . '"> </label> <input type="submit" value="Go"> </form>'; if ($name !== NULL) { $register_url = '/admin/accounts/add/?name=' . urlencode($name); echo ' <p><a href="' . htmlspecialchars($register_url) . '">Add Account</a></p>'; } ``` In this example, why does `trim()` and `htmlspecialchars()` justify a fatal Type Error? > Telling users when they've passed null to a non-nullable parameter is > precisely about *preserving* that distinction: if you want null to mean > something specific, treating it as a string is a bug. I don't think that represents a bug, are we are talking about a system that takes user input (so often nullable), supports coercion with other simple types, and supports NULL coercion in other contexts (e.g. string concat). >> But, updating existing code, while that would make automated updates easier, >> it's likely to cause problems, because you're editing the value source, with >> no idea about checks later on (like your example which looks for NULL)... >> and that's why an automated update of existing code would have more luck >> updating the sinks rather than the sources (e.g. it knows which sinks are >> expecting a string, so it can add in a `strval($var)`, or `(string) $var`, >> or `$var ?? ""`). > > > That's a fair point, although "sinks" are often themselves the next "source", > which is what makes static analysis possible as often as it is. Yep, it can be complicated... and that's why I like your "$foo ?|> htmlspecialchars(...)" suggestion, because existing code (like the example above) expects functions like trim() to always return a string (rather than returning a NULL when provided with a NULL), and that will continue to work as expected/documented, but you get to introduce a new syntax to allow NULL to propagate though expressions. > Despite all of the above, I am honestly torn on this issue. It is a > disruptive change, and I'm not a fan of errors for errors' sake; but I can > see the value in the decision made back in 7.0 to exclude nulls by default. Thanks Rowan, I'll just add that I think the fatal Type Error for NULL will be much more disruptive, and I would rather relax that requirement for user defined functions, so NULL coercion works in all contexts. Craig -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: https://www.php.net/unsub.php