Sorry for the previous, prematurely sent email; looks like I found a new keyboard shortcut :)
> -----Original Message-----
> From: Anthony Ferrara [mailto:ircmax...@gmail.com]
> Sent: Saturday, February 21, 2015 8:12 PM
> To: Zeev Suraski
> Cc: PHP internals
> Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
>
> Zeev,
>
> First off, thanks for putting forward a proposal. I look forward to a patch that can be experimented with.
>
> There are a few concerns that I have about the proposal however:
>
> > Proponents of Strict STH cite numerous advantages, primarily around code safety/security. In their view, the conversion rules proposed by Dynamic STH can easily allow ‘garbage’ input to be silently converted into arguments that the callee will accept – but that may, in many cases, hide difficult-to-find bugs or otherwise result in unexpected behavior.
>
> I think that's partially mis-stating the concern.

I don't think it's mis-stating the key concern. At least not based on what I've heard from most people here over the last few months.

> It's less about "garbage input" and more about unpredictable behavior. You can't look at code and know that it will not produce an error with dynamic typing. That's one of the big advantages of strict typing that many people want. In reality the reasons are complex, varied and important to each person.

Your ability to look at code and know whether or not it will produce errors is very similar in both strict and coercive typing. But that goes back to what we already decided to agree to disagree on - whether or not strict types give you any tangible extra data when you look at code - aka Static Analysis. Note that Strict Typing would produce all the errors of coercive typing and then some. So knowing whether code will produce errors is arguably more difficult in strict typing, although I think that at the end of the day, it's pretty much equivalent.

Again, I did see Static Analysis being brought up by just a handful of people, perhaps not even that.
For most people, it was the silent acceptance of input that's likely to be invalid.

> > Proponents of Dynamic STH bring up consistency with the rest of the language, including some fundamental type-juggling aspects that have been key tenets of PHP since its inception. Strict STH, in their view, is inconsistent with these tenets.
>
> Dynamic STH is apparently consistency with the rest of the language's treatment of scalar types. It's inconsistent with the rest of the languages treatment of parameters.

Not in the way Andrea proposed it, IIRC. She opted to go for consistency with internal functions. Either way, at the risk of being shot for talking about spiritual things, Dynamic STH is consistent with the dynamic spirit of PHP, even if there are some discrepancies between its rule-set and the implicit typing rules that govern expressions. Note that in this RFC I'm actually suggesting a possible way forward that will align *all* aspects of PHP, including implicit casting - and have them all governed by a single set of rules.

> However there's an important point to make here: a lot of best practice has been pushing against the way PHP treats scalar types in certain cases. Specifically around == vs === and using strict comparison mode in in_array, etc.

I think you're correct on comparisons, but not so much on the rest. Dynamic use of scalars in expressions is still exceptionally common in PHP code. Even with comparisons, == is still very common - and you'd use == vs. === depending on what you need.

> So while it appears consistent with the rest of PHP, it only does so if you ignore a large part of both the language and the way it's commonly used.

Let's agree to disagree. That's one thing we can always agree on! :)

> In reality, the only thing PHP's type system is consistent at is being inconsistent.
I'd have to partially agree with you here; but if you read the RFC through, including its future recommendations, you'd see it's perhaps the first attempt in 20 years to fix that. Instead of doing that through the introduction of a 3rd (albeit simplistic) rule-set that only pays attention to zval.type, it proposes the creation of a single set of rules that will be consistent across the whole language, beginning with userland and internal functions.

> In the "Changes To Internal Functions" section, I think all three types are significantly flawed:
>
> 1. "Just Do It" - This is problematic because a very large chunk of code that worked in 5.x will all of a sudden not work in 7.0. This will likely create a python 2/3 issue, as it would require a LOT of code to be changed to make it compatible.
>
> 2. "Emit E_DEPRECATED" - This is problematic because raising errors (even if suppressed) is not cheap. And the potential for raising one for a non-trivial percentage of every native function call has the potential to have a MASSIVE performance impact for code designed for 5.x. Without a patch to test, it can't really be codified, but it would be a shame to lose the performance gains made with 7 because we're triggering 100's, 1000's or 10000's of errors in a single application run...
>
> 3. "Just Do It but give users an option to not" - This has the problems that E_DEPRECATED has, but it also gets us back to having fundamental code behavior controlled by an INI setting, which for a very long time this community has generally seen as a bad thing (especially for portability and code re-use).

I do too, and I was upfront about their cons, not just pros. And yet, they all bring us to a much better outcome within a relatively short period of time (in the lifetime of a language) than the Dual Mode will.
> > Further, the two sets can cause the same functions to behave differently depending on where they're being called
>
> I think that's misleading. The functions will always behave the same. The difference is how you get data into the function. The behavior difference is in your code, not the end function.

I'll be happy to get a suggestion from you on how to reword that. Ultimately, from the layman user's point of view, she'd be calling foo() from one place and have it accept her arguments, and foo() from another place and have it reject the very same arguments.

> > For example, a “32” (string) value coming back from an integer column in a database table, would not be accepted as valid input for a function expecting an integer.
>
> There's an important point to consider here. You're relying on information outside of the program to determine program correctness. So to say "coming back from an integer column" requires concrete knowledge and information that you can't possibly have in the program. What happens when some DBA changes the column type to a string type. The data will still work for a while, but then suddenly break without warning when a non-integer value comes in. Because the value-information comes from outside.

Of course we're relying on information coming from outside; as we all know, this is one of the most common use cases for PHP. While theoretically you're right, in practice, in the vast majority of cases it wouldn't play out like that. The string column won't be tested exclusively with "123" inputs. As soon as there's a non-numeric-string input, it'll fail. That's likely to happen very early in the process, and that's before considering that if there's such a huge mismatch between the semantic meaning of the column and what the function expects - the problem is likely to be found even sooner, since the function will simply not perform its intended job.
On the flip-side, imagine that same developer using strict types. Feeding the function that integer in string form gets rejected. What are her options? The developer is likely to just explicitly cast the value into an int, giving up on any and all sanitization that coercive types would offer her, happily accepting "Apples" and "100 Dalmatians" as valid inputs. That, on the other hand, is a *very* likely scenario.

> With strict mode, you'd have to embed a cast (smart or explicit) to convert to an integer at the point the data comes in.

First, I'm not aware of smart/safe casts being available or proposed at this point. Secondly, why at the point the data comes in? That would be ideal for static analyzers, but it's probably a lot more common that it will be done at the first point in time where it gets rejected.

> Additionally, with the dual-mode proposal DB interactions can be in weak mode and have the exact behavior you're describing here. Giving the user the choice, rather than making assumptions.

This is bound to be misquoted and used against me, but I don't think it's a good idea to give the user the choice in such a way. I could have sworn that you tweeted the quote about perfection being reached not when there's nothing left to add, but when there's nothing left to remove, but perhaps it was someone else. Either way, two modes are worse than one, if we can come up with a good single unified mode that addresses *most* cases. Remember you can always implement custom type checking to your heart's content. You can easily implement if (!is_int($foo)) { exit; } in the not-so-common cases where accepting "42" as 42 might be disastrous. However, on the caller side, forcing people to clutter their code with casts - many casts - either explicit casts or custom ones - is going to affect a lot more developers in a lot more places. The bang for the buck of adding strict mode is just not there, in my humble opinion of course.
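
A minimal sketch of the cast scenario discussed above, for readers who want to see it run. The helper to_int_or_fail() is hypothetical and is not part of any RFC; it only approximates a coercive check using the existing filter_var() function with FILTER_VALIDATE_INT, whose rules are not identical to the conversion rules the RFC proposes. The return type declaration assumes PHP 7 or later.

```php
<?php
// Hypothetical helper (an assumption for illustration, not the RFC's rule-set):
// accept only values that are well-formed integers, reject everything else.
function to_int_or_fail($value): int
{
    $result = filter_var($value, FILTER_VALIDATE_INT);
    if ($result === false) {
        throw new InvalidArgumentException('Not an integer: ' . var_export($value, true));
    }
    return $result;
}

// Explicit casts never fail; they silently produce some integer.
var_dump((int) "32");             // int(32)
var_dump((int) "100 Dalmatians"); // int(100)
var_dump((int) "Apples");         // int(0)

// The validating helper accepts "32" but rejects the other two inputs.
var_dump(to_int_or_fail("32"));   // int(32)
foreach (["100 Dalmatians", "Apples"] as $input) {
    try {
        to_int_or_fail($input);
    } catch (InvalidArgumentException $e) {
        echo "rejected: ", $input, PHP_EOL;
    }
}

// Callee-side strictness, as in the is_int() check mentioned in the message:
// a numeric string is not an int.
var_dump(is_int("42"));           // bool(false)
```

The point of the sketch is only the contrast: the cast turns "Apples" into 0 without complaint, while a validating conversion rejects it.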
> > Strict zval.type based STH effectively eliminates this behavior, moving the burden of worrying about type conversion to the user.
>
> Correct. And you say that as if it's a bad thing. Being explicit about type conversions isn't what you'd do in a 10 line-of-code script where you can realize what the types are by just thinking about it. But on large scale systems exposing the type conversions to the user gives the power to actually understand the codebase when you can't fit the whole thing in your head at the same time.

I have a hard time connecting to the 'power' approach. I think developers want their code to work, with minimal effort, and be secure. Coercive scalar type hints will do an excellent job at that. Strict type hints will be more work, are bound to trigger a lot of "Oh come on" responses, and, as a special bonus, proliferate the use of explicit casts. Let me top that - you'd have developers who think they're security conscious, because they're using strict mode - with code that's full of explicit casts.

> So what you cite here as a disadvantage many consider to be an advantage.

Perhaps, but I used the proper verb at the top ("We believe").

> > It is our position that there is no difference at all between strict and coercive typing in terms of potential future AOT/JIT development - none at all
>
> So really what you're saying is that you disagree with me publicly. A statement which I said on the side, and I said should not impact RFC or voting in any way. And is in no part in my RFC at all. Yet brought up again.

We listed everything we believe to be misconceptions that were brought up on internals. As recently as yesterday, you had a PHP power user (Larry) who was under the strong impression Strict STH would yield substantial performance benefits.
Given that it was claimed in the past, and since we can't assume every voter reads every last word that's written on internals@ threads, it was important to list that here even if it's not mentioned in the Strict/Dual mode RFC. It's also worth mentioning that there are people who *assume* that strict type hints can somehow help performance, without being domain experts in either the engine or JIT, even if they weren't exposed to the explicit statements that suggested that on blogs and on internals@ - adding to the importance of making it clear that there are no performance benefits to that approach.

> > Static Analysis. It is the position of several Strict STH proponents that Strict STH can help static analysis in certain cases. For the same reasons mentioned above about JIT, we don't believe that is the case
>
> This is patently false.

It's actually patently true. We don't believe that is the case. QED. While at it, can we stop using that 'patently false', and stick to constructive wording such as 'I disagree'?

Also, I think that if you quoted the rest of the sentence you chose to trim, it would appear a lot less confrontational:

"Static Analysis. It is the position of several Strict STH proponents that Strict STH can help static analysis in certain cases. For the same reasons mentioned above about JIT, we don't believe that is the case - ***although it's possible that Strict Typing may be able to help static analysis in certain edge cases.***"

That's still under 'we (don't) believe', so again, it's "patently true". You can disagree, but that's our opinion. I'll also add the most important part of that paragraph for the sake of completeness:

"It is our belief that even if that is true, Static Analyzers need to be designed for Languages, rather than Languages being designed for Static Analyzers."

> Keep not believing it all you want, but *static analysis* requires statically looking at code. Which means you have no value information.
> So static analysis can't possibly happen in cases where you need to know about value information (because it's not there). Yes, at function entry you know the types. But static analysis isn't about analyzing a single function (in fact, that's the least interesting case). It's more about analyzing a series of functions, a function call graph. And in that case strict typing (based only on type) does make a big difference.

I think it's fair to say that while we were unable to convince you there's no tangible extra value in Strict STH compared to any other kind of STH that guarantees the type of value a function will get, you were also unable to convince Dmitry, Stas or myself - all of whom independently discussed it with you. Again, despite that, I'm not saying that you're "patently wrong", just that I don't believe you're right.

Thanks for the feedback!

Zeev

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php