hi Matthew, On Mon, Feb 23, 2015 at 8:01 AM, Matthew Weier O'Phinney <matt...@zend.com> wrote: > I'm writing this as an author and maintainer of a framework and many > libraries. > Caveat, for those who aren't already aware: I work for Zend, and report to > Zeev. > If you feel that will make my points impartial, please feel free to stop > reading, but I do think my points on STH bear some consideration. > > I've been following the STH proposals off and on. I voted for Andrea's > proposal, > and, behind the scenes, defended it to Zeev. On a lot of consideration, and as > primarily a _consumer_ and _user_ of the language, I'm no longer convinced > that > a dual-mode proposal makes sense. I worry that it will lead to: > > - A split within the PHP community, consisting of those that do not use > typehints, those who do use typehints, and those who use strict. > - Poor programming practices and performance degradation by those who adopt > strict, due to poor usage of type casting. > > Let me explain. > > The big problem currently is that the engine behavior around casting can lead > to > data loss quickly. As has been demonstrated elsewhere: > > $value = (int) '100 dogs'; // 100 - non-numeric trailing values are > trimmed > $value = (int) 'dog100'; // 0 - non-numeric values leading > values -> 0 ... > $value = (int) '-100'; // -100 - ... unless indicating sign. > $value = (int) ' 100'; // 100 - space is trimmed; data loss! > $value = (int) ' 100 '; // 100 - space is trimmed; data loss! > $value = (int) '100.0'; // 100 - probably correct, but loss of > precision > $value = (int) '100.7'; // 100 - precision and data loss! > $value = (int) 100.7; // 100 - precision and data loss! > $value = (int) 0x1A; // 26 - hex > $value = (int) '0x1A'; // 0 - shouldn't this be 26? why is > this different? > $value = (int) true; // 1 - should this be cast? > $value = (int) false; // 0 - should this be cast? > $value = (int) null; // 0 - should this be cast? > > Today, without scalar type hints, we end up writing code that has to first > validate that we have something we can use, and then cast it.
What does that have to do with the strict RFC? If you do not enable it, in your code or files, nothing will change to what you have today. I repeat, nothing. Even if your library (with no strict mode) is used from files/codes with strict mode enabled. On the other hands, if we change the way casting is done, in general and globally, I wish anyone good luck to actually validate their apps. Why? Because I am relatively confident that most of the apps out there have no way to actually test these changes with real input data and I very much doubts their respective unit tests, or behavior tests do cover these cases. > This can often be > done with ext/filter, but it's horribly verbose: > $value = filter_var( > $value, > FILTER_VALIDATE_INT, > FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX > ); > if (false === $value) { > // throw an exception? > } > > Many people skip the validation step entirely for the more succinct: > > $value = (int) $value; You lost me here. Input filtering is one thing. Arguments management another. Yes, they may look similar but really they are two different beasts. Or am I missing your point? > And this is where problems occur, because this is when data loss occurs. > > What I've observed in my 15+ years of using PHP is that people _don't_ > validate; > they either blindly accept data and assume it's of the correct type, or they > blindly cast it without validation because writing that validation code is > boring, verbose, and repetitive (I'm guilty of this myself!). Yes, you can > offload that to libraries, but why introduce a new dependency in something as > simple as a value object? Right, and manage to remember what the casting rules are is even more painful and boring, we barely know by heart all of them and I surprised myself about a couple of them while reading the internals list (like fixing inconsistencies or old weird behavior). I am convinced that changing them now is not going to help anyone, in contrary. And this is what the weak typing RFC proposes. > The promise of STH is that the values will be properly coerced, so that if I > write a function that expects an integer, but pass it something like '100' or > '0x1A', it will be cast for me — but something that is not an integer and > cannot > be safely cast without data loss will be rejected, and an error can bubble up > my > stack or into my logs. There is no "Properly" in casting. It is almost only some arbitrary choices. The boolean one for example is just random to me. By the way, this is why I do like to be able to have a strict mode if I wish to: I do not want arbitrary rules, especially if I won't ever remember them. On the "users do not validate input values in their code, for functions or methods, well, it is an education problem. Just like in the core, we always bugs because we do not validate ranges, offset, etc. And know what? C is strict. Nothing change here. Some functions and methods do need validations of the inputs, that does not make strictness less good or more worst. It simply removes the magic casting part and make crystal clear what will happen when invalid types are used.. > Both the Dual-Mode and the new Coercive typehints RFCs provide this. > > The Dual-Mode, however, can potentially take us back to the same code we have > today when strict mode is enabled. Either some coma is missing or I cannot remotely understand where strict mode will take us back. There is not such thing now. And lazy users will remain lazy, how the arguments are handled won't change them magically. > Now, you may argue that you won't need to cast the value in the first place, > because STH! But what if the value you received is from a database? or from a > web request you've made? Chances are, the data is in a string, but the _value_ > may be of another type. With weak/coercive mode, you just pass the data as-is, > but with strict enabled, your choices are to either cast blindly, or to do the > same validation/casting as before: > > $value = filter_var( > $value, > FILTER_VALIDATE_INT, > FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX > ); > if (false === $value) { > // throw an exception? > } > > Interestingly, this adds overhead to your application (more function calls), > and > makes it harder to read and to maintain. Ironically, I foresee "strict" as > being > a new "badge of honor" for many in the language ("my code works under strict > mode!"), despite these factors. See previous comment. > If I don't enable strict mode on my code, and somebody else turns on strict > when > calling my code, You totally misunderstand the RFC. Whether strict mode is enabled or not in the caller code does not affect in any way the code in your library/files. I repeat: You control, and only you!, your file/code, the caller does not and cannot control the mode used in your code/file. Please understand that. > You can say, "But, Static Analysis!" all you want, but that doesn't lead to me > writing less code to accomplish the same thing; it just gives me a tool to > check > the correctness of my code. (Yes, this _is_ important. But we also have a ton > of > tooling around those concerns already, even if they aren't proper static > analyzers.) I totally agree. Hypothetical new tools or features because one or the other RFC is chosen is totally irrelevant to this discussion. Performance as well, as Dmitry mentioned again that strict dual mode will be slower, it is not in any significant way (read: below measures error margins). > From a developer experience factor, I find myself scratching my head: what are > we gaining with STH if we have a strict mode? I'm still writing exactly the > same > code I am today to validate and/or cast my scalars before passing them to > functions and methods if I want to be strict. Fair enough. But let other who do see benefits (see my numerous comments, or from other) use it. On the other, I let you imagine if we change the casting rules now, good luck. > The new coercive RFC offers much more promise to me as a consumer/user of the > language. The primary benefit I see is that it provides a path forward towards > better casting logic in the language, which will ensure that — in the future — > this: > > $value = (int) $value; > > will operate properly, and raise errors when data loss may occur. It means > that > immediately, if I start using STH, I can be assured that _if_ my code runs, I > have values of the correct type, as they've been coerced safely. The lack of a > strict mode means I can drop that defensive validation/casting code safely. Oh, I agree, clean the casting rules is totally necessary. But let forget about adoption, ok? I ignore totally the INI settings about it as it is such a bad idea than I have no word to explain why.... > My point is: I'm sick of writing code like this: > > /** > * @param int $code > * @param string $reason > */ > public function setStatus($code, $reason = null) > { > $code = filter_var( > $value, > FILTER_VALIDATE_INT, > FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX > ); I suppose you are not sick to write bugs but made a type, s, $value, $code, right? Besides the killing a fly with a tank to do this kind of validations like this, this is mainly due to the magic casting being inconsistent. Both weaks (but creates potential BC problems on the caller side) and strict solve this exact kind of validations. What none of the RFC will change is the business logic related validations, like ranges and the likes. These kind of validations could be easily solve using annotations (and why this is what we need next to scalar type hinting as well). > I want to be able to write this: Cheers, -- Pierre @pierrejoye | http://www.libgd.org -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php