> -----Original Message-----
> From: Larry Garfield [mailto:la...@garfieldtech.com]
> Sent: Thursday, February 19, 2015 9:00 AM
> To: internals@lists.php.net
> Subject: Re: [PHP-DEV] Reviving scalar type hints
>
> On 02/17/2015 01:30 PM, Zeev Suraski wrote:
> >> Yes, I already know that.
>
> At this point, if I could rephrase the "camps" a bit I see two different
> sets of priorities:
>
> 1) PHP should do what seems "obviously safe" to do, to make life easiest
> for developers. That is, it's patently obvious that "32" and 32 are
> equivalent, so don't make developers worry about the distinction because
> to them there isn't one. This is an entirely reasonable position.
>
> 2) PHP would benefit hugely from static analysis tools and compile-time
> type-based optimizations, but those are only possible with code that is
> strongly typed. Currently such tools do not really exist, but with
> compile-time-knowable information could be written and even incorporated
> into future versions of PHP without API breaks. (I think Anthony
> demonstrated earlier examples of function calls no longer being slow, for
> instance, if the type juggling could be removed at compile time.) This is
> an entirely reasonable position.

Larry,

There's actually very little difference between coercive type hinting and strict type hinting in terms of performance. If you read what both Dmitry and Anthony said, it should be clear that the vast majority of gains can be had even without any sort of type hinting at all - and as Stas pointed out, JavaScript has some mind-blowing JIT optimizations without any explicit type info at all.

Moreover, I think it's easy to lose the forest for the trees here, by focusing on a very narrow piece of code without looking at the bigger picture. Ultimately, if you have a piece of data that you want to pass from a caller to a callee, it falls under one of three labels:

1. A piece of data the callee can use as-is.
2. A piece of data the callee can use after conversion (be it explicit or implicit).
3. A piece of data the callee cannot/shouldn't use.

When comparing strict and coercive type hints, there's no difference between them in terms of #1. There's a subtle difference with #3 - but only in the error situation. In other words, coercive type hints would just take a bit more time before they fail, because they have to conduct a few more checks. However, that's an error situation anyway, which is either already going to bail out or go through error handling code - which would be very slow anyway.

So, focusing on #2: in a practical, real-world situation, the difference is actually a lot more subtle than people might think if they only zoom in on the area around parameter passing. The bigger picture is, what would the code author - the one making the call - want to do, semantically? In other words, if you have "32" coming from a database or whatnot, are you likely to want an API that accepts an int to be able to use that? I think the answer is almost always yes. So practically, what will happen with strict typing is that you'd explicitly cast it to int, while with coercive typing you'd rely on the language to do it for you.
Arguably, there's very little difference between the two in terms of performance.

Note that it's possible people will be able to come up with various edge cases where strict typing might alert you to a situation that pushes you to change your code in a way that ends up being slightly faster. But those will be edge cases and should be taken in context - in the vast majority of code patterns, there's zero difference between the two approaches in terms of performance.

In terms of functionality, however, there's actually a substantial difference between the two - explicit casting is a lot more aggressive than the coercion rules we're thinking about for coercive type hints. An explicit cast will happily and silently convert "Apple" into 0, "100 dogs" into 100, and 3.1415 into 3.

Now, going back to potential future AOT/JIT compilation: it's simply not true that strict typing brings any gain there - or at least, neither Dmitry (who wrote a full JIT compiler for PHP that runs Mandelbrot as fast as gcc does) nor I were able to see what that gain would be. Anthony spoke about being able to completely eliminate the zval container and all associated checks, so that in certain situations you'd be able to map a PHP integer all the way down to a C (or asm) integer. That can certainly be done, but it has nothing to do with strict vs. coercive type hints. Here's why:

1. At this point I think it's clear to everyone that inside the called function, there's zero difference between strict and coercive typing (or even the weak typing we were talking about earlier). The function is 100% guaranteed to receive what it asked for, either because values were coerced or because they were blocked from even making it into the function.
2.
On the outside, in the calling code - if you can conduct the level of type inference that would enable you to safely compile a PHP integer into a machine code integer, by all means, do it. While you're at it, generate slightly different function calling code that bypasses the zval type checks altogether and provides that function with the integer it wanted. Note that in his JIT POC, Dmitry managed to do a lot of this without any type hinting *at all*, so while type hints (be they strict/coercive/weak) make this job a bit easier, they're hardly required. Nor do they solve the bigger, more challenging problem - which is type inference in the various functions' code bodies themselves - since we don't have variable declarations or strong typing in PHP.

> Naturally those two positions are mutually exclusive; if the compiler has
> to allow for "32" to be converted to 32 at runtime, it can't optimize the
> opcodes by removing the code that would do that conversion!
>
> In essence, opt-in-strict becomes an opt-in "compiler, be pedantic so you
> can make my code faster" flag. More carrot than stick, since people can
> control when they opt-in to fancier compiler optimizations at the cost of
> some DX, but only in some cases.

I hope what I said above illustrates why it's a misperception - and I think it's a widespread one. If your data source has the wrong type, and you still want to use it, you'd have to convert it. The cost would be similar whether it's done automatically by the language for you, or done manually through an explicit cast - the latter being significantly more likely to hide bugs. If people are in favor of strict typing because they think it can help generate faster code, they should understand it's a misperception and focus on the functionality instead!

> I started this email planning to ask Anthony how flexible strict checking
> could get without losing the benefits of it, but I think I've just
> convinced myself the answer is "not very".
> Which then leaves only the question of internal functions that Rasmus
> raised, which... it looks like is discussed in later emails so I will try
> to catch up on those. :-)

I hope I can convince you back :)

Given that there are no substantial performance gains from strict typing vs. coercive typing, we're really talking about functionality here.

I actually think the strict camp has *a lot* to gain from a single mode that is fairly strict, though not as strict as a zval.type comparison. Most notably, the vast majority of use cases that were brought up by strict typing proponents - such as rejecting lossy conversions ("100 dogs" -> 100, 37.7 -> 37, etc.) and rejecting 'inventive' conversions (like bool->anything) - will not only be supported, they would be the *default*, and in fact the only available behavior. Compare that with the currently proposed RFC, where strict typing would have to be explicitly enabled.

I also think that avoiding the proliferation of explicit casts - which is bound to happen as people adjust their code to be strict-compliant in a hurry - is a big gain for many strict typing proponents. It's true that there may be certain use cases that coercive type hints make more difficult, such as static analysis (I'm not entirely sure why that is, but I never dived into that) - but that in itself isn't a good enough reason, IMHO, to introduce a second, separate mode that deals with scalars in such a different way from the rest of PHP.

Obviously, I think 'weak' campers have a lot to gain too - by having sensible conversions work as expected, without having to resort to explicit casts.

And everyone stands to gain from having just one mode, instead of two.

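To make the contrast between explicit casts and lossless-only coercion concrete, here is a minimal sketch. It is not taken from any RFC: coerce_to_int() is a hypothetical userland helper, and the exact set of conversions it accepts is an assumption made purely for illustration.

```php
<?php
// Explicit casts never fail; they silently discard whatever does not fit.
var_dump((int) "Apple");     // int(0)
var_dump((int) "100 dogs");  // int(100)
var_dump((int) 3.1415);      // int(3)

// Hypothetical helper (an assumption, not part of PHP or of any RFC):
// accept only conversions to int that lose no information, reject the rest.
function coerce_to_int($value): int
{
    if (is_int($value)) {
        return $value;                      // usable as-is
    }
    if (is_string($value)) {
        // FILTER_VALIDATE_INT rejects trailing garbage and out-of-range values.
        $int = filter_var($value, FILTER_VALIDATE_INT);
        if ($int !== false) {
            return $int;                    // "32" -> 32
        }
    }
    if (is_float($value) && floor($value) === $value && abs($value) < PHP_INT_MAX) {
        return (int) $value;                // 32.0 -> 32, but not 37.7
    }
    // Everything else (bool, null, "100 dogs", 37.7, ...) is rejected.
    throw new InvalidArgumentException('Value cannot be losslessly converted to int');
}

var_dump(coerce_to_int("32"));              // int(32)
try {
    coerce_to_int("100 dogs");
} catch (InvalidArgumentException $e) {
    echo "rejected\n";                      // the lossy conversion is refused
}
```

With the cast, the caller gets a number no matter what came in; with the check, the "32" coming from a database still works, while "100 dogs" surfaces as an error instead of quietly becoming 100.
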
The coercive typing approach would require each camp to give up a bit of their 'ideology', but it also gives both schools of thought *most* of what they want, including the key tenets for each camp (rejecting non-sensible conversions - always, allowing sensible ones - always). I believe that's what makes it a good compromise, a better one than the currently proposed RFC.

Thanks!

Zeev

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php