RE: [PHP-DEV] Reviving scalar type hints

Zeev Suraski Thu, 19 Feb 2015 01:14:21 -0800

> -----Original Message-----
> From: Larry Garfield [mailto:la...@garfieldtech.com]
> Sent: Thursday, February 19, 2015 9:00 AM
> To: internals@lists.php.net
> Subject: Re: [PHP-DEV] Reviving scalar type hints
>
> On 02/17/2015 01:30 PM, Zeev Suraski wrote:
> >> Yes, I already know that.
> At this point, if I could rephrase the "camps" a bit I see two different
> sets of priorities:
>
> 1) PHP should do what seems "obviously safe" to do, to make life easiest
> for developers.  That is, it's patently obvious that "32" and 32 are
> equivalent, so don't make developers worry about the distinction because
> to them there isn't one.  This is an entirely reasonable position.
>
> 2) PHP would benefit hugely from static analysis tools and compile-time
> type-based optimizations, but those are only possible with code that is
> strongly typed.  Currently such tools do not really exist, but with
> compile-time-knowable information could be written and even incorporated
> into future versions of PHP without API breaks.  (I think Anthony
> demonstrated earlier examples of function calls no longer being slow, for
> instance, if the type juggling could be removed at compile time.)  This is
> an entirely reasonable position.


Larry,

There's actually very little difference between coercive type hinting and
strict type hinting in terms of performance.  If you read what both Dmitry
and Anthony said, it should be clear that the vast majority of gains can be
had even without any sort of type hinting at all - and as Stas pointed out,
JavaScript has some mind blowing JIT optimizations without any explicit type
info at all.

Moreover, I think it's easy to lose sight of the forest for the trees here,
by focusing on a very narrow piece of code without looking at the bigger
picture.

Ultimately, if you have a piece of data that you want to pass from a caller
to a callee, it could be under one of three labels:
1.  A piece of data the callee can use as-is.
2.  A piece of data the callee can use after conversion (be it explicit or
implicit).
3.  A piece of data the callee cannot/shouldn't use.

When comparing strict and coercive type hints, there's no difference between
them in terms of #1.  There's a subtle difference with #3 - but only in the
error situation.  In other words, for coercive type hints, it would just
take a bit more time before they fail, because they have to conduct a few
more checks.  However, that's an error situation anyway, which is either
already going to bail out, or go through error handling code - which would
be very slow anyway.

So focusing on #2, in a practical real world situation - the difference is
actually a lot more subtle than people might think if they only zoom in on
the area around parameter passing.  The bigger picture is, what would the
code author - the one making the call - want to do, semantically?  In other
words, if you have "32" coming from a database or whatnot, are you likely to
want an API that accepts an int to be able to use that?  I think the answer
is almost always yes.  So practically, what will happen with strict typing
is that you'd explicitly cast it to int, while with coercive typing - you'd
rely on the language to do it for you.  Arguably, there's very little
difference between the two in terms of performance.  Note that it's possible
people will be able to come up with various edge cases where strict typing
might somehow alert you to a situation that pushes you to change your code
in a way that ends up being slightly faster.  But those will be edge cases
and should be taken in context - in the vast majority of code patterns,
there's zero difference between the two approaches in terms of performance.
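
To make the "32" case concrete, here is a minimal sketch of the caller's side
under strict typing.  It assumes the declare(strict_types=1) syntax as found
in PHP 7 and later; setLimit() and $fromDatabase are hypothetical names used
only for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical API that asks for an int.
function setLimit(int $limit): int
{
    return $limit * 2;
}

$fromDatabase = "32"; // stand-in for a value that arrives as a string

// Strict mode: passing the string directly is rejected with a TypeError...
try {
    setLimit($fromDatabase);
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;
}

// ...so the caller converts explicitly, doing the same work that
// coercion would otherwise have done on its behalf.
$result = setLimit((int) $fromDatabase);
```

Without strict mode the first call would simply succeed, with the conversion
happening inside the engine instead of in the caller's code.  Either way a
string-to-int conversion takes place once.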

In terms of functionality, however, there's actually a substantial
difference between the two - explicit casting is a lot more aggressive than
the coercion rules we're thinking about for coercive type hints.  An explicit
cast will happily and silently convert "Apple" into 0, "100 dogs" into 100,
and 3.1415 into 3.
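
The conversions mentioned above can be checked directly.  A minimal sketch;
the variable names are only for illustration:

```php
<?php
// Explicit (int) casts never fail on these scalar inputs; they silently
// truncate the value or fall back to 0.
$inputs = ["32", "Apple", "100 dogs", 3.1415];

$results = [];
foreach ($inputs as $value) {
    $results[] = (int) $value;
}

// $results is now [32, 0, 100, 3]
```

Only the first conversion is one the caller is likely to have wanted; the
other three lose information without any error or warning.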

Now, diving back to future potential AOT/JIT, it's simply not true that
there's any gain at all from strict typing - or at least, neither Dmitry
(who wrote a full JIT compiler for PHP that runs Mandelbrot as fast as gcc
does) nor I could see where one would come from.  Anthony spoke about being able
to completely eliminate the zval container and all associated checks, so
that in certain situations you'd be able to map a PHP integer all the way
down to a C (or asm) integer.  That can certainly be done, but it has
nothing to do with strict vs. coercive type hints.  Here's why:

1. At this point I think it's clear to everyone that inside the called
function, there's zero difference between strict and coercive typing (or
even the weak typing we were talking about earlier).  The function is 100%
guaranteed to receive what it asked for, either because values were coerced
or because they were blocked from even making it into the function.
2. On the outside calling code - if you can conduct the level of type
inference that would enable you to safely compile a PHP integer into a
machine code integer, by all means - do it.  While you're at it, generate slightly
different function calling code that would bypass zval type checks
altogether, and provide that function with the integer it wanted.

Note that in his JIT POC, Dmitry managed to conduct a lot of this without
any type hinting *at all*, so while type hints (be they
strict/coercive/weak) make this job a bit easier - they're hardly required.
Nor do they solve the bigger, more challenging problem - which is type inference
in the various functions' code bodies themselves - since we don't have
variable declarations or strong typing in PHP.

> Naturally those two positions are mutually exclusive; if the compiler has
> to allow for "32" to be converted to 32 at runtime, it can't optimize the
> opcodes by removing the code that would do that conversion!
>
> In essence, opt-in-strict becomes an opt-in "compiler, be pedantic so you
> can make my code faster" flag.  More carrot than stick, since people can
> control when they opt-in to fancier compiler optimizations at the cost of
> some DX, but only in some cases.

I hope what I said above illustrates why it's a misperception - and I think
it's a widespread one.  If your data source has the wrong type, and you
still want to use it - you'd have to convert it.  The cost would be similar
whether it's done automatically by the language for you, or done manually
through an explicit cast - the latter being significantly more likely to
hide bugs.  If people are in favor of strict typing because they think it
can help generate faster code - they should understand it's a misperception
and focus on the functionality instead!

> I started this email planning to ask Anthony how flexible strict checking
> could get without losing the benefits of it, but I think I've just
> convinced myself the answer is "not very".  Which then leaves only the
> question of internal functions that Rasmus raised, which... it looks like
> is discussed in later emails so I will try to catch up on those. :-)

I hope I can convince you back :)
Given that there are no substantial performance gains from strict typing vs.
coercive typing (again, no performance gains from strict vs. coercive
typing), we're really talking about functionality here.

I actually think the strict camp has *a lot* to gain from a single mode that
is fairly strict, but not as strict as a zval.type comparison.  Most notably - the vast
majority of use cases that were brought up by strict typing proponents, such
as rejecting lossy conversions ("100 dogs" -> 100, 37.7 -> 37, etc.) and
rejecting 'inventive' conversions (like bool->anything) - will not only be
supported, but they would be the *default*, and actually only available
behavior.  That is compared with the currently proposed RFC, where strict
typing would have to be explicitly enabled.  I also think that avoiding the
proliferation of explicit casts - that is bound to happen by people
adjusting their code to be strict compliant in a hurry - is a big gain for
many strict typing proponents.
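
As a rough illustration of that kind of rule set, here is a minimal userland
sketch.  coerceToInt() is a hypothetical helper, not part of PHP and not the
proposed engine behavior; it only approximates the cases named above (accept
"32", reject "100 dogs", 37.7 and booleans).  How integral floats such as
37.0 should be treated is not covered here, so this sketch simply rejects
all floats.

```php
<?php
// Hypothetical helper: accepts ints and integer-like strings, and rejects
// anything that would need a lossy or 'inventive' conversion.
function coerceToInt($value): int
{
    if (is_int($value)) {
        return $value; // usable as-is
    }

    if (is_string($value)) {
        // FILTER_VALIDATE_INT returns false for "100 dogs", "Apple",
        // "37.7" and for values that overflow an int.
        $parsed = filter_var($value, FILTER_VALIDATE_INT);
        if ($parsed !== false) {
            return $parsed; // lossless: "32" -> 32
        }
    }

    // Floats, booleans, and every other input are rejected in this sketch.
    throw new InvalidArgumentException('Value is not losslessly convertible to int');
}
```

Compared with a bare (int) cast, the difference is that bad input produces an
error instead of a silently wrong number.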

It's true that there may be certain use cases that coercive type hints may make
more difficult - such as static analysis (I'm not entirely sure why that is,
but I never dived into that) - but that in itself isn't a good enough
reason, IMHO, to introduce a second, separate mode that deals with scalars
in such a different way than the rest of PHP.

Obviously, I think 'weak' campers have a lot to gain too - by making
sensible conversions work fine as expected, without having to resort to
explicit casts.
And everyone stands to gain from having just one mode, instead of two.
The coercive typing approach would require each camp to give up a bit of
their 'ideology', but it also gives both schools of thought *most* of what
they want, including the key tenets for each camp (rejecting non-sensible
conversions - always, allowing sensible ones - always).  I believe that's
what makes it a good compromise, a better one than the currently proposed
RFC.

Thanks!

Zeev

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
