Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)

Anthony Ferrara Mon, 23 Feb 2015 06:10:15 -0800

Zeev,


>> You're still doubling the number of CPU ops and adding at least one branch
>> at runtime, but not a massive difference.
>
> To be honest, I missed an important part in the semantics of the sample
> code, the fact that the result of the division in bar() is sent to a function
> with an integer type hint, which means it may fail with Coercive STH just as it
> would with Strict STH (in retrospect I now understand you alluded to that in
> your replies to Stas, but was too tired to realize that).  That means that
> we could conduct identical static analysis, alerting the developer to the
> exact same possible type mismatch in both Coercive STH and Strict STH.  I
> actually fail now to see how the process would be any different at all
> between the two modes.  This particular code requires changes in order to
> work in all cases - semantic changes - probably either explicit casting to
> int or - more likely - changing the type hints to float.  In either of these
> cases, we'd now have fully known types for the entire flow and could
> optimize it to machine code equally well and equally easily, and with the
> same number of resulting CPU ops.

With coercive types the analyzer/compiler would be forced to give up
on 100% valid code (not even "bad practice", but completely valid).
With strict types, the few cases where it would need to give up are
overly dynamic and hence already risky under strict types.

So while, yes, you can compile **a subset** of coercive types using
the methods discussed here, you can compile a far greater percentage
of valid strict code. The relationship is inverted: most valid
coercive code won't be compilable, whereas most valid strict code will.

And that's where the huge difference is.

>> However in general you'd have to use something like div_function and use a
>> union type of some sort. You mention this (about checking zval.type at
>> runtime). My goal would be to avoid using unions at all (and hence no
>> zval). Because that drastically simplifies both compiler and code
>> generator design.
>
> That would be our goal as well.  And in most cases, the success ratio will
> be the same between coercive and strict implementations.  In that snippet

I don't think so. In the cases where it would be the same, coercive
gives literally no benefit over strict types. So at that point why not
go all the way and use strict types in the first place?

The cases where it makes a difference (a lot more than I think you're
counting) are going to be where 100% valid coercive code is too
dynamic to reason about (given our simple example above required
modification).
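
To make the union-type problem concrete, here is a minimal sketch. The
halve*() helpers are hypothetical names, not the bar() from earlier in
the thread. PHP's `/` operator returns an int when both operands are
ints and the division is exact, and a float otherwise, so the result
type depends on the runtime value:

```php
<?php
// Hypothetical helper: no return type, so the result may be int or float.
function halve(int $n)
{
    return $n / 2;
}

var_dump(halve(4)); // int(2)
var_dump(halve(5)); // float(2.5)

// The two fixes mentioned in the quoted text, sketched:

// 1. Explicit cast: the result is always an int (the fraction is dropped).
function halve_cast(int $n): int
{
    return (int) ($n / 2);
}

// 2. Float hints throughout: the result is always a float.
function halve_float(float $n): float
{
    return $n / 2;
}
```

Either fix pins the result to a single type, which is what would let a
compiler drop the union.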

> (which again, I misanalysed) - a static analyzer will be able to alert to
> the same issue, prompting the developer to fix it (equally in both Coercive
> and Strict).  With the two probable fixes - an explicit cast to int, or the
> more likely change of type hints across the sample from int to float - we'd
> be able to conduct the same compile-time analysis and generate optimal,
> ZVAL-free code in both Coercive and Strict STH.

Again, in trivial cases with 2-3 operations and simple types, yes. In
non-trivial cases, or in cases that you explicitly support, that's
going to be a lot harder if not impossible:

function foo(string $bar): int {
    return $bar;
}

In strict types, that's **always** an error. In coercive types, it
depends on what you pass in. If you pass "30", then everyone's happy.
If then, down the road, "30 dogs" gets passed in, boom.
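
As a rough illustration of the strict half of that claim, here is the
same function with a hypothetical driver loop around it. It assumes the
declare(strict_types=1) semantics that later shipped in PHP 7, which
postdate this thread:

```php
<?php
declare(strict_types=1);

function foo(string $bar): int
{
    return $bar; // a string is returned where an int is declared
}

// Hypothetical driver: under strict types the value does not matter,
// only the type does, so both inputs fail the same way.
foreach (["30", "30 dogs"] as $input) {
    try {
        foo($input);
        printf("%s: accepted\n", $input);
    } catch (TypeError $e) {
        printf("%s: TypeError\n", $input);
    }
}
// prints:
// 30: TypeError
// 30 dogs: TypeError
```

Because the outcome never depends on the value, an analyzer can reject
the function body without knowing anything about its callers.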

If we want to say that the static analyzer can complain about 100%
valid (and encouraged, if we buy the statements you make in your RFC)
code, then great. But then that means we're no longer analyzing PHP,
but a subset of it.

>> It's very much not about impossible.
>
> I'm happy we have that clearly stated, as based on emails here and
> elsewhere, it wasn't clear to a lot of people beforehand.  Given identical
> input source code, in all the cases where it's possible to generate C-level
> calls with Strict, it will also be possible to generate C-level calls with
> Coercive.

It's possible today without types, see Recki-CT. However it's only
possible with a strict subset of coercive code (perhaps 5% of possible
valid code).

With strict types, it's possible on perhaps 95% of valid code.

>> It's about complexity.  Strict code is easier to reason about, it's easier
>> to analyze and it's easier to code-generate because of the reduced amount
>> that you need to support. And we're not talking about making users change
>> their code drastically. We're talking about -in many cases- minor tweaks.
>
> The explicit casting risk has been beaten to death so I won't dive into it
> yet again.
>
> I think it boils down to strict STH accepting fewer inputs, which a static
> analyzer can sometimes pick up, thereby prompting developers to be more
> explicit in their choices of types - in turn providing more compile-time
> type information and more restrictive at that - thereby making the code
> easier for AOT to work on.  It still holds that given the same
> explicitly-typed code, AOT/JIT can do an identical job between Strict STH
> and Coercive STH.  The difference is that the Static Analyzer would be able
> to alert you to some more rejections - because there would be more
> rejections with Strict than there would be with Coercive - rejections that
> would prompt you to change your code.  I still think that the cases where a
> static analyzer can provide more insight in Strict vs. Coercive are
> relatively rare.  Our ability to infer the type in compile time is identical
> between both;  In cases where we can clearly know with absolute confidence
> that the type we have is the type the function wants - we can optimize that
> into a C call - in both cases.  If we can't infer the type in compile-time,
> then the code we'd generate would be the same in both cases.  The main
> difference is in cases where we can infer possible types, which would be
> rejected in Strict but accepted in Coercive.  With Strict, you could simply
> warn about it, pushing the developer in the direction of changing his code
> (potentially making it .  With Coercive, you could generate optimized code
> for the likely case, and catch-all code for the less likely cases.

I think you're being generous here in terms of how much code is going
to be analyzable. That's one of the points of strict typing, that if
it's not analyzable it's not valid code.

>> Minor tweaks that would need to be done with your proposal as well. So if
>> we're going to require users change their code, why not make it opt-in and
>> give them the predictability that we can?
>
> That's off topic for the JIT discussion.  I explained why I think having two
> modes would have negative implications in the RFC.

How is it off topic? I think it's incredibly important. Because you're
claiming that you can do the exact same thing with coercive as we can
do with strict. But that's only true if you change code in coercive
mode. Only if you use a subset of valid coercive code. But with strict
you get that for free. So if users are going to have to modify
their code anyway, what benefit does coercive give them over strict?
They are going to opt-in anyway in either case.

>> > Let me describe here too how it may look with coercive hints.  Instead
>> > of beginning with the assertion that it must be an int, we make no
>> > guess as to what it may be(*).  We would use the very same methods you
>> > would use to prove or refute that it's an int, to determine whether
>> > it's an int.  Our ability to deduce that it's an int is going to be
>> > identical to your ability to prove that it's an int.  If we see that
>> > it comes from an int type hint, from an int typed function, etc. -
>> > we'd be able to generate the same ultra optimized C-level call.  If we
>> > manage to deduce that it may be an int or a float, we can still create
>> > an ultra-optimized calling code that would deal with just these two
>> > cases, or call coerce_to_int().  If we deduce that it's a type that
>> > cannot be converted to an int (e.g. array or resource) - we can
>> > emit a compile-time error.   And if we have no idea what it is, we emit
>> > a regular function call.  Going back to that (*) from earlier, even if
>> > we're unable to deduce what it is, we can actually assume/hope that
>> > it'll be an integer and if it is - pass it on directly to the C
>> > implementation with a C level function call;  And if not, go with the
>> > regular function call.
>> >
>> > The machine code you're left with is pretty much equivalent in case we
>> > reached the conclusion that the variable is an integer (which would be
>> > roughly in the same cases you're able to prove that it is).  The
>> > difference would be that it allows for the non-integer types to be
>> > accepted according to the coercion rules, which is a functional
>> > difference, not performance difference.
>>
>> Well, the end result is pretty much equivalent. But only pretty much.
>> In the example above, the few CPU ops and extra branch will very likely
>> slow down the code significantly (more than a factor of 2).
>
> If we manage to conclude that the value is an integer - the code would not
> only be pretty much identical, but completely identical.
> If we manage to conclude that the value is either an integer or a float
> (which I don't believe is a very common scenario, pretty unique to the
> division operator) - then in both cases a static analyzer can alert the
> developer his code is potentially unsafe given certain inputs.  If the
> developer decides to change his code to an explicit cast - we're back to the
> first scenario.  If not - then the generated code would still be similar
> between Strict and Coercive, with the difference being Strict flat out
> rejecting floats, while Coercive performing some check on them to see if
> they can be converted with no data loss.  In both Strict and Coercive, the
> ZVAL structure will have to stick around, you'd have to check the type and
> perform different actions depending on it.  It's true that if the value is
> the result of the division operator, then pretty much by definition a float
> fed to the hint would always fail to convert to an int without data loss,
> but that's really, really a very specialized property of the division
> operator.

I'm talking about generic cases, and you're talking about special
cases. And in the generic cases you can't conclude the value is an
integer in coercive mode. But you can in strict mode.

Yes, for a very small subset of code strict provides no benefit over
coercive in terms of static analysis ability. But it's only for that
subset. For the rest of the code, strict provides significant benefits.
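
The shape of that difference can be sketched in plain PHP as a stand-in
for generated code. Everything here is hypothetical: native_add_one()
stands in for an int-only C routine, and this coerce_to_int() is a
simplified guess at "convert with no data loss", not the RFC's actual
rules:

```php
<?php
// Hypothetical stand-in for an int-only native routine.
function native_add_one(int $n): int
{
    return $n + 1;
}

// Hypothetical, simplified lossless coercion (an assumption, not the RFC's rules).
function coerce_to_int($value): int
{
    // Floats: accept only whole numbers in a conservatively safe range.
    if (is_float($value) && abs($value) < 2 ** 53 && floor($value) === $value) {
        return (int) $value;
    }
    // Strings: accept only plain decimal integers.
    if (is_string($value) && preg_match('/^-?\d+\z/', $value) === 1) {
        return (int) $value;
    }
    throw new TypeError('cannot convert to int without data loss');
}

// Strict: the argument is known to be an int, so no guard is needed.
function strict_call_site(int $n): int
{
    return native_add_one($n);
}

// Coercive: the type is unknown, so a runtime check and a slow path remain.
function coercive_call_site($value): int
{
    if (is_int($value)) {
        return native_add_one($value); // fast path
    }
    return native_add_one(coerce_to_int($value)); // catch-all path
}
```

When the analyzer can prove the argument is an int, both call sites
collapse to the same direct call. The guard and the catch-all path only
disappear in the general case if non-int input is invalid by definition.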

> It boils down to what semantics the developer is after, not whether Strict
> can generate more efficient code.  In both cases, the static analyzer can
> alert us to the same issue;  Whether we can emit more efficient int-only
> code would depend on whether the developer changes his code so that the
> input can clearly be inferred as an int during compile-time.
>
>> Again, not saying this is major enough to be concerned about, but it's not
>> identical. There are small differences.
>
> I agree, but they stem from the difference in functionality, not because we
> can optimize Strict code better.  Given identical input source code, Strict
> and Coercive can be optimized to exactly the same code.

Yes, identical input. But we're not talking about identical input.
We're talking about the general case.

And if you really believe that in the general case you can analyze
coercive code better than strict code, there's no real point in
continuing this discussion, as that has no basis in reality.

Anthony

