Re: [PHP-DEV] Coercive Scalar Type Hints RFC

Anthony Ferrara Sat, 21 Feb 2015 10:13:27 -0800

Zeev,

First off, thanks for putting forward a proposal. I look forward to a
patch that can be experimented with.

I have a few concerns about the proposal, however:

> Proponents of Strict STH cite numerous advantages, primarily around code 
> safety/security. In their view, the conversion rules proposed by Dynamic STH 
> can easily allow ‘garbage’ input to be silently converted into arguments that 
> the callee will accept – but that may, in many cases, hide difficult-to-find 
> bugs or otherwise result in unexpected behavior.

I think that's partially mis-stating the concern. It's less about
"garbage input" and more about unpredictable behavior. You can't look
at code and know that it will not produce an error with dynamic
typing. That's one of the big advantages of strict typing that many
people want. In reality the reasons are complex, varied and important
to each person.

> Proponents of Dynamic STH bring up consistency with the rest of the language, 
> including some fundamental type-juggling aspects that have been key tenets of 
> PHP since its inception. Strict STH, in their view, is inconsistent with 
> these tenets.

Dynamic STH is apparently consistent with the rest of the language's
treatment of scalar types. It's inconsistent with the rest of the
language's treatment of parameters.

However there's an important point to make here: a lot of best
practice has been pushing against the way PHP treats scalar types in
certain cases. Specifically around == vs === and using strict
comparison mode in in_array, etc.
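
To make the == vs === point concrete, here is a minimal sketch. The values are made up for illustration; the only thing it relies on is that loose comparison treats numeric strings as numbers, while strict comparison and strict in_array do not.

```php
<?php
// Loose comparison sees two numeric strings and compares them as numbers.
$loose  = ("10" == "1e1");   // true: both are numerically 10
$strict = ("10" === "1e1");  // false: different strings

// Hypothetical list of IDs, stored as strings.
$ids = ["10", "20"];

// in_array() defaults to loose comparison; the third argument enables strict mode.
$foundLoose  = in_array("1e1", $ids);        // true
$foundStrict = in_array("1e1", $ids, true);  // false
```

This is the kind of surprise that the "always use === and strict in_array" advice is meant to avoid.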

So while it appears consistent with the rest of PHP, it only does so
if you ignore a large part of both the language and the way it's
commonly used.

In reality, the only thing PHP's type system is consistent at is being
inconsistent.

In the "Changes To Internal Functions" section, I think all three
options are significantly flawed:

1. "Just Do It" - This is problematic because a very large chunk of
code that worked in 5.x will all of a sudden not work in 7.0. This
will likely create a Python 2/3 issue, as it would require a LOT of
code to be changed to make it compatible.

2. "Emit E_DEPRECATED" - This is problematic because raising errors
(even if suppressed) is not cheap. And the potential for raising one
for a non-trivial percentage of every native function call has the
potential to have a MASSIVE performance impact for code designed for
5.x. Without a patch to test, it can't really be quantified, but it
would be a shame to lose the performance gains made with 7 because
we're triggering 100's, 1000's or 10000's of errors in a single
application run...

3. "Just Do It but give users an option to not" - This has the
problems that E_DEPRECATED has, but it also gets us back to having
fundamental code behavior controlled by an INI setting, which for a
very long time this community has generally seen as a bad thing
(especially for portability and code re-use).

Moving along,

> Further, the two sets can cause the same functions to behave differently 
> depending on where they're being called

I think that's misleading. The functions will always behave the same.
The difference is how you get data into the function. The behavior
difference is in your code, not the end function.

> For example, a “32” (string) value coming back from an integer column in a 
> database table, would not be accepted as valid input for a function expecting 
> an integer.

There's an important point to consider here. You're relying on
information outside of the program to determine program correctness.
So to say "coming back from an integer column" requires concrete
knowledge and information that you can't possibly have in the program.
What happens when some DBA changes the column type to a string type?
The data will still work for a while, but then suddenly break without
warning when a non-integer value comes in, because the value
information comes from outside.

With strict mode, you'd have to embed a cast (smart or explicit) to
convert to an integer at the point the data comes in. So semantic
information about the value is placed right at the point of entry
(forcing the code to be more explicit and clear).
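
As a rough sketch of what converting "at the point the data comes in" could look like, assuming the declare(strict_types=1) syntax and scalar type declarations (PHP 7.0 and later). The names intFromDb, nextPage and $row are hypothetical and only there for illustration; filter_var with FILTER_VALIDATE_INT is one possible way to do the checked conversion, not the only one.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: convert a raw database value to int at the point of
// entry, failing loudly instead of letting a non-integer string travel further.
function intFromDb($raw): int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT);
    if ($value === false) {
        throw new UnexpectedValueException(
            'Expected an integer, got ' . var_export($raw, true)
        );
    }
    return $value;
}

// Hypothetical consumer that only accepts real integers.
function nextPage(int $page): int
{
    return $page + 1;
}

// $row stands in for a fetched row whose integer column came back as a string.
$row = ['page' => '32'];
$next = nextPage(intFromDb($row['page']));
```

If the column later starts carrying non-integer strings, the failure happens in intFromDb, right where the outside data enters, rather than somewhere deep in the code that consumes it.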

Additionally, with the dual-mode proposal DB interactions can be in
weak mode and have the exact behavior you're describing here. Giving
the user the choice, rather than making assumptions.

> Strict zval.type based STH effectively eliminates this behavior, moving the 
> burden of worrying about type conversion to the user.

Correct. And you say that as if it's a bad thing. Being explicit about
type conversions isn't what you'd do in a 10 line-of-code script where
you can work out what the types are by just thinking about it. But on
large-scale systems, exposing the type conversions to the user gives
them the power to actually understand the codebase when they can't fit
the whole thing in their head at the same time.

So what you cite here as a disadvantage many consider to be an advantage.

> Performance

I find it funny how the non-strict crowd keeps bringing up performance...

> It is our position that there is no difference at all between strict and 
> coercive typing in terms of potential future AOT/JIT development - none at all

So really what you're saying is that you disagree with me publicly
about a statement which I made on the side, and which I said should
not impact the RFC or voting in any way. It is no part of my RFC at
all, yet it's brought up again.

> Static Analysis. It is the position of several Strict STH proponents that 
> Strict STH can help static analysis in certain cases. For the same reasons 
> mentioned above about JIT, we don't believe that is the case

This is patently false. Keep not believing it all you want, but
*static analysis* requires statically looking at code. Which means you
have no value information. So static analysis can't possibly happen in
cases where you need to know about value information (because it's not
there). Yes, at function entry you know the types. But static analysis
isn't about analyzing a single function (in fact, that's the least
interesting case). It's more about analyzing a series of functions, a
function call graph. And in that case strict typing (based only on
type) does make a big difference.
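
A small sketch of the call-graph case, again assuming declare(strict_types=1) and PHP 7.0+ type declarations. The functions readSetting and twice are hypothetical. The point is that the outcome of the call can be decided from the two signatures alone, without knowing what string readSetting actually returns.

```php
<?php
declare(strict_types=1);

// Hypothetical producer: all an analyzer knows statically is "returns string".
function readSetting(): string
{
    return '42';
}

// Hypothetical consumer: declared to take an int.
function twice(int $n): int
{
    return $n * 2;
}

try {
    // string passed where int is declared: under strict typing this is
    // rejected for every possible return value of readSetting().
    twice(readSetting());
    $outcome = 'accepted';
} catch (TypeError $e) {
    $outcome = 'rejected';
}
```

With type-only checks the call is always an error, so a tool can flag it by reading the declarations. With value-based coercion, '42' would be accepted and 'apple' would not, so the same call can only be judged when the value is known at runtime.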

In short, I think the concerns around the handling of internal
functions are significant enough to cause major concern about this
proposal.

Thanks

Anthony

On Sat, Feb 21, 2015 at 12:22 PM, Zeev Suraski <z...@zend.com> wrote:
> All,
>
>
>
> I’ve been working with François and several other people from internals@
> and the PHP community to create a single-mode Scalar Type Hints proposal.
>
>
>
> I think the RFC is a bit premature and could benefit from a bit more
> time, but given the time pressure, as well as the fact that a not fully
> compatible subset of that RFC was published and has people already
> discussing it, it made the most sense to publish it sooner rather than
> later.
>
>
>
> The RFC is available here:
>
>
>
> wiki.php.net/rfc/coercive_sth
>
>
>
> Comments welcome!
>
>
> Zeev

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
