Re: [PHP-DEV] Coercive Scalar Type Hints RFC

Dmitry Stogov Mon, 23 Feb 2015 03:04:22 -0800

On Feb 21, 2015 9:13 PM, "Anthony Ferrara" <[email protected]> wrote:
>
> Zeev,
>
> First off, thanks for putting forward a proposal. I look forward to a
> patch that can be experimented with.
>
> There are a few concerns that I have about the proposal however:
>
> > Proponents of Strict STH cite numerous advantages, primarily around
> > code safety/security. In their view, the conversion rules proposed by
> > Dynamic STH can easily allow ‘garbage’ input to be silently converted
> > into arguments that the callee will accept – but that may, in many
> > cases, hide difficult-to-find bugs or otherwise result in unexpected
> > behavior.
>
> I think that's partially mis-stating the concern. It's less about
> "garbage input" and more about unpredictable behavior. You can't look
> at code and know that it will not produce an error with dynamic
> typing. That's one of the big advantages of strict typing that many
> people want. In reality the reasons are complex, varied and important
> to each person.
>
> > Proponents of Dynamic STH bring up consistency with the rest of the
> > language, including some fundamental type-juggling aspects that have
> > been key tenets of PHP since its inception. Strict STH, in their view,
> > is inconsistent with these tenets.
>
> Dynamic STH is apparently consistent with the rest of the language's
> treatment of scalar types. It's inconsistent with the rest of the
> language's treatment of parameters.
>
> However there's an important point to make here: a lot of best
> practice has been pushing against the way PHP treats scalar types in
> certain cases. Specifically around == vs === and using strict
> comparison mode in in_array, etc.
>
> So while it appears consistent with the rest of PHP, it only does so
> if you ignore a large part of both the language and the way it's
> commonly used.
>
> In reality, the only thing PHP's type system is consistent at is being
> inconsistent.
>
>
>
> In the "Changes To Internal Functions" section, I think all three
> types are significantly flawed:
>
> 1. "Just Do It" - This is problematic because a very large chunk of
> code that worked in 5.x will all of a sudden not work in 7.0. This
> will likely create a python 2/3 issue, as it would require a LOT of
> code to be changed to make it compatible.
>
> 2. "Emit E_DEPRECATED" - This is problematic because raising errors
> (even if suppressed) is not cheap. And the potential for raising one
> for a non-trivial percentage of every native function call has the
> potential to have a MASSIVE performance impact for code designed for
> 5.x. Without a patch to test, it can't really be quantified, but it
> would be a shame to lose the performance gains made with 7 because
> we're triggering 100's, 1000's or 10000's of errors in a single
> application run...
>
> 3. "Just Do It but give users an option to not" - This has the
> problems that E_DEPRECATED has, but it also gets us back to having
> fundamental code behavior controlled by an INI setting, which for a
> very long time this community has generally seen as a bad thing
> (especially for portability and code re-use).
>
>
>
> Moving along,
>
> > Further, the two sets can cause the same functions to behave
> > differently depending on where they're being called
>
> I think that's misleading. The functions will always behave the same.
> The difference is how you get data into the function. The behavior
> difference is in your code, not the end function.
>
> > For example, a “32” (string) value coming back from an integer column
> > in a database table, would not be accepted as valid input for a
> > function expecting an integer.
>
> There's an important point to consider here. You're relying on
> information outside of the program to determine program correctness.
> So to say "coming back from an integer column" requires concrete
> knowledge and information that you can't possibly have in the program.
> What happens when some DBA changes the column type to a string type?
> The data will still work for a while, but then suddenly break without
> warning when a non-integer value comes in. Because the
> value-information comes from outside.
>
> With strict mode, you'd have to embed a cast (smart or explicit) to
> convert to an integer at the point the data comes in. So semantic
> information about the value is placed right at the point of entry
> (forcing the code to be more explicit and clear).
>
> Additionally, with the dual-mode proposal DB interactions can be in
> weak mode and have the exact behavior you're describing here. Giving
> the user the choice, rather than making assumptions.
>
> > Strict zval.type based STH effectively eliminates this behavior,
> > moving the burden of worrying about type conversion to the user.
>
> Correct. And you say that as if it's a bad thing. Being explicit about
> type conversions isn't what you'd do in a 10 line-of-code script where
> you can realize what the types are by just thinking about it. But on
> large scale systems exposing the type conversions to the user gives
> the power to actually understand the codebase when you can't fit the
> whole thing in your head at the same time.
>
> So what you cite here as a disadvantage many consider to be an advantage.
>
> > Performance
>
> I find it funny how the non-strict crowd keeps bringing up performance...
>
> > It is our position that there is no difference at all between strict
> > and coercive typing in terms of potential future AOT/JIT development -
> > none at all
>
> So really what you're saying is that you disagree with me publicly. A
> statement which I said on the side, and I said should not impact RFC
> or voting in any way. And is in no part in my RFC at all. Yet brought
> up again.
>
> > Static Analysis. It is the position of several Strict STH proponents
> > that Strict STH can help static analysis in certain cases. For the
> > same reasons mentioned above about JIT, we don't believe that is the
> > case
>
> This is patently false. Keep not believing it all you want, but
> *static analysis* requires statically looking at code. Which means you
> have no value information. So static analysis can't possibly happen in
> cases where you need to know about value information (because it's not
> there). Yes, at function entry you know the types. But static analysis
> isn't about analyzing a single function (in fact, that's the least
> interesting case). It's more about analyzing a series of functions, a
> function call graph. And in that case strict typing (based only on
> type) does make a big difference.


Strict and weak type hints provide exactly the same information for static
analyzers: they guarantee the types of arguments at function entry. With
this information, it's possible to infer the types of other variables inside
the function and even across functions (by analyzing the call graph).
I don't see how the strict semantics of hints would change this guarantee in
any way.
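
A minimal sketch of that guarantee, assuming PHP 7 scalar type declarations
as proposed in the RFCs under discussion. The function name argument_type is
hypothetical and exists only for this illustration. The file has no
declare(strict_types=1), so calls are made in weak (coercive) mode:

```php
<?php
// Hypothetical helper: reports the runtime type of its argument as seen
// inside the function body, i.e. after the engine has checked the hint.
function argument_type(int $n): string
{
    // Whatever the caller passed, $n is an integer at this point.
    return gettype($n);
}

// Weak mode: the numeric string is converted to int before function entry.
echo argument_type("32"), "\n";

// An int argument is passed through unchanged.
echo argument_type(32), "\n";
```

If the calling file used strict mode instead, the first call would be
rejected with a TypeError rather than converted, but any call that does enter
the function still sees an integer. The body can be analyzed the same way in
both cases; the difference is on the caller's side.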

Thanks. Dmitry.

>
>
>
> In short, I think the concerns around the handling of internal
> functions are significant enough to cause major concern about this
> proposal.
>
> Thanks
>
> Anthony
>
> On Sat, Feb 21, 2015 at 12:22 PM, Zeev Suraski <[email protected]> wrote:
> > All,
> >
> >
> >
> > I’ve been working with François and several other people from internals@
> > and the PHP community to create a single-mode Scalar Type Hints
> > proposal.
> >
> >
> >
> > I think the RFC is a bit premature and could benefit from a bit more
> > time, but given the time pressure, as well as the fact that a not fully
> > compatible subset of that RFC was published and has people already
> > discussing it, it made the most sense to publish it sooner rather than
> > later.
> >
> >
> >
> > The RFC is available here:
> >
> >
> >
> > wiki.php.net/rfc/coercive_sth
> >
> >
> >
> > Comments welcome!
> >
> >
> > Zeev
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
>
