RE: [PHP-DEV] Coercive Scalar Type Hints RFC

Zeev Suraski Sat, 21 Feb 2015 11:12:06 -0800

Sorry for the previous prematurely sent email, looks like I found a new
keyboard shortcut :)


> -----Original Message-----
> From: Anthony Ferrara [mailto:ircmax...@gmail.com]
> Sent: Saturday, February 21, 2015 8:12 PM
> To: Zeev Suraski
> Cc: PHP internals
> Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
>
> Zeev,
>
> First off, thanks for putting forward a proposal. I look forward to a
> patch that can be experimented with.
>
> There are a few concerns that I have about the proposal however:
>
> > Proponents of Strict STH cite numerous advantages, primarily around code
> > safety/security. In their view, the conversion rules proposed by Dynamic
> > STH can easily allow ‘garbage’ input to be silently converted into
> > arguments that the callee will accept – but that may, in many cases, hide
> > difficult-to-find bugs or otherwise result in unexpected behavior.
>
> I think that's partially mis-stating the concern.

I don't think it's mis-stating the key concern.  At least not based on what
I've heard from most people here over the last few months.

> It's less about "garbage input" and more about unpredictable behavior.
> You can't look at code and know that it will not produce an error with
> dynamic typing. That's one of the big advantages of strict typing that
> many people want. In reality the reasons are complex, varied and
> important to each person.

Your ability to look at code and know whether or not it will produce errors
is very similar in both strict and coercive typing.  But that goes back to
what we already decided to agree to disagree on - whether or not strict types
give you any tangible extra data when you look at code - aka Static
Analysis.
Note that Strict Typing would produce all the errors of coercive typing and
then some.  So knowing whether code will produce errors is arguably more
difficult in strict typing, although I think that at the end of the day,
it's pretty much equivalent.
Again, I did see Static Analysis being brought up by just a handful of
people, perhaps not even that.  For most people, it was the silent
acceptance of input that's likely to be invalid.

> > Proponents of Dynamic STH bring up consistency with the rest of the
> > language, including some fundamental type-juggling aspects that have
> > been key tenets of PHP since its inception. Strict STH, in their view,
> > is inconsistent with these tenets.
>
> Dynamic STH is apparently consistent with the rest of the language's
> treatment of scalar types. It's inconsistent with the rest of the
> language's treatment of parameters.

Not in the way Andrea proposed it, IIRC.  She opted to go for consistency
with internal functions.  Either way, at the risk of being shot for talking
about spiritual things, Dynamic STH is consistent with the dynamic spirit of
PHP, even if there are some discrepancies between its rule-set and the
implicit typing rules that govern expressions.  Note that in this RFC I'm
actually suggesting a possible way forward that will align *all* aspects of
PHP, including implicit casting - and have them all governed by a single set
of rules.

> However there's an important point to make here: a lot of best practice
> has been pushing against the way PHP treats scalar types in certain
> cases. Specifically around == vs === and using strict comparison mode in
> in_array, etc.

I think you're correct on comparisons, but not so much on the rest.  Dynamic
use of scalars in expressions is still exceptionally common in PHP code.
Even with comparisons, == is still very common - and you'd use == vs. ===
depending on what you need.
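
For readers who want the comparison point spelled out, here is a minimal
sketch using only standard PHP operators and in_array(). It is not taken from
either RFC; the variable names are purely illustrative.

```php
<?php
// Loose vs. strict comparison of the string '1' and the integer 1.
$loose  = ('1' == 1);   // true: the numeric string is juggled to an int
$strict = ('1' === 1);  // false: string and int are different types

// in_array() compares loosely by default; the third argument turns on
// strict (type-sensitive) comparison.
$foundLoose  = in_array('1', [1, 2, 3]);        // true
$foundStrict = in_array('1', [1, 2, 3], true);  // false

var_dump($loose, $strict, $foundLoose, $foundStrict);
```

Which form is appropriate depends on whether the calling code wants numeric
strings treated as numbers at that point.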

> So while it appears consistent with the rest of PHP, it only does so if
> you ignore a large part of both the language and the way it's commonly
> used.

Let's agree to disagree.  That's one thing we can always agree on!  :)

> In reality, the only thing PHP's type system is consistent at is being
> inconsistent.

I'd have to partially agree with you here, but if you read the RFC through,
including its future recommendations, you'd see it's perhaps the first
attempt in 20 years to fix that.  Instead of doing that through the
introduction of a 3rd (albeit simplistic) rule-set that only pays attention
to zval.type, it proposes the creation of a single set of rules that will be
consistent across the whole language, beginning with userland and internal
functions.

> In the "Changes To Internal Functions" section, I think all three types
> are significantly flawed:
>
> 1. "Just Do It" - This is problematic because a very large chunk of code
> that worked in 5.x will all of a sudden not work in 7.0. This will likely
> create a python 2/3 issue, as it would require a LOT of code to be
> changed to make it compatible.
>
> 2. "Emit E_DEPRECATED" - This is problematic because raising errors (even
> if suppressed) is not cheap. And the potential for raising one for a
> non-trivial percentage of every native function call has the potential to
> have a MASSIVE performance impact for code designed for 5.x. Without a
> patch to test, it can't really be codified, but it would be a shame to
> lose the performance gains made with 7 because we're triggering 100's,
> 1000's or 10000's of errors in a single application run...
>
> 3. "Just Do It but give users an option to not" - This has the problems
> that E_DEPRECATED has, but it also gets us back to having fundamental
> code behavior controlled by an INI setting, which for a very long time
> this community has generally seen as a bad thing (especially for
> portability and code re-use).

I do too, and I was upfront about their cons, not just pros.  And yet, they
all bring us to a much better outcome within a relatively short period of
time (in the lifetime of a language) than the Dual Mode will.

> > Further, the two sets can cause the same functions to behave
> > differently depending on where they're being called
>
> I think that's misleading. The functions will always behave the same.
> The difference is how you get data into the function. The behavior
> difference is in your code, not the end function.

I'll be happy to get a suggestion from you on how to reword that.
Ultimately, from the layman user's point of view, she'd be calling foo()
from one place and have it accept her arguments, and foo() from another
place and have it reject the very same arguments.

> > For example, a “32” (string) value coming back from an integer column
> > in a database table, would not be accepted as valid input for a
> > function expecting an integer.
>
> There's an important point to consider here. You're relying on information
> outside of the program to determine program correctness.
> So to say "coming back from an integer column" requires concrete
> knowledge and information that you can't possibly have in the program.
> What happens when some DBA changes the column type to a string type.
> The data will still work for a while, but then suddenly break without
> warning when a non-integer value comes in. Because the value-information
> comes from outside.

Of course we're relying on information coming from outside; as we all know,
this is one of the most common use cases for PHP.
While theoretically you're right, in practice, in the vast majority of cases
it wouldn't play out like that.  The string column won't be tested
exclusively with "123" inputs.  As soon as there's a non-numeric-string
input, it'll fail.  That's likely to happen very early in the process, and
that's before considering that if there's such a huge mismatch between the
semantic meaning of the column and what the function expects - the problem
is likely to be found even sooner, since the function will simply not
perform its intended job.

On the flip-side, imagine that same developer using strict types.  Feeding
the function that integer in string form gets rejected.  What are her
options?  The developer is likely to just explicitly cast the value into an
int, giving up on any and all sanitization that coercive types would offer
her, happily accepting "Apples" and "100 Dalmatians" as valid inputs.  That,
on the other hand, is a *very* likely scenario.
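
A minimal sketch of that scenario, relying only on standard PHP behavior (the
(int) cast and filter_var() with FILTER_VALIDATE_INT). The helper name
compare_conversions() is hypothetical and exists only for illustration; it is
not part of the RFC or of any patch.

```php
<?php
// Hypothetical helper: report what an explicit cast and a validating
// conversion each make of the same raw string.
function compare_conversions($raw)
{
    return [
        // An explicit cast never fails; non-numeric text is silently dropped.
        'cast'      => (int) $raw,
        // FILTER_VALIDATE_INT returns an int on success, false otherwise.
        'validated' => filter_var($raw, FILTER_VALIDATE_INT),
    ];
}

foreach (['32', 'Apples', '100 Dalmatians'] as $raw) {
    var_dump(compare_conversions($raw));
}
```

The cast turns "Apples" into 0 and "100 Dalmatians" into 100 without any
complaint, while the validating conversion returns false for both and accepts
only "32".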

> With strict mode, you'd have to embed a cast (smart or explicit) to
> convert to an integer at the point the data comes in.

First, I'm not aware of smart/safe casts being available or proposed at this
point.
Secondly, why at the point the data comes in?  That would be ideal for
static analyzers, but it's probably a lot more common that it will be done
at the first point in time where it gets rejected.

> Additionally, with the dual-mode proposal DB interactions can be in weak
> mode and have the exact behavior you're describing here. Giving the user
> the choice, rather than making assumptions.

This is bound to be misquoted and used against me, but I don't think it's a
good idea to give the user the choice in such a way.  I could have sworn
that you tweeted the quote about perfection being not when there's nothing
left to add, but nothing left to remove, but perhaps it was someone else.
Either way, two modes are worse than one, if we can come up with a good
single unified mode that addresses *most* cases.

Remember you can always implement custom type checking to your heart's
content.  You can easily implement if (!is_int($foo)) { exit; } in the
not-so-common-cases where accepting "42" as 42 might be disastrous.
However, on the caller side, forcing people to clutter their code with
casts - many casts - either explicit casts or custom ones - is going to
affect a lot more developers in a lot more places.  The bang for the buck of
adding strict mode is just not there, in my humble opinion of course.
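
As a rough sketch of such a callee-side check: the function name
reserve_seats() below is hypothetical, and an exception stands in for the
exit in the one-liner above only so that the rejection can be observed by the
caller.

```php
<?php
// Hypothetical callee that insists on a genuine integer, whatever
// conversion rules apply at the call boundary.
function reserve_seats($count)
{
    if (!is_int($count)) {
        // Reject anything that is not already an int, including "42".
        throw new InvalidArgumentException('count must be an integer');
    }
    return $count;
}

var_dump(reserve_seats(42));      // int(42)

try {
    reserve_seats('42');          // the numeric string is rejected
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";  // count must be an integer
}
```

The check lives in the one function that needs it, so callers elsewhere are
not forced to add casts.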


> > Strict zval.type based STH effectively eliminates this behavior, moving
> > the burden of worrying about type conversion to the user.
>
> Correct. And you say that as if it's a bad thing. Being explicit about
> type conversions isn't what you'd do in a 10 line-of-code script where
> you can realize what the types are by just thinking about it. But on
> large scale systems exposing the type conversions to the user gives the
> power to actually understand the codebase when you can't fit the whole
> thing in your head at the same time.

I have a hard time connecting to the 'power' approach.  I think developers
want their code to work, with minimal effort, and be secure.  Coercive
scalar type hints will do an excellent job at that.  Strict type hints will
be more work, are bound to trigger a lot of "Oh come on" responses, and as a
special bonus - proliferate the use of explicit casts.  Let me top that -
you'd have developers who think they're security conscious, because they're
using strict mode - with code that's full of explicit casts.


> So what you cite here as a disadvantage many consider to be an advantage.

Perhaps, but I used the proper verb at the top ("We believe").

> > It is our position that there is no difference at all between strict
> > and coercive typing in terms of potential future AOT/JIT development -
> > none at all
>
> So really what you're saying is that you disagree with me publicly. A
> statement which I said on the side, and I said should not impact RFC or
> voting in any way. And is in no part in my RFC at all. Yet brought up
> again.

We listed all of what we believe to be misconceptions that were brought up on
internals.  As recently as yesterday, you had a PHP power user (Larry) that
was under the strong impression Strict STH would yield substantial
performance benefits.  Given that it was claimed in the past, and since we
can't assume every voter reads every last word that's written on internals@
threads, it was important to list that here even if it's not mentioned in
the Strict/Dual mode RFC.
It's also worth mentioning that there are people who *assume* that strict
type hints can somehow help performance, without being domain experts in
either the engine or JIT, even if they weren't exposed to the explicit
statements that suggested that on blogs and on internals@ - adding to the
importance of making it clear that there are no performance benefits to that
approach.

>
> > Static Analysis. It is the position of several Strict STH proponents
> > that Strict STH can help static analysis in certain cases. For the
> > same reasons mentioned above about JIT, we don't believe that is the
> > case
>
> This is patently false.

It's actually patently true.  We don't believe that is the case.  QED.

While at it, can we stop using 'patently false', and stick to
constructive wording such as 'I disagree'?

Also, I think that if you quoted the rest of the sentence you chose to trim,
it would appear a lot less confrontational:
"Static Analysis. It is the position of several Strict STH proponents that
Strict STH can help static analysis in certain cases. For the same reasons
mentioned above about JIT, we don't believe that is the case - ***although
it's possible that Strict Typing may be able to help static analysis in
certain edge cases.***"

That's still under 'we (don't) believe', so again, it's "patently true".
You can disagree, but that's our opinion.

I'll also add the most important part of that paragraph for the sake of
completeness:
"It is our belief that even if that is true, Static Analyzers need to be
designed for Languages, rather than Languages being designed for Static
Analyzers."

> Keep not believing it all you want, but *static analysis* requires
> statically looking at code. Which means you have no value information.
> So static analysis can't possibly happen in cases where you need to know
> about value information (because it's not there). Yes, at function entry
> you know the types. But static analysis isn't about analyzing a single
> function (in fact, that's the least interesting case). It's more about
> analyzing a series of functions, a function call graph. And in that case
> strict typing (based only on type) does make a big difference.

I think it's fair to say that while we were unable to convince you there's
no tangible extra value in Strict STH compared to any other kind of STH that
guarantees the type of value a function will get, you were also unable to
convince Dmitry, Stas or myself - all of whom independently discussed it
with you.  Again, despite that, I'm not saying that you're "patently wrong",
just that I don't believe you're right.

Thanks for the feedback!

Zeev

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
