I'm writing this as an author and maintainer of a framework and many libraries.
Caveat, for those who aren't already aware: I work for Zend, and report to Zeev.
If you feel that makes my points biased, please feel free to stop
reading, but I do think my points on STH bear some consideration.

I've been following the STH proposals off and on. I voted for Andrea's proposal,
and, behind the scenes, defended it to Zeev. After a lot of consideration, and as
primarily a _consumer_ and _user_ of the language, I'm no longer convinced that
a dual-mode proposal makes sense. I worry that it will lead to:

- A split within the PHP community, consisting of those who do not use
  typehints, those who do use typehints, and those who use strict.
- Poor programming practices and performance degradation by those who adopt
  strict, due to poor usage of type casting.

Let me explain.

The big problem currently is that the engine behavior around casting can lead to
data loss quickly. As has been demonstrated elsewhere:

    $value = (int) '100 dogs'; // 100  - non-numeric trailing values are trimmed
    $value = (int) 'dog100';   // 0    - leading non-numeric values -> 0 ...
    $value = (int) '-100';     // -100 - ... unless indicating sign.
    $value = (int) ' 100';     // 100  - space is trimmed; data loss!
    $value = (int) ' 100 ';    // 100  - space is trimmed; data loss!
    $value = (int) '100.0';    // 100  - probably correct, but loss of precision
    $value = (int) '100.7';    // 100  - precision and data loss!
    $value = (int) 100.7;      // 100  - precision and data loss!
    $value = (int) 0x1A;       // 26   - hex
    $value = (int) '0x1A';     // 0    - shouldn't this be 26? why is this different?
    $value = (int) true;       // 1    - should this be cast?
    $value = (int) false;      // 0    - should this be cast?
    $value = (int) null;       // 0    - should this be cast?

Today, without scalar type hints, we end up writing code that has to first
validate that we have something we can use, and then cast it. This can often be
done with ext/filter, but it's horribly verbose:

    $value = filter_var(
        $value,
        FILTER_VALIDATE_INT,
        FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
    );
    if (false === $value) {
        // throw an exception?
    }

Many people skip the validation step entirely for the more succinct:

    $value = (int) $value;

And this is where the problems start, because this is where data loss occurs.

What I've observed in my 15+ years of using PHP is that people _don't_ validate;
they either blindly accept data and assume it's of the correct type, or they
blindly cast it without validation because writing that validation code is
boring, verbose, and repetitive (I'm guilty of this myself!). Yes, you can
offload that to libraries, but why introduce a new dependency in something as
simple as a value object?

The promise of STH is that the values will be properly coerced, so that if I
write a function that expects an integer, but pass it something like '100' or
'0x1A', it will be cast for me — but something that is not an integer and cannot
be safely cast without data loss will be rejected, and an error can bubble up my
stack or into my logs.

Both the Dual-Mode and the new Coercive typehints RFCs provide this.

The Dual-Mode, however, can potentially take us back to the same code we have
today when strict mode is enabled.

Now, you may argue that you won't need to cast the value in the first place,
because STH! But what if the value you received is from a database? Or from a
web request you've made? Chances are, the data is in a string, but the _value_
may be of another type. With weak/coercive mode, you just pass the data as-is,
but with strict enabled, your choices are to either cast blindly, or to do the
same validation/casting as before:

    $value = filter_var(
        $value,
        FILTER_VALIDATE_INT,
        FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
    );
    if (false === $value) {
        // throw an exception?
    }

Interestingly, this adds overhead to your application (more function calls), and
makes it harder to read and to maintain. Ironically, I foresee "strict" as being
a new "badge of honor" for many in the language ("my code works under strict
mode!"), despite these factors.

If I don't enable strict mode on my code, and somebody else turns on strict when
calling my code, there's the possibility of new errors if I do not perform
validation or casting on such values. This means that the de facto standard will
likely be to code to strict (I can already envision the flood of PRs against OSS
projects for these issues).

You can say, "But, Static Analysis!" all you want, but that doesn't lead to me
writing less code to accomplish the same thing; it just gives me a tool to check
the correctness of my code. (Yes, this _is_ important. But we also have a ton of
tooling around those concerns already, even if they aren't proper static
analyzers.)

From a developer experience standpoint, I find myself scratching my head: what are
we gaining with STH if we have a strict mode? I'm still writing exactly the same
code I am today to validate and/or cast my scalars before passing them to
functions and methods if I want to be strict.

The new coercive RFC offers much more promise to me as a consumer/user of the
language. The primary benefit I see is that it provides a path forward towards
better casting logic in the language, which will ensure that — in the future —
this:

    $value = (int) $value;

will operate properly, and raise errors when data loss may occur. It means that
immediately, if I start using STH, I can be assured that _if_ my code runs, I
have values of the correct type, as they've been coerced safely. The lack of a
strict mode means I can drop that defensive validation/casting code safely.

My point is: I'm sick of writing code like this:

    /**
     * @param int $code
     * @param string $reason
     */
    public function setStatus($code, $reason = null)
    {
        $code = filter_var(
            $code,
            FILTER_VALIDATE_INT,
            FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
        );
        if (false === $code) {
            throw new InvalidArgumentException(
                'Code must be an integer'
            );
        }
        if (null !== $reason && ! is_string($reason)) {
            throw new InvalidArgumentException(
                'Reason must be null or a string'
            );
        }

        $this->code = $code;
        $this->reason = $reason;
    }

I want to be able to write this:

    public function setStatus(int $code, string $reason = null)
    {
        $this->code = $code;
        $this->reason = $reason;
    }

and _not_ push the burden on consumers to validate/cast their values.

This is what I want from STH, no more, no less: sane casting rules, and the
ability to code to scalar types safely. While I can see some of the benefits of
strict mode, I'm concerned about the schism it may create in the PHP library
ecosystem, and that many of the benefits of the coercive portion of that RFC
will be lost when working with data from unknown data sources.

If you've read thus far, thank you for your consideration. I'll stop bugging you
now.

-- 
Matthew Weier O'Phinney
Principal Engineer
Project Lead, Zend Framework and Apigility
matt...@zend.com
http://framework.zend.com
http://apigility.org
PGP key: http://framework.zend.com/zf-matthew-pgp-key.asc

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
