On Sun, 2 Jan 2022 at 06:20, Michael Morris <tendo...@gmail.com> wrote:
>
> On Sat, Jan 1, 2022 at 10:47 PM Kirill Nesmeyanov <n...@xakep.ru> wrote:
>
> >
> > >Saturday, 1 January 2022, 17:41 +03:00, from Rowan Tommins <
> > rowan.coll...@gmail.com>:
> > >
> > >On 31/12/2021 00:21, Kirill Nesmeyanov wrote:
> > >> I support this behavior fix because in its current form, due to a
> > similar problem (almost?), all PSR-7 implementations contain bugs that
> > violate RFC7230 (section 3.2:
> > https://datatracker.ietf.org/doc/html/rfc7230#section-3.2 ). Thus,
> > physically, by the standard, all headers can have the name "0" (like «0:
> > value»), but when stored inside implementations, it is converted to a
> > string and a problem arises ($message->getHeaders() //
> > returns array<int|string, string> instead of array<string, string>).


The solution is to cast the keys back to string when reading from the
array, IF the type matters.

foreach ($headers as $k => $values) {
  $name = (string) $k;
}

We could introduce an alternative to array_keys() that would do this
automatically, e.g. "array_keys_str()".
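
A rough userland sketch of what such a helper could look like. The name
array_keys_str() is only the suggestion above, not an existing PHP
function, and the header data is made up for illustration:

```php
<?php
// Hypothetical helper: not part of PHP, just a sketch of the idea.
function array_keys_str(array $array): array
{
    // array_keys() returns int for integer-like keys; cast each one
    // back to string so callers always get a list of strings.
    return array_map('strval', array_keys($array));
}

// Illustrative input: the key '0' is silently stored as int(0).
$headers = ['0' => ['Farewell'], 'Host' => ['example.org']];

var_dump(array_keys($headers) === [0, 'Host']);       // bool(true)
var_dump(array_keys_str($headers) === ['0', 'Host']); // bool(true)
```

This only changes what the caller sees on read; the array itself still
stores the integer key.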


> > >
> > >You appear to be technically correct - the RFC defines a header name
> > >only as "token", which implies the following would all be valid HTTP
> > >headers:
> > >
> > >42: The Answer
> > >!: Bang
> > >^_^: Surprised
> > >
> > >In practice, it would be a bad idea to use any of these.
> > >
> > >Every single one of the field names registered with IANA [1] starts with
> > >a letter, and proceeds with only letters, digits, and hyphen ('-'). [The
> > >exception is "*", listed there as "reserved" to specifically prevent its
> > >use conflicting with the wild-card value in "Vary" lists.]
> > >
> > >I'm actually surprised this definition hasn't been updated with
> > >interoperability advice in recent revisions of the standard. I did find
> > >this general advice for internet message headers in RFC 3864 [2]:
> > >
> > > > Thus, for maximum flexibility, header field names SHOULD further be
> > > >  restricted to just letters, digits, hyphen ('-') and underscore ('_')
> > > >  characters, with the first character being a letter or underscore.
> > >
> > >The additional restriction on underscore ('_') in HTTP arises from CGI,
> > >which maps headers to environment variables. For instance, Apache httpd
> > >silently drops headers with anything other than letters, digits, and
> > >hyphen [3] to avoid security issues caused by environment manipulation.
> > >
> > >If I was developing a PSR-7 or similar library, I would be inclined to
> > >drop any header composed only of digits, and issue a diagnostic warning,
> > >so that it wouldn't escalate to a type error later. It certainly doesn't
> > >seem reasonable to change the entire language to work around that
> > >inconvenience.
> > >
> > >[1]  https://www.iana.org/assignments/http-fields/http-fields.xhtml
> > >[2]  https://datatracker.ietf.org/doc/html/rfc3864#section-4.1
> > >[3]  https://httpd.apache.org/docs/trunk/env.html#setting
> > >
> > >Regards,
> > >
> > >--
> > >Rowan Tommins
> > >[IMSoP]
> > >
> > >--
> > >PHP Internals - PHP Runtime Development Mailing List
> > >To unsubscribe, visit:  https://www.php.net/unsub.php
> >
> > I just gave an example of what at the moment can cause an exception in any
> > application that is based on the PSR. It is enough to send the header "0:
> > Farewell to the server". In some cases (for example, as is the case with
> > RoadRunner) - this can cause a physical stop and restart of the server.
> >
> > Just in case, I will repeat my thesis: I cannot imagine that anyone is
> > using this functionality consciously and that it is part of the real logic
> > of the application.

It is not really relevant whether this is used _consciously_.

>
>
> You don't have a lot of experience with legacy code then. PHP, particularly
> old PHP (like 4, 5.1 era) was used by a lot of idiots.
>
> I was one of those idiots (Perhaps I still am an idiot - jury is
> deliberating on that but I digress).

We don't need to assume incompetence.

Any code that deals with arrays _must_ consider this behavior, unless
the array keys are known to be only integers, or only non-integer-like
strings.
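
A small illustration of where this bites in practice, using made-up
data. A strict lookup against the key list fails for an integer-like
string unless the keys are cast back first:

```php
<?php
// Illustrative data only: one integer-like key, one ordinary string key.
$stock = ['10' => 'apples', 'x1' => 'pears'];

// The first key comes back as int(10), not string '10'.
$keys = array_keys($stock);

// Strict search for the string '10' misses, because '10' !== 10.
$naive = in_array('10', $keys, true);

// Casting the keys back to string makes the strict search succeed.
$safe = in_array('10', array_map('strval', $keys), true);

var_dump($naive, $safe); // bool(false), bool(true)
```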

One obvious BC break:

What would be the value of $a in the following snippet?

$a = [];
$a['5'] = 's';
$a[5] = 'n';
$a[7] = 'n';
$a['7'] = 's';

Currently it would be [5 => 'n', 7 => 's'].
With the "new" behavior, we'd have to decide what happens.
Can keys '5' and 5 coexist? ['5' => 's', 5 => 'n', 7 => 'n', '7' => 's']?
Or would assignment change the key type? [5 => 'n', '7' => 's']?
Or does the initial key type remain, and only the value changes?
['5' => 'n', 7 => 's']?

I would argue that the current behavior might still be the best we can
get for a general-purpose structure that can act as a vector or a map
or a mix of both.
The perceived awkwardness is just a result of trying to do everything at once.

Possible solutions:
- Dedicated array-reading methods that cast all keys to string on read.
- New structures, alternative to array, that either allow separate
entries for 5 and '5', or that are restricted to one key type.
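
To make the second option a bit more concrete, here is a minimal sketch
of a structure that keeps 5 and '5' apart. The class name StrictKeyMap
and its methods are invented for this mail, it assumes PHP 8.0+ for the
union and mixed types, and it is one possible design, not a proposal for
core:

```php
<?php
// Hypothetical sketch: tags each key with its type before using it as
// an array key, so int 5 and string '5' land in different slots.
final class StrictKeyMap
{
    /** @var array<string, array{0: int|string, 1: mixed}> */
    private array $entries = [];

    private static function slot(int|string $key): string
    {
        // 'i:5' for int 5, 's:5' for string '5'; never integer-like,
        // so PHP does not normalize it.
        return (is_int($key) ? 'i:' : 's:') . $key;
    }

    public function set(int|string $key, mixed $value): void
    {
        $this->entries[self::slot($key)] = [$key, $value];
    }

    public function get(int|string $key): mixed
    {
        return $this->entries[self::slot($key)][1] ?? null;
    }

    /** @return list<int|string> keys in insertion order, original types */
    public function keys(): array
    {
        return array_map(
            static fn(array $entry) => $entry[0],
            array_values($this->entries)
        );
    }
}

$map = new StrictKeyMap();
$map->set('5', 's');
$map->set(5, 'n');

var_dump($map->get('5'));            // string(1) "s"
var_dump($map->get(5));              // string(1) "n"
var_dump($map->keys() === ['5', 5]); // bool(true)
```

This answers the first question above with "yes, they coexist", at the
cost of not being a drop-in replacement for array.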


--- Andreas

>
> Snark aside though, PHP has more than its fair share of self taught
> programmers (again, not trying to be insulting as I am one myself), and
> they do things with the code that veterans and formally trained programmers
> would never think to try, let alone implement.
>
> I guarantee fixing how key handling is done will break something - either
> in the form of code exploiting the weird behavior, or code that is guarding
> against the weird behavior; not to mention any tests that might be written
> - though amateurs rarely write test code (again, speaking from past
> experience I've grown beyond).
>
>
>
> > And fixing this behavior, I believe, will automatically fix many libraries
> > (not necessarily PSR) that do not take this behavior into account.
> >
> >
>
> And blow up who knows how many old code bases - many of which don't have
> unit test suites to discover if there is a break ahead of time.  This is
> the sort of BC break that would cause a cliff of users unable to migrate to
> the major version that implements it.  A Python 2 vs. 3 style of break.
>
> Even with that all said it may indeed be worth fixing - but this will
> require the same sort of kid gloves approach removing register globals had
> (for the newer folks, there was a time when $_REQUEST["var"] would auto
> populate $var with lovely security snarls).  IIRC PHP 3 had register
> globals always on, 4 created a config toggle to turn them off, and PHP 5.0
> turned that toggle off by default, finally PHP 5.3 (6 without unicode more
> or less) removed support for register globals entirely (My memory could be
> off - it's in the changelogs for the curious).
>
> I leave the decision making to the maintainers and contribs who do the
> actual work. Hell, I personally don't even use PHP that much these days
> having gotten a job where I focus on writing Cucumber tests in JavaScript
> that run on node.js. I keep up with PHP and this list though cause one
> never knows what the next job will entail. I just dropped out of lurk mode
> to underscore along with others up thread the massive ramifications of what
> is being proposed. As someone who wrote stupid code I can see this
> breaking, tread lightly. And hell, I don't even know how much of that code
> is still in use since I've changed employers many times since it was
> written. This situation is not unique and can create huge headaches for
> companies running projects on legacy code bases.

