On Sun, 2 Jan 2022 at 06:20, Michael Morris <tendo...@gmail.com> wrote:
>
> On Sat, Jan 1, 2022 at 10:47 PM Kirill Nesmeyanov <n...@xakep.ru> wrote:
>
> > >
> > >Saturday, 1 January 2022, 17:41 +03:00 from Rowan Tommins <rowan.coll...@gmail.com>:
> > >
> > >On 31/12/2021 00:21, Kirill Nesmeyanov wrote:
> > >> I support this behavior fix because in its current form, due to a
> > >> similar problem (almost?), all PSR-7 implementations contain bugs that
> > >> violate RFC7230 (section 3.2:
> > >> https://datatracker.ietf.org/doc/html/rfc7230#section-3.2 ). Thus,
> > >> physically, by the standard, all headers can have the name "0" (like «0:
> > >> value»), but when stored inside implementations, it is converted to a
> > >> string and a problem arises ($message->getHeaders() //
> > >> returns array<int|string, string> instead of array<string, string>).
The solution is to cast the keys back to string when reading from the
array, IF the type matters.

    foreach ($headers as $k => $values) {
        $name = (string) $k;
    }

We could introduce an alternative to array_keys() that would do this
automatically, e.g. "array_keys_str()".

> > >
> > >You appear to be technically correct - the RFC defines a header name
> > >only as "token", which implies the following would all be valid HTTP
> > >headers:
> > >
> > >42: The Answer
> > >!: Bang
> > >^_^: Surprised
> > >
> > >In practice, it would be a bad idea to use any of these.
> > >
> > >Every single one of the field names registered with IANA [1] starts with
> > >a letter, and proceeds with only letters, digits, and hyphen ('-'). [The
> > >exception is "*", listed there as "reserved" to specifically prevent its
> > >use conflicting with the wild-card value in "Vary" lists.]
> > >
> > >I'm actually surprised this definition hasn't been updated with
> > >interoperability advice in recent revisions of the standard. I did find
> > >this general advice for internet message headers in RFC 3864 [2]:
> > >
> > > > Thus, for maximum flexibility, header field names SHOULD further be
> > > > restricted to just letters, digits, hyphen ('-') and underscore ('_')
> > > > characters, with the first character being a letter or underscore.
> > >
> > >The additional restriction on underscore ('_') in HTTP arises from CGI,
> > >which maps headers to environment variables. For instance, Apache httpd
> > >silently drops headers with anything other than letters, digits, and
> > >hyphen [3] to avoid security issues caused by environment manipulation.
> > >
> > >If I was developing a PSR-7 or similar library, I would be inclined to
> > >drop any header composed only of digits, and issue a diagnostic warning,
> > >so that it wouldn't escalate to a type error later. It certainly doesn't
> > >seem reasonable to change the entire language to work around that
> > >inconvenience.
> > >
> > >[1] https://www.iana.org/assignments/http-fields/http-fields.xhtml
> > >[2] https://datatracker.ietf.org/doc/html/rfc3864#section-4.1
> > >[3] https://httpd.apache.org/docs/trunk/env.html#setting
> > >
> > >Regards,
> > >
> > >--
> > >Rowan Tommins
> > >[IMSoP]
> > >
> > >--
> > >PHP Internals - PHP Runtime Development Mailing List
> > >To unsubscribe, visit: https://www.php.net/unsub.php
> >
> > I just gave an example of what at the moment can cause an exception in any
> > application that is based on the PSR. It is enough to send the header "0:
> > Farewell to the server". In some cases (for example, as is the case with
> > RoadRunner) - this can cause a physical stop and restart of the server.
> >
> > Just in case, I will repeat my thesis: I cannot imagine that anyone is
> > using this functionality consciously and that it is part of the real logic
> > of the application.

It is not really relevant whether this is used _consciously_.

> You don't have a lot of experience with legacy code then. PHP, particularly
> old PHP (like 4, 5.1 era) was used by a lot of idiots.
>
> I was one of those idiots (Perhaps I still am an idiot - jury is
> deliberating on that but I digress).

We don't need to assume incompetence.
Any code that deals with arrays _must_ consider this behavior, unless
the array keys are known to be only integers, or only non-integer-like
strings.

One obvious BC break: What would be the value of $a in the following snippet?

    $a = [];
    $a['5'] = 's';
    $a[5] = 'n';
    $a[7] = 'n';
    $a['7'] = 's';

Currently it would be [5 => 'n', 7 => 's'].
With the "new" behavior, we'd have to decide what happens.
Can keys '5' and 5 coexist? ['5' => 's', 5 => 'n', 7 => 'n', '7' => 's']?
Or would assignment change the key type? [5 => 'n', '7' => 's']?
Or does the initial key type remain, and only the value changes? ['5' => 'n', 7 => 's']?
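
To make the current behavior concrete, here is a minimal sketch of the
snippet above, together with a userland version of the array_keys_str()
idea. The helper name is the one floated earlier in this mail; its
signature and the $headers sample data are assumptions for illustration
only, and no such function exists in PHP today.

```php
<?php

// Hypothetical helper (not part of PHP): like array_keys(), but every
// key is cast back to string, so callers always get a list of strings.
function array_keys_str(array $array): array
{
    return array_map('strval', array_keys($array));
}

// Current behavior: integer-like string keys are normalized to int,
// so '5' and 5 address the same entry and the later write wins.
$a = [];
$a['5'] = 's';
$a[5] = 'n';
$a[7] = 'n';
$a['7'] = 's';
var_dump($a === [5 => 'n', 7 => 's']); // bool(true)

// Assumed sample data: a header literally named "0", as in the PSR-7 case.
$headers = ['0' => ['Farewell'], 'Host' => ['example.org']];
var_dump(array_keys($headers));     // int 0 and string 'Host'
var_dump(array_keys_str($headers)); // string '0' and string 'Host'
```

Only canonical decimal integer strings are affected; a key such as '05'
stays a string, so the cast on read is lossless for header names.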

I would argue that the current behavior might still be the best we can
get for a general-purpose structure that can act as a vector or a map
or a mix of both.
The perceived awkwardness is just a result of trying to do everything
at once.

Possible solutions:
- Dedicated array-reading methods that cast all keys to string on read.
- New structures, alternative to array, that either allow separate
  entries for 5 and '5', or that are restricted to one key type.

--- Andreas

>
> Snark aside though, PHP has more than its fair share of self taught
> programmers (again, not trying to be insulting as I am one myself), and
> they do things with the code that veterans and formally trained programmers
> would never think to try, let alone implement.
>
> I guarantee fixing how key handling is done will break something - either
> in the form of code exploiting the weird behavior, or code that is guarding
> against the weird behavior; not to mention any tests that might be written
> - though amateurs rarely write test code (again, speaking from past
> experience I've grown beyond).
>
> >
> > And fixing this behavior, I believe, will automatically fix many libraries
> > (not necessarily PSR) that do not take this behavior into account.
> >
>
> And blow up who knows how many old code bases - many of which don't have
> unit test suites to discover if there is a break ahead of time. This is
> the sort of BC break that would cause a cliff of users unable to migrate to
> the major version that implements it. A Python 2 vs. 3 style of break.
>
> Even with that all said it may indeed be worth fixing - but this will
> require the same sort of kid gloves approach removing register globals had
> (for the newer folks, there was a time when $_REQUEST["var"] would auto
> populate $var with lovely security snarls).
> IIRC PHP 3 had register globals always on, 4 created a config toggle to
> turn them off, and PHP 5.0 turned that toggle off by default, finally
> PHP 5.3 (6 without unicode more or less) removed support for register
> globals entirely (My memory could be off - it's in the changelogs for
> the curious).
>
> I leave the decision making to the maintainers and contribs who do the
> actual work. Hell, I personally don't even use PHP that much these days
> having gotten a job where I focus on writing Cucumber tests in JavaScript
> that run on node.js. I keep up with PHP and this list though cause one
> never knows what the next job will entail. I just dropped out of lurk mode
> to underscore along with others up thread the massive ramifications of what
> is being proposed. As someone who wrote stupid code I can see this
> breaking, tread lightly. And hell, I don't even know how much of that code
> is still in use since I've changed employers many times since it was
> written. This situation is not unique and can create huge headaches for
> companies running projects on legacy code bases.

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php