On Sat, Jan 1, 2022 at 10:47 PM Kirill Nesmeyanov <n...@xakep.ru> wrote:
>
> >Saturday, 1 January 2022, 17:41 +03:00 from Rowan Tommins <rowan.coll...@gmail.com>:
> >
> >On 31/12/2021 00:21, Kirill Nesmeyanov wrote:
> >> I support this behavior fix because in its current form, due to a
> >> similar problem (almost?), all PSR-7 implementations contain bugs that
> >> violate RFC7230 (section 3.2:
> >> https://datatracker.ietf.org/doc/html/rfc7230#section-3.2 ). Thus,
> >> physically, by the standard, all headers can have the name "0" (like «0:
> >> value»), but when stored inside implementations, it is converted to an
> >> integer and a problem arises ($message->getHeaders() //
> >> returns array<int|string, string> instead of array<string, string>).
> >
> >You appear to be technically correct - the RFC defines a header name
> >only as "token", which implies the following would all be valid HTTP
> >headers:
> >
> >42: The Answer
> >!: Bang
> >^_^: Surprised
> >
> >In practice, it would be a bad idea to use any of these.
> >
> >Every single one of the field names registered with IANA [1] starts with
> >a letter, and proceeds with only letters, digits, and hyphen ('-'). [The
> >exception is "*", listed there as "reserved" to specifically prevent its
> >use conflicting with the wild-card value in "Vary" lists.]
> >
> >I'm actually surprised this definition hasn't been updated with
> >interoperability advice in recent revisions of the standard. I did find
> >this general advice for internet message headers in RFC 3864 [2]:
> >
> > > Thus, for maximum flexibility, header field names SHOULD further be
> > > restricted to just letters, digits, hyphen ('-') and underscore ('_')
> > > characters, with the first character being a letter or underscore.
> >
> >The additional restriction on underscore ('_') in HTTP arises from CGI,
> >which maps headers to environment variables. For instance, Apache httpd
> >silently drops headers with anything other than letters, digits, and
> >hyphen [3] to avoid security issues caused by environment manipulation.
> >
> >If I was developing a PSR-7 or similar library, I would be inclined to
> >drop any header composed only of digits, and issue a diagnostic warning,
> >so that it wouldn't escalate to a type error later. It certainly doesn't
> >seem reasonable to change the entire language to work around that
> >inconvenience.
> >
> >[1] https://www.iana.org/assignments/http-fields/http-fields.xhtml
> >[2] https://datatracker.ietf.org/doc/html/rfc3864#section-4.1
> >[3] https://httpd.apache.org/docs/trunk/env.html#setting
> >
> >Regards,
> >
> >--
> >Rowan Tommins
> >[IMSoP]
>
> I just gave an example of what at the moment can cause an exception in any
> application that is based on the PSR. It is enough to send the header "0:
> Farewell to the server". In some cases (for example, as is the case with
> RoadRunner) - this can cause a physical stop and restart of the server.
>
> Just in case, I will repeat my thesis: I cannot imagine that anyone is
> using this functionality consciously and that it is part of the real logic
> of the application.

You don't have a lot of experience with legacy code then. PHP, particularly old PHP (like the 4 and 5.1 era), was used by a lot of idiots. I was one of those idiots (perhaps I still am an idiot - the jury is deliberating on that, but I digress).

Snark aside though, PHP has more than its fair share of self-taught programmers (again, not trying to be insulting, as I am one myself), and they do things with the code that veterans and formally trained programmers would never think to try, let alone implement. I guarantee fixing how key handling is done will break something - either in the form of code exploiting the weird behavior, or code that is guarding against the weird behavior; not to mention any tests that might be written - though amateurs rarely write test code (again, speaking from past experience I've grown beyond).
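
To make those two kinds of code concrete, here is a minimal sketch of each. The helper names headerNames() and namedEntries() are made up for illustration and are not taken from any real library; the only thing assumed about PHP itself is the current rule that a key which is a canonical decimal integer string (such as "0") is stored as an int.

```php
<?php
// Illustrative only: headerNames() and namedEntries() are hypothetical helpers.

// Today the string key "0" is stored as the integer 0.
$headers = ['0' => 'Farewell', 'Host' => 'example.com'];

// Code that GUARDS against the coercion: cast every key back to a string
// before handing it to anything that expects string header names.
function headerNames(array $headers): array
{
    return array_map(
        static function ($name): string {
            return (string) $name;
        },
        array_keys($headers)
    );
}

// Code that RELIES on the coercion: string keys are treated as "named"
// entries, integer keys as positional ones that get filtered out.
function namedEntries(array $data): array
{
    return array_filter($data, 'is_string', ARRAY_FILTER_USE_KEY);
}

var_dump(array_keys($headers));   // keys are int 0 and string "Host"
var_dump(headerNames($headers));  // both names are strings again
var_dump(namedEntries($headers)); // only the "Host" entry survives
```

If "0" stayed a string, the guard in headerNames() would merely become redundant, but namedEntries() would silently start returning the "0" entry as well. That second kind of change is the one nobody notices without tests.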

> And fixing this behavior, I believe, will automatically fix many libraries
> (not necessarily PSR) that do not take this behavior into account.
>

And blow up who knows how many old code bases - many of which don't have unit test suites to discover if there is a break ahead of time. This is the sort of BC break that would cause a cliff of users unable to migrate to the major version that implements it. A Python 2 vs. 3 style of break.

Even with all that said, it may indeed be worth fixing - but this will require the same sort of kid-gloves approach that removing register globals had (for the newer folks, there was a time when $_REQUEST["var"] would auto populate $var, with lovely security snarls). IIRC PHP 3 had register globals always on, PHP 4 created a config toggle to turn them off and, as of 4.2, turned that toggle off by default, PHP 5.3 (6 without unicode more or less) deprecated the setting, and finally PHP 5.4 removed support for register globals entirely (my memory could be off - it's in the changelogs for the curious).

I leave the decision making to the maintainers and contribs who do the actual work. Hell, I personally don't even use PHP that much these days, having gotten a job where I focus on writing Cucumber tests in JavaScript that run on node.js. I keep up with PHP and this list though, because one never knows what the next job will entail. I just dropped out of lurk mode to underscore, along with others up thread, the massive ramifications of what is being proposed. As someone who wrote stupid code I can see this breaking: tread lightly. And hell, I don't even know how much of that code is still in use, since I've changed employers many times since it was written. This situation is not unique and can create huge headaches for companies running projects on legacy code bases.