Hi internals

A while ago I encountered a limitation of how RFC1867 requests are
handled in PHP. PHP populates the $_POST and $_FILES superglobals when
the Content-Type is multipart/form-data or
application/x-www-form-urlencoded, but only when the method is POST.
For application/x-www-form-urlencoded PUT requests this is not a
problem because the format is simple, usually limited in size, and PHP
offers a function to parse it, namely parse_str. For RFC1867 it's a
different story.
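
For reference, the urlencoded case can be handled in userland roughly
as follows. This is only a minimal sketch: the body is a literal string
so the snippet runs standalone, whereas a real PUT handler would read
it from php://input.

```php
<?php
// Sketch: parsing an application/x-www-form-urlencoded body by hand.
// In a real request this would be file_get_contents('php://input');
// a literal is used here so the example is self-contained.
$rawBody = 'name=Ilija&tags%5B%5D=php&tags%5B%5D=internals';

// parse_str() decodes the pairs into an array, using the same
// bracket syntax for nested values that PHP applies to $_POST.
$data = [];
parse_str($rawBody, $data);

var_dump($data);
```

Nothing comparable exists for multipart/form-data bodies.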

Code handling such a request needs to use streams because RFC1867 is
often used with files, the format is much more complicated, and
uploaded files should be cleaned up at the end of the request if they
are unused. Handling this manually is non-trivial. This was reported
many years ago, and has evidently caused a bit of frustration.
https://bugs.php.net/bug.php?id=55815

This is not limited to PUT either; multipart/form-data bodies are
valid with other request methods. Here's the approach I believe is best.

Introduce a new function (currently named populate_post_data()) to
read the input stream and populate the $_POST and $_FILES
superglobals. The function works for any non-POST request. It assumes
that none of the input stream has been consumed, and that the
Content-Type is set accordingly. A nice side-effect of this approach
is that it may be used with the enable_post_data_reading ini setting
to decide whether to parse the RFC1867 bodies dynamically. For
example, a specific endpoint may accept bigger requests. The function
could be made more generic by (1) returning the data/files arrays
instead of populating the superglobals and (2) accepting an input
stream as an argument. I don't know whether there's a use-case for
this and thus whether it is worthwhile, as it would require bigger
changes in the RFC1867 handling.
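
To illustrate the intended usage, here is a rough sketch. Note that
populate_post_data() exists only in the proof-of-concept, not in any
released PHP version, and calling it without arguments is my
assumption here. The helper shouldPopulate() is purely hypothetical
and only serves the example.

```php
<?php
// Hypothetical helper: decide whether the request qualifies for
// manual population (non-POST method with a multipart body).
function shouldPopulate(string $method, string $contentType): bool
{
    return strtoupper($method) !== 'POST'
        && stripos($contentType, 'multipart/form-data') === 0;
}

$method = $_SERVER['REQUEST_METHOD'] ?? 'GET';
$contentType = $_SERVER['CONTENT_TYPE'] ?? '';

// The function_exists() guard keeps the sketch runnable on PHP
// builds that do not include the proof-of-concept patch.
if (shouldPopulate($method, $contentType)
    && function_exists('populate_post_data')
) {
    // Must run before anything else consumes the input stream.
    populate_post_data();
}

// From here on, $_POST and $_FILES would be used as for a POST request.
```

With enable_post_data_reading disabled, the same call could presumably
be made for POST requests on selected endpoints only, but the sketch
above does not cover that case.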

Here's the proof-of-concept implementation:
https://github.com/php/php-src/pull/11472

For completeness, here are other options I considered.

1. Create a new $_PUT superglobal that is always populated. Two
issues: The obvious one is that this is limited to PUT requests. While
we could also introduce $_PATCH, this seems like a poor solution.
While discouraged, other methods can also contain bodies. Another
issue is that the code for processing RFC1867 consumes the input
stream. This constitutes a BC break, because existing code that reads
the body of such requests from php://input would no longer see it.
Buffering the input is not feasible for the large requests that would
be expected here.
2. The same as option 1, but populate the existing $_POST global. This
comes with the same BC break.
3. The same as options 1 or 2 with an additional ini setting to opt
into the behavior. The issue with this approach is that both the old
and new behavior might be desired in different parts of the same
application. The ini option can't be changed at runtime because the
superglobals are populated before any user code is executed.

Let me know what your thoughts are. If there is consensus in the
feedback I'll update the implementation accordingly and post an update
to the list. If there is no consensus, I will create an RFC.

Ilija

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
