Not to disregard the other answers to your questions, but I believe one
aspect may have been neglected here.
Bill Moseley wrote:
> For requests that are chunked (Transfer-Encoding: chunked and no
> Content-Length header) calling $r->read returns *unchunked* data from the
> socket.
> That's indeed handy. Is that mod_perl doing that un-chunking or is it
> Apache?
> But, it leads to some questions.
> First, if $r->read reads unchunked data then why is there a
> Transfer-Encoding header saying that the content is chunked? Shouldn't
> that header be removed? How does one know if the content is chunked or
> not, otherwise?
The real question is: does one need to know?
The transfer-coding is something that even an intermediate HTTP proxy is
allowed to change, for reasons to do with transporting the request along a
particular section of the network path.
It should be entirely transparent to the application receiving the data.
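For what it's worth, the header the client sent remains visible to the
application even though the body it reads is already decoded; the point is
simply that nothing should need to branch on it. A tiny illustration
(mod_perl 2 API, inside a handler; purely to show the header is still there):

  # The Transfer-Encoding request header reflects what the client sent
  # on this hop; the body returned by $r->read is already de-chunked
  # regardless of its value.
  my $te = $r->headers_in->get('Transfer-Encoding');   # may be "chunked"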
> Second, if there's no Content-Length header then how does one know how much
> data to read using $r->read?
> One answer is until $r->read returns zero bytes, of course.
Indeed. That means that the end of *this* request body has been encountered.
> But, is that guaranteed to always be the case, even for, say, pipelined
> requests?
It should be, because $r concerns the present request being processed.
If there is another request pipelined onto that same connection, it is a separate request
and a different $r.
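To make that concrete, here is a minimal sketch of a mod_perl 2 handler that
slurps the body exactly this way, without ever looking at Transfer-Encoding
or Content-Length. The package name is made up, and accumulating the whole
body in memory is just for illustration:

  package My::BodyReader;    # hypothetical name
  use strict;
  use warnings;
  use Apache2::RequestRec ();
  use Apache2::RequestIO  ();
  use Apache2::Const -compile => qw(OK);

  sub handler {
      my $r = shift;
      my $body = '';
      # $r->read returns the number of bytes actually read; 0 means the
      # end of *this* request's body, chunked or not.
      while (my $len = $r->read(my $buf, 64 * 1024)) {
          $body .= $buf;
      }
      $r->content_type('text/plain');
      $r->print('read ' . length($body) . " bytes\n");
      return Apache2::Const::OK;
  }
  1;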
> My guess is yes because whatever is de-chunking the request knows to stop
> after reading the last chunk, trailer and empty line. Can
> anyone elaborate on how Apache/mod_perl is doing this?
I can't really, but it should be done by something at some fairly low level.
It should be the *first* thing which happens to the request body, before any
request-level body access is allowed.
(Similarly, at the response level, "chunking" a response body should be the
last thing that happens before the response is put out on the wire.)
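This is not Apache's actual code, but a small sketch of a de-chunker shows
why whatever does this always knows where to stop: the size-0 "last chunk"
is the sentinel, followed by the optional trailer and a final empty line:

  use strict;
  use warnings;

  # Illustrative only: decode a complete chunked body held in a scalar.
  sub dechunk {
      my ($wire) = @_;
      my $body = '';
      # each chunk is: <hex size>[;extensions]CRLF <data> CRLF
      while ($wire =~ s/\A([0-9a-fA-F]+)[^\r\n]*\r\n//) {
          my $size = hex $1;
          last if $size == 0;            # the "last chunk" ends the body
          $body .= substr($wire, 0, $size, '');
          $wire =~ s/\A\r\n//;           # CRLF terminating the chunk data
      }
      # what remains is the optional trailer plus the final CRLF; a real
      # input filter would merge trailer fields into the request headers.
      return $body;
  }

  print dechunk("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"), "\n";  # "Wikipedia"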
> Perhaps I'm approaching this incorrectly, but this is all a bit untidy.
> I'm using Catalyst and Catalyst needs a Content-Length.
I would posit, then, that Catalyst is wrong (or not compliant with HTTP/1.1
in that respect).
> So, I have a Plack Middleware component that creates a temporary file,
> writing the buffer from $r->read( my $buffer, 64 * 1024 ) to it until that
> returns zero bytes. I pass this file handle on to Catalyst.
So what you wrote then is a patch to Catalyst.
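For the record, a minimal sketch of the kind of middleware you describe
might look like this (the package name is invented, error handling omitted):

  package Plack::Middleware::BufferedBody;   # hypothetical name
  use strict;
  use warnings;
  use parent 'Plack::Middleware';
  use File::Temp ();

  sub call {
      my ($self, $env) = @_;
      my $in  = $env->{'psgi.input'};
      my $tmp = File::Temp->new;       # unlinked when it goes out of scope
      my $len = 0;
      # spool the (already de-chunked) body to disk until EOF
      while (my $n = $in->read(my $buf, 64 * 1024)) {
          print {$tmp} $buf;
          $len += $n;
      }
      seek $tmp, 0, 0;
      $env->{'psgi.input'}   = $tmp;   # Catalyst reads from the file...
      $env->{CONTENT_LENGTH} = $len;   # ...and now sees a Content-Length
      return $self->app->($env);
  }
  1;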
> Then, for some content-types, Catalyst (via HTTP::Body) writes the body to
> *another* temp file. I don't know how Apache/mod_perl does its de-chunking,
> but I can call $r->read with a huge buffer length and Apache returns that.
> So, maybe Apache is buffering to disk, too.
> In other words, for each tiny chunked JSON POST or PUT I'm creating two (or
> three?) temp files, which doesn't seem ideal.
I realise that my comments above don't really help you in your specific
predicament, but I just felt it was good to put things back in their place:
in particular, at the $r (request) level, you should not have to know whether
the request came in chunked or not.
And if a client sends a request with a chunked body, you are not necessarily
getting it that way on the server on which the application runs. And
vice-versa.