[Twisted-Python] Streaming HTTP

Cory Benfield Fri, 13 Nov 2015 04:39:02 -0800

Folks,

# Problem Statement


Thanks for your feedback on my HTTP/2 questions. I’ve started work implementing 
a spike of an HTTP/2 protocol for twisted.web. I’m aiming to have something that 
works in at least some cases by the end of the day.

As part of my dive into twisted.web, I noticed something that surprised me: it 
seems to have no support for ‘streaming’ request bodies. By this I mean that 
the Request.requestReceived() method is not actually called until the complete 
request body has been received. This is a somewhat unexpected limitation for 
Twisted: why should I have to wait until the entire body has been uploaded to 
start doing things with it?

This problem is thrown into sharp relief with HTTP/2, which essentially always 
chunks the body, even if a content-length is provided. This means that it is 
now very easy to receive data in delimited chunks, which an implementation may 
want to have semantic meaning. However, the request is unable to access this 
data in this way. It also makes it impossible to use an HTTP/2 request/response 
pair as a long-running communication channel, as we cannot safely call 
requestReceived until the response is terminated (which also terminates the 
HTTP/2 stream).

Adi pointed me at a related issue, #6928[0], which itself points at what 
appears to be an issue tracking exactly this request: #288[1], which is 
12 years old(!). This has clearly been a pain point for quite some time.

Issue #6928 has glyph suggesting that we come to the mailing list to discuss 
this, but the last time it was raised no responses were received[2]. I believe 
that with HTTP/2 on the horizon, this issue is more acute than it was before, 
and needs solving if Twisted is going to remain relevant for the web. Solving 
it should also allow people to build more performant web applications, as 
they would be able to control how the data queues up in their apps.

This does not immediately block my HTTP/2 work, so we can take some time and 
get this right.

# Proposed Solution

To help us move forward, I’m providing a proposal for how I’d solve this 
problem. This is not necessarily going to be the final approach, but is instead 
a straw-man we can use to form the basis of a discussion about what the correct 
fix should be.

My proposal is to deprecate the current Request/Resource model. It currently 
functions and should continue to function, but as of this point we should 
consider it a bad way to do things, and we should push people to move to a 
fully asynchronous model.

We should then move to an API that is much more like the one used by Go: 
specifically, that by default all requests/responses are streamed. Request 
objects (and, logically, any other object that handles requests/responses, such 
as Resource) should be extended to have a chunkReceived method that can be 
overridden by users. If a user chooses not to override that method, the default 
implementation would continue to do what is done now (save to a buffer). Once 
the request/response is complete (marked by receipt of a zero-length chunk, or 
a frame with END_STREAM set, or when the remaining content-length is 0), 
request/responseComplete would be called. Users that did not override 
chunkReceived can then safely access the content buffer; other users can do 
whatever they see fit. We’d also update requestReceived to ensure that it’s 
called when all the *headers* are received, rather than waiting for the body.
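
To make the shape of this concrete, here is a minimal sketch in plain Python. 
It is not the existing twisted.web API: the class names, the deliver() helper 
and the requestComplete name are all hypothetical, chosen only to mirror the 
methods described above.

```python
import io


class StreamingRequest:
    """Hypothetical request object; NOT the current twisted.web Request."""

    def __init__(self):
        self.headers = {}
        self.content = io.BytesIO()
        self.complete = False

    def requestReceived(self, headers):
        # Proposed behaviour: called once the headers are complete,
        # before any of the body has arrived.
        self.headers = dict(headers)

    def chunkReceived(self, data):
        # Default behaviour: save to a buffer, as happens today.
        self.content.write(data)

    def requestComplete(self):
        # Called after the last chunk (zero-length chunk, END_STREAM,
        # or remaining content-length of 0).
        self.complete = True
        self.content.seek(0)


class LineCounter(StreamingRequest):
    """A user that overrides chunkReceived and never buffers the body."""

    def __init__(self):
        super().__init__()
        self.lines = 0

    def chunkReceived(self, data):
        self.lines += data.count(b"\n")


def deliver(request, headers, chunks):
    """Stand-in for the protocol: drives the callbacks in order."""
    request.requestReceived(headers)
    for chunk in chunks:
        request.chunkReceived(chunk)
    request.requestComplete()
    return request
```

A user who overrides nothing still gets a complete buffer in requestComplete, 
while LineCounter processes each chunk as it arrives and holds no body in 
memory.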

A similar approach should be taken with sending data: we should assume that 
users want to chunk it if they do not provide a content-length. An extreme 
position to take (and I do) is that this should be sufficiently easy that most 
users actually *accidentally* end up chunking their data: that is, we do not 
provide special helpers to set content-length, instead just checking whether 
that’s a header users actually send, and if they don’t we chunk the data.
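
As a rough illustration of that rule, the sketch below picks the framing 
purely from whether a content-length header was set. frame_body is a 
hypothetical helper, not a Twisted function, and it shows HTTP/1.1 chunked 
transfer coding only; under HTTP/2 the same decision would map onto DATA 
frames instead. A real implementation would also need to add the 
Transfer-Encoding: chunked header itself.

```python
def frame_body(headers, chunks):
    """Yield wire bytes for a body, chunking unless content-length is set.

    `headers` is a dict of header names to values; `chunks` is an
    iterable of bytes objects written by the application.
    """
    names = {name.lower() for name in headers}
    if "content-length" in names:
        # The user told us the length, so the body is sent as-is.
        for chunk in chunks:
            yield chunk
        return

    for chunk in chunks:
        if not chunk:
            # A zero-length chunk would terminate the body early.
            continue
        yield b"%x\r\n%s\r\n" % (len(chunk), chunk)
    # Terminating zero-length chunk (no trailers).
    yield b"0\r\n\r\n"
```

With no content-length, writing b"hello" then b"world!" produces 
`5\r\nhello\r\n6\r\nworld!\r\n0\r\n\r\n` on the wire; with a content-length 
header present the same writes pass through unchanged.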

This logic would make it much easier to work with HTTP/2 *and* with WebSockets, 
requiring substantially less special-case code to handle the WebSocket upgrade 
(when the headers are complete, we can spot the upgrade easily).
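
For example, once requestReceived fires at end-of-headers, spotting the 
upgrade could be as small as the check below. is_websocket_upgrade is a 
hypothetical helper written for this email, and it only inspects the 
Connection and Upgrade headers; a real handshake has further requirements 
(such as Sec-WebSocket-Key) that are ignored here.

```python
def is_websocket_upgrade(headers):
    """Return True if the headers ask to upgrade to WebSocket.

    `headers` is a dict of header names to string values.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    # Connection is a comma-separated token list, e.g. "keep-alive, Upgrade".
    connection_tokens = {
        token.strip().lower()
        for token in normalized.get("connection", "").split(",")
    }
    upgrade = normalized.get("upgrade", "").strip().lower()
    return "upgrade" in connection_tokens and upgrade == "websocket"
```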

What do people think of this approach?

Cory


[0]: https://twistedmatrix.com/trac/ticket/6928
[1]: https://twistedmatrix.com/trac/ticket/288
[2]: https://twistedmatrix.com/pipermail/twisted-python/2014-February/028069.html
