Re: [Twisted-Python] Streaming HTTP

Glyph Lefkowitz Wed, 02 Dec 2015 14:42:07 -0800

> On Nov 19, 2015, at 3:50 AM, Cory Benfield <c...@lukasa.co.uk> wrote:
> 
> 
>> On 18 Nov 2015, at 12:18, Glyph Lefkowitz <gl...@twistedmatrix.com> wrote:
>> 
> 
> Sorry about the delay in responding to this, but I wanted to make sure I knew 
> at least a bit about what I was talking about before I responded!


Clearly this is a challenging topic that requires lots of thought on the part 
of each interlocutor, and may require long rounds of consideration before each 
reply.  No need to apologize.

>>> What do people think of this approach?
>> 
>> So I think you're roughly on the right track but there are probably some 
>> Twisted-level gaps to fill in.
>> 
>> I've already gestured in the direction of Tubes (as have others) and it's 
>> something to think about.  But before we get to that, let's talk about a 
>> much more basic deficiency in the API: although there's an "IRequest", and 
>> an "IResource", there's no such thing as an "IResponse".  Instead, 
>> "IRequest" stands in for both the request and the response, because you 
>> write directly to a request (implicitly filling out its response as you do 
>> so).
> 
> So, I think in general this is interesting. One of the big difficulties I’m 
> having right now is that I’m trying to combine this “streaming HTTP” work 
> with the implementation of HTTP/2, which means that I need to keep the HTTP/2 
> work in mind whenever I talk about this *and* update the HTTP/2 design in 
> response to decisions we make here. This means I’ve got quite a lot of balls 
> in the air right now, and I am confident I’ll drop quite a few. One thing I’m 
> deliberately not doing here is considering Tubes, in part because I’m 
> extremely concerned about backward compatibility, and want the HTTP/2 work to 
> function in the same environment.
> 
> Unfortunately, this means this conversation is blending into the HTTP/2 one, 
> so I’m going to hijack this thread and bring in some concrete discussion of 
> what I’m working on with the HTTP/2 stuff.

Hijack away.  I think we should be primarily concerned with getting HTTP/2 
integrated for the moment.  The reason this raises so many concerns related to 
the streaming stuff is that the internal implementation of HTTP/2 ought to be 
more amenable to pulling apart to fit into an actually good interface to the 
HTTP protocol.

I think that twisted._threads points in a promising direction for this sort of 
work: let's make the old, crappy HTTP APIs work as-is, but with a new, private 
implementation that is better-factored but not fully documented.  We have the 
old interface as a proof-of-concept, so the new stuff needs to at least be good 
enough to be an internal implementation detail for that; we don't have to 
commit to a new public API to land it, and hopefully with some minor edits we 
can just make it public as the "good" interface (and then backport HTTP/1.1 
over it, since we will probably be dealing with legacy HTTP/1.1 clients and 
servers until we're all dead).

> I was having a conversation about the HTTP/2 architecture on #twisted-dev 
> yesterday, which has led towards my current working approach for HTTP/2, 
> which will be to have two underlying objects. We’ll have H2Connection, which 
> implements IProtocol, and H2Stream, which implements ITransport. These two 
> objects will be *extremely* tightly coupled: H2Stream cannot meaningfully run 
> over an arbitrary transport mechanism, and knows a great deal about how 
> H2Connections work.

This seems good, except for the "extreme" tight coupling.  IProtocol and 
ITransport aren't that tightly coupled.  Why do H2Stream and H2Connection need 
to be?

> The reason we need to take this approach is because IConsumer doesn’t allow 
> for us to have correlators, so even if we only had H2Connection it wouldn’t 
> be able to identify a given producer with the stream it holds. By extension, 
> IConsumer cannot consume multiple producers at once. For this reason, we need 
> an interface between H2Connection and H2Stream that is similar to ITransport 
> and IConsumer, but more featureful. Basically, H2Stream is a thin shim 
> between a producer and H2Connection that adds a stream ID to a few function 
> calls.

This is basically a good pattern.  It exposes a hard-to-screw-up interface to 
the next layer up, because you can't forget to include a (mandatory) stream ID.  
I've implemented several multiplexing things that work more or less like this.
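
A minimal sketch of that shim pattern, using only the standard library.  The 
names FakeConnection, StreamTransport, writeDataToStream and endStream are 
hypothetical stand-ins, not the actual H2Connection/H2Stream API; the point is 
only that the layer above never has to pass a stream ID itself.

```python
class FakeConnection:
    """Hypothetical stand-in for H2Connection; records what each stream sends."""

    def __init__(self):
        self.sent = []

    def writeDataToStream(self, streamID, data):
        # Assumed method name: every call must carry a stream ID.
        self.sent.append((streamID, data))

    def endStream(self, streamID):
        # None marks end-of-stream in this sketch.
        self.sent.append((streamID, None))


class StreamTransport:
    """Thin shim that binds one stream ID to a shared connection."""

    def __init__(self, connection, streamID):
        self._connection = connection
        self._streamID = streamID

    def write(self, data):
        # The stream ID is filled in here, so callers cannot forget it.
        self._connection.writeDataToStream(self._streamID, data)

    def loseConnection(self):
        self._connection.endStream(self._streamID)


connection = FakeConnection()
first = StreamTransport(connection, 1)
second = StreamTransport(connection, 3)
first.write(b"hello")
second.write(b"world")
first.loseConnection()
```

Two producers can write through their own shims at once, and the connection 
can still tell their data apart.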

>> Luckily we have an existing interface that might point the way to a better 
>> solution, both for requests and responses: specifically, the client 
>> IResponse: 
>> https://twistedmatrix.com/documents/15.4.0/api/twisted.web.iweb.IResponse.html.
>> 
>> This interface is actually pretty close to what we want for a server 
>> IResponse as well.  Perhaps even identical.  Its static data is all exposed 
>> as attributes which can be relatively simply inspected, and the way it 
>> delivers a streaming response is that it delivers its body to an IProtocol 
>> implementation (via .deliverBody(aProtocol)).  This is not quite as graceful 
>> as having a .bodyFount() method that returns an IFount from the tubes 
>> package; however, the tubes package is still not exactly mature software, so 
>> we may not want to block on depending on it.  Importantly though, this 
>> delivers all the events you need as a primitive for interfacing with such a 
>> high-level interface; it would definitely be better to add this sort of 
>> interface Real Soon Now, because then the tubes package could simply have a 
>> method, responseToFount (which it will need anyway to work with Agent) that 
>> calls deliverBody internally.
>> 
>> This works as a primitive because you have all the hooks you need for 
>> flow-control.  This protocol receives, to its 'makeConnection' method, an 
>> ITransport which can provide the IProducer 
>> https://twistedmatrix.com/documents/15.4.0/api/twisted.internet.interfaces.IProducer.html
>>  and IConsumer 
>> https://twistedmatrix.com/documents/15.4.0/api/twisted.internet.interfaces.IConsumer.html
>>  interfaces for flow-control.  It receives dataReceived to tell it a chunk 
>> has arrived and connectionLost to tell it the stream has terminated.
> 
> Just let me clarify how this is expected to work. Somewhere we have a 
> t.w.s.Site, which builds some kind of HTTP protocol (currently HTTPChannel, 
> in future some object that can transparently swap between HTTPChannel and 
> H2Connection) when connections are received.

Another option could also be having a t.w.s.NewSite (with that name hopefully 
obviously being a straw man) so that Site can simply be deprecated in favor of 
the new thing.  Making Site itself be able to accommodate the new stuff would 
be nice but is definitely not mandatory.

> These two protocols each build an IGoodRequest, which is very similar to 
> IRequest but has a deliverBody method. The consumers of this (whether 
> IResource or some other thing), if they want to consume a stream, register 
> a protocol via deliverBody. At this point, H2Connection (via H2Stream) 
> provides itself as the transport to that protocol, and calls dataReceived 
> when chunks of data are received.

This sounds great.  One thing to maybe watch out for: what if nobody calls 
deliverBody?  This can sometimes be a little annoying in client code, to debug 
why a channel is never closed.  Having a nice error in this case would be a 
cherry on top.
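
One way such an error might look, as a rough sketch.  StreamingRequest, 
UnconsumedBodyError and checkBodyConsumed are hypothetical names, and the 
choice to check when the response finishes is an assumption; nothing here is 
an existing Twisted API.

```python
class UnconsumedBodyError(Exception):
    """Raised when a response finishes but nobody asked for the request body."""


class StreamingRequest:
    """Hypothetical IGoodRequest-shaped object with a deliverBody method."""

    def __init__(self, method, uri):
        self.method = method
        self.uri = uri
        self._bodyProtocol = None

    def deliverBody(self, protocol):
        if self._bodyProtocol is not None:
            raise RuntimeError("deliverBody may only be called once")
        self._bodyProtocol = protocol

    def checkBodyConsumed(self):
        # Assumption: the channel calls this once the response is finished.
        if self._bodyProtocol is None:
            raise UnconsumedBodyError(
                "%r %r: response finished but deliverBody was never called"
                % (self.method, self.uri)
            )


request = StreamingRequest(b"POST", b"/upload")
try:
    request.checkBodyConsumed()
except UnconsumedBodyError as e:
    message = str(e)
```

The message names the request, which is usually the missing piece when 
debugging a channel that never closes.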

> When the object receiving the request is ready to send a response, it 
> calls…something (sendResponse?) and provides an object implementing a server 
> IResponse. The code in the H2Stream/H2Connection sends the headers, then 
> calls deliverBody on the IResponse, passing H2Connection (again via H2Stream) 
> as the protocol that gets called. In this world, H2Stream actually would need 
> to implement IProtocol as well as ITransport.

A minor bit of critique here: the Single Responsibility Principle 
<https://en.wikipedia.org/wiki/Single_responsibility_principle> dictates that 
we ought not to have H2Stream literally implement both IProtocol and 
ITransport; rather, we should have an _H2StreamProtocol and an 
_H2StreamTransport, since the thing talking to the IProtocol implementation 
really ought to be wholly distinct from the thing talking to the ITransport 
implementation, and this kind of duality makes it very easy for users - 
especially programmers new to Twisted - to get confused.  As Nathaniel Manista 
and Augie Fackler put it in The Talk 
<https://www.youtube.com/watch?v=3MNVP9-hglc>, we want to express ourselves 
"structurally": if you only want application code to talk to the transport 
implementation and it's an error to talk to the protocol implementation, pass 
only the transport implementation.
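
A rough sketch of that split, assuming a small shared state object 
(_StreamState and makeStreamPair are hypothetical).  The method names follow 
the usual Twisted conventions, but the classes are plain Python rather than 
real interface implementations.

```python
class _StreamState:
    """Hypothetical per-stream state shared by the two facades."""

    def __init__(self, streamID):
        self.streamID = streamID
        self.paused = False
        self.outbound = []


class _H2StreamTransport:
    """Handed to the request-body protocol: flow control only."""

    def __init__(self, state):
        self._state = state

    def pauseProducing(self):
        self._state.paused = True

    def resumeProducing(self):
        self._state.paused = False


class _H2StreamProtocol:
    """Handed to the response's deliverBody: receives body events only."""

    def __init__(self, state):
        self._state = state

    def dataReceived(self, data):
        self._state.outbound.append((self._state.streamID, data))

    def connectionLost(self, reason=None):
        # None marks end-of-stream in this sketch.
        self._state.outbound.append((self._state.streamID, None))


def makeStreamPair(streamID):
    state = _StreamState(streamID)
    return state, _H2StreamTransport(state), _H2StreamProtocol(state)


state, transport, protocol = makeStreamPair(5)
transport.pauseProducing()
protocol.dataReceived(b"body")
protocol.connectionLost()
```

Code that is handed only the transport has no dataReceived to call by 
mistake, and vice versa.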

> Is my understanding of that correct? If so, I think this design can work: 
> essentially, H2Stream becomes the weird intermediary layer that appears as 
> both a transport and a protocol to the request/response layer. Underneath the 
> covers it mostly delegates to H2Connection, which implements a slightly 
> weirdo version of IConsumer (and in fact IProducer) that can only be consumed 
> by H2Stream.

I don't quite get why it needs to be slightly weirdo (hopefully IPushProducer 
is sufficient?) but yes, this all sounds right to me.

> 
>> Unfortunately the client IRequest 
>> https://twistedmatrix.com/documents/15.4.0/api/twisted.web.iweb.IClientRequest.html
>>  isn't quite as useful (although its relative minimalism should be an 
>> inspiration to anyone designing a next-generation IRequest more than the 
>> current IRequest's sprawling kitchen-sink aesthetic).  However, 
>> IResponse.deliverBody could be applied to IGoodRequest as well.  If we have 
>> a very similar-to-IResponse shaped IRequest object, say with 'method', 'uri' 
>> and 'headers', and then a 'deliverBody' that delivers the request body in 
>> much the same way, we could get a gracefully structured streaming request 
>> which works with a lot of existing code within Twisted.
>> 
>> Then the question is: what to do with IResource?
>> 
>> Right now the flow of processing a request is, roughly:
>> 
>> -> wait for full request to arrive
>> -> have HTTPChannel fill out IRequest object
>> -> look at request.site.resource for the root
>> -> call getChildWithDefault repeatedly, mutating "cursor" state on the 
>>    IRequest as you move (specifically: "prepath" and "postpath" attributes)
>> -> eventually reach the leaf Resource, or one with 'isLeaf' set on it, and 
>>    delegate producing the response to that resource
>> -> call resource.render(request)
>> -> examine the return value; if it's bytes, deliver them and close the 
>>    connection; if it's NOT_DONE_YET, just leave the connection open.
>> 
>> Instead, I think a good flow would be:
> 
> [snip long discussion of how to write locateChild]
> 
> Agreed that these proposed approaches would work well. I have no concrete 
> feedback on them, they seem good to me.
> 
>> -> finally, call .responseForRequest(request) -> IResponse on the final 
>> Resource and deliver the IResponse to the network.
>> 
>> The way compatibility could be achieved here is to write a wrapper that 
>> would implement .responseForRequest to first collect the entire body, then 
>> synthesize a gross old-style-IRequest-like object out of the combination of 
>> that body and the other information about the resource, then call 
>> .getChildWithDefault on it a few times, then call the old-style .render_GET, 
>> et al.  The IResponse returned from this compatibility .responseForRequest 
>> would wrap up calls like request.write and turn them into write() calls.
> 
> This seems super-gross but vaguely do-able, and we’ll need to write it in 
> order to get the new H2Connection/H2Stream objects working with the old 
> paradigm anyway.

"super-gross but vaguely do-able" is what we're shooting for in the 
compatibility layer :).
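
To make the shape of that wrapper concrete, here is a small sketch of the 
"collect the entire body, then call the old-style render" step.  
BodyCollector, LegacyRequest, renderLegacy and EchoResource are hypothetical; 
only the makeConnection/dataReceived/connectionLost event names and the 
file-like "content" attribute mirror existing Twisted conventions.

```python
import io


class BodyCollector:
    """Protocol-shaped object that buffers a streamed body until it ends."""

    def __init__(self, onComplete):
        self._buffer = io.BytesIO()
        self._onComplete = onComplete

    def makeConnection(self, transport):
        self.transport = transport

    def dataReceived(self, data):
        self._buffer.write(data)

    def connectionLost(self, reason=None):
        # The stream has ended; hand the whole body to the legacy path.
        self._onComplete(self._buffer.getvalue())


class LegacyRequest:
    """Hypothetical stand-in for the synthesized old-style request."""

    def __init__(self, method, uri, body):
        self.method = method
        self.uri = uri
        self.content = io.BytesIO(body)
        self.written = []

    def write(self, data):
        self.written.append(data)


def renderLegacy(resource, method, uri, body):
    request = LegacyRequest(method, uri, body)
    result = resource.render(request)
    if isinstance(result, bytes):
        request.write(result)
    return request


class EchoResource:
    def render(self, request):
        return b"got " + request.content.read()


results = []
collector = BodyCollector(
    lambda body: results.append(renderLegacy(EchoResource(), b"POST", b"/", body))
)
collector.dataReceived(b"abc")
collector.dataReceived(b"def")
collector.connectionLost()
```

This ignores NOT_DONE_YET and child traversal entirely; it only shows where 
the buffering sits relative to the old render call.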

> All of this approach sounds reasonable modulo some careful thinking about how 
> exactly we tie this in with the old paradigm. I’m particularly concerned 
> about HTTPChannel, which I suspect many applications may know a great deal 
> about. Changing its interface is likely to be slightly tricky, but we’ll see 
> how it goes.

It might be useful to think about a parent interface, IHTTPChannel with all the 
least-common-denominator stuff on it, and sub-interfaces IHTTP1_1Channel and 
IHTTP2_0Channel which each derive from that and provide additional 
version-specific stuff.  I don't have enough protocol-specific knowledge to 
hand in short-term memory to comment on what that functionality might be though.

-glyph

_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
