And hit the dev@ list this time...
On Thu, Jun 16, 2011 at 16:19, Greg Stein <gst...@gmail.com> wrote:
> On Thu, Jun 16, 2011 at 02:38, Lieven Govaerts <svn...@mobsol.be> wrote:
>>...
>>> (here for review; more code to follow...)
>>
>> Interesting patch.
>
> Was hoping you'd have time to look at it. Thanks!
>
>>...
>>> +  /* If "pausing" is to be allowed, then this must refer to the connection's
>>> +     bucket allocator. It will be used to hold content in memory, then
>>> +     return it once it has been consumed.
>>> +
>>> +     NOTE: if this field is NULL, then pausing is NOT allowed.  */
>>> +  serf_bucket_alloc_t *pause_alloc;
>>> +
>>> +  /* If pausing is allowed, then callbacks can set this value to pause
>>> +     the XML parsing. Note that additional elements may be parsed once
>>> +     this value is set (as the current buffer is consumed; no further
>>> +     buffers will be parsed until PAUSED is cleared).
>>> +
>>> +     At some point, the callbacks should clear this value. The main
>>> +     serf_context_run() loop will notice the change and resume delivery
>>> +     of content to the XML parser.  */
>>> +  svn_boolean_t paused;
>>
>> If delivery is resumed from, say, disk, it might be 'a while' before all
>> cached data is consumed and svn is reading from the socket again. To
>> avoid dropped connections, we might design it so that even while the
>> XML parser is not paused, svn still continues to read from the socket
>> and adds the data to the memory/disk cache.
>
> Ah. Good point.
>
> I suspect that we'll generally be okay because the parser will be paused
> once N requests get queued. And reaching N could be below the server
> timeout.
>
> I was certainly planning to consume from memory and disk, and append
> if more stuff arrived.
>
> Boy, it would be nice to have multiple threads :-)
>
>>> +  /* While the XML parser is paused, content arriving from the server
>>> +     must be saved locally. We cannot stop reading, or the server may
>>> +     decide to drop the connection. The content will be stored in memory
>>> +     up to a certain limit, and will then be spilled over to disk.
>>> +
>> Yes, this is the idea. I have been thinking of putting this
>> functionality in a bucket, e.g. a cache_bucket. Besides the callback
>> it fits the bucket model quite well.
>>
>> The benefits of such a bucket are improved locality of the code, but
>> also that you can put this cache_bucket in the chain before the
>> deflate_bucket, so that serf can store the incoming data compressed
>> in memory, possibly not even requiring disk access.
>
> I saw this in your patch, but didn't think that it made sense.
>
> We need to read from the network and put that "somewhere" while the
> parser is paused. But what that means is you have a read/write bucket,
> which I disfavor. To draw a picture:
>
>   read <- C <- N
>
> where C is your Cache bucket, and N is the Network bucket. But when
> the parser is paused, you have:
>
>   C <- write <- read <- N
>
> That just doesn't look like a typical stack of buckets. I think the
> ideal bucket scenario is "build them once, then read-only." A bucket
> can certainly generate "more" data internally when a read request
> occurs. That is just dynamically fulfilling its logical content. But
> altering the content represented over time feels icky.
>
> Thus, I set up the pending_t structure. That structure would be quite
> useful in other areas of our codebase, but (until needed) I made it
> local to ra_serf for now.
>
>>...
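To make the pending_t idea above concrete, here is a simplified, standalone
sketch of the append side: incoming chunks are held in a linked list of
fixed-size buffers while the parser is paused, and spill over to a temporary
file once a memory limit is crossed. It uses plain C (malloc/stdio) rather
than the APR and serf types the real code would use, and all names here
(pending_append, SPILL_LIMIT, and so on) are illustrative, not the actual
ra_serf symbols.

/* Sketch only: hold incoming chunks in a linked list of fixed-size memory
   buffers while the XML parser is paused, spilling to a temp file once a
   memory limit is exceeded. Plain C stand-in for the APR-based code. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PARSE_CHUNK_SIZE 8192                /* size of one memory buffer */
#define SPILL_LIMIT (16 * PARSE_CHUNK_SIZE)  /* when to start spilling    */

typedef struct pending_buffer_t {
  size_t size;                       /* bytes used in DATA                */
  char data[PARSE_CHUNK_SIZE];
  struct pending_buffer_t *next;
} pending_buffer_t;

typedef struct pending_t {
  pending_buffer_t *head;            /* oldest buffer (consumed first)    */
  pending_buffer_t *tail;            /* newest buffer (appended to)       */
  size_t memory_size;                /* total bytes held in memory        */
  FILE *spill;                       /* overflow file, once created       */
} pending_t;

/* Append LEN bytes of DATA to PENDING, spilling to disk past the limit. */
static int
pending_append(pending_t *pending, const char *data, size_t len)
{
  /* Once the memory limit has been crossed, everything goes to disk so
     that ordering is preserved (memory buffers first, then the spill). */
  if (pending->spill != NULL || pending->memory_size + len > SPILL_LIMIT)
    {
      if (pending->spill == NULL)
        {
          pending->spill = tmpfile();
          if (pending->spill == NULL)
            return -1;
        }
      return fwrite(data, 1, len, pending->spill) == len ? 0 : -1;
    }

  while (len > 0)
    {
      pending_buffer_t *buf = pending->tail;

      /* Start a new buffer if there is none, or the last one is full. */
      if (buf == NULL || buf->size == PARSE_CHUNK_SIZE)
        {
          buf = calloc(1, sizeof(*buf));
          if (buf == NULL)
            return -1;
          if (pending->tail == NULL)
            pending->head = buf;
          else
            pending->tail->next = buf;
          pending->tail = buf;
        }

      {
        size_t room = PARSE_CHUNK_SIZE - buf->size;
        size_t copy = len < room ? len : room;

        memcpy(buf->data + buf->size, data, copy);
        buf->size += copy;
        pending->memory_size += copy;
        data += copy;
        len -= copy;
      }
    }

  return 0;
}

The point of the append-only call is that it never waits on the parser, so
the response handler can keep draining the socket no matter what state the
parser is in.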
>>> +/* As chunks of content arrive from the server, and we need to hold them
>>> +   in memory (because the XML parser is paused), they are copied into
>>> +   these buffers. The buffers are arranged into a linked list.  */
>>> +struct pending_buffer_t {
>>> +  apr_size_t size;
>>> +  char data[PARSE_CHUNK_SIZE];
>>> +  struct pending_buffer_t *next;
>>> +};
>>
>> Why not store this in an aggregate bucket? It has to be wrapped in a
>> bucket anyway to return it to the caller. You can still calculate the
>> size of the current buffer outside the aggregate bucket.
>
> I don't need the content in bucket form. My plan is to "read" from
> pending_t and call inject_to_parser(). This will likely be performed
> in the code block that calls serf_context_run(). The response handler
> for the network will append to the pending structures while the parser
> is paused or when pending data exists. When the parser is active, the
> response handler will inject its content (as the code is structured
> today). IOW, the response handler will not read from the pending data.
>
> Does the above sound reasonable?
>
> Cheers,
> -g
>
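Continuing the sketch above (same illustrative names and types), the
consumer side that Greg describes, draining the saved data into the parser
from the code that drives serf_context_run(), might look roughly like this.
The inject_to_parser() stub stands in for ra_serf's function of that name,
and parse_context_t is illustrative rather than the real parser context.

/* Consumer side of the same sketch: when the parser is not paused, drain
   the pending data (memory buffers first, then the spill file) and feed
   it to the parser. In ra_serf this would run in the loop around
   serf_context_run(). */
typedef struct parse_context_t {
  int paused;                /* set by callbacks to pause XML parsing     */
  pending_t *pending;        /* data saved while the parser was paused    */
  long spill_read;           /* bytes of the spill file already consumed  */
} parse_context_t;

/* Stand-in for ra_serf's inject_to_parser(): push LEN bytes into the
   (Expat) XML parser; its callbacks may set CTX->paused. Stubbed here. */
static int
inject_to_parser(parse_context_t *ctx, const char *data, size_t len)
{
  (void)ctx; (void)data; (void)len;   /* real code: XML_Parse(...) etc.  */
  return 0;
}

/* Feed as much saved content to the parser as it will currently accept. */
static int
process_pending(parse_context_t *ctx)
{
  pending_t *pending = ctx->pending;

  /* Memory buffers first: they were written before any spilled data. */
  while (!ctx->paused && pending->head != NULL)
    {
      pending_buffer_t *buf = pending->head;

      if (inject_to_parser(ctx, buf->data, buf->size))
        return -1;

      pending->head = buf->next;
      if (pending->head == NULL)
        pending->tail = NULL;
      pending->memory_size -= buf->size;
      free(buf);
    }

  /* Then whatever was spilled to disk. NOTE: a real implementation must
     also reposition to the end of the file before appending more spilled
     data; that bookkeeping is omitted here. */
  if (pending->spill != NULL)
    {
      char chunk[PARSE_CHUNK_SIZE];

      fseek(pending->spill, ctx->spill_read, SEEK_SET);
      while (!ctx->paused)
        {
          size_t got = fread(chunk, 1, sizeof(chunk), pending->spill);

          if (got == 0)
            break;
          if (inject_to_parser(ctx, chunk, got))
            return -1;
          ctx->spill_read += (long)got;
        }
    }

  return 0;
}

One consequence of this split is the property Greg wants: the response
handler only ever appends to the pending data, and only the driving loop
reads from it, so no read/write bucket is needed anywhere in the chain.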