On Mon, Aug 20, 2012 at 12:19 PM, Anssi Kääriäinen
<[email protected]>wrote:

>
>
> On 20 elo, 18:39, Tai Lee <[email protected]> wrote:
> > I'd like to re-visit the discussion surrounding #7581 [1], a ticket about
> > streaming responses that is getting quite long in the tooth now, which
> > Jacob finally "accepted" 11 months ago (after a long time as DDN) and
> said
> > that it is clear we have to do *something*, but *what* remains to be
> seen.
> >
> > I'd like to try provide a little refresher and summarise the options that
> > have been suggested, and ask any core devs to please weigh in with their
> > preference so that I can work up a new patch that will be more likely to
> > gain acceptance.
> >
> > THE PROBLEM:
> >
> > 1. There are bugs and surprising behaviour that arise when creating an
> > HttpResponse with a generator as its content, as a result of "quantum
> > state" of `HttpResponse.content` (measuring it changes it).
> >
> >
> >
> > >>> from django.http import HttpResponse
> > >>> r = HttpResponse(iter(['a', 'b']))
> > >>> r.content
> > 'ab'
> > >>> r.content
> > ''
> > >>> r2 = HttpResponse(iter(['a', 'b']))
> > >>> print r2.content
> > ab
> > >>> print r2.content
> > >>> r3 = HttpResponse(iter(['a', 'b']))
> > >>> r3.content == r3.content
> >
> > False
> >
> > 2. Some middleware prematurely consume generator content by accessing
> > `HttpResponse.content`, which can use a lot of memory and cause browser
> > timeouts when attempting to stream large amounts of data or
> > slow-to-generate data.
> >
> > There have been several tickets [2] [3] [4] and django-developers
> > discussions [5] [6] [7] [8] about these issues.
> >
> > SOME USE CASES FOR STREAMING RESPONSES:
> >
> > A. Generating and exporting CSV data directly from the database.
> >
> > B. Restricting file access to authenticated users for files that may be
> > hosted on external servers.
> >
> > C. Drip-feeding chunks of content to prevent timeout when requesting a
> page
> > that takes a long time to generate.
> >
> > OPTION 1:
> >
> > Remove support for "streaming" responses. If an iterator is passed in as
> > content to `HttpResponse`, consume it in `HttpResponse.__init__()` to
> > eliminate buggy behaviour. Middleware won't have to worry about what type
> > of content a response has.
> >
> > Now that Jacob has accepted #7581 and said that it is clear we need to do
> > *something*, I hope we can rule out this option.
> >
> > OPTION 2:
> >
> > Make `HttpResponse.__init__()` consume any iterator content, and add an
> > `HttpResponseStreaming` class or an `HttpResponse(streaming=False)`
> > argument. Allow middleware to check via `hasattr()` or `isinstance()`
> > whether or not the response has generator content, and conditionally skip
> > code that is incompatible with streaming responses.
> >
> > Some middleware will have to be updated for compatibility with streaming
> > responses, and any 3rd party middleware that prematurely consumes
> generator
> > content will continue to work, only without the bugs (and potentially
> with
> > increased memory usage and browser timeouts).
> >
> > OPTION 3:
> >
> > Build a capabilities API for `HttpResponse` objects, and have middleware
> > inspect responses to determine "can I read content?", "can I replace
> > content?", "can I change etag?", etc. This will likely become a bigger
> and
> > more complicated design decision as we work out what capabilities we want
> > to support. Some have argued that it should be sufficient to know if we
> > have generator content or not, for all the cases that people have
> reported
> > so far.
> >
> > OPTION 4:
> >
> > Provide a way for developers to specify on an `HttpResponse` object or
> > subclass that specific middleware should be skipped for that response.
> This
> > would be problematic because 3rd party views won't know what other
> > middleware is installed in a project in order to name them for exclusion.
> >
> > OPTION 5:
> >
> > Add Yet Another Setting that would allow developers to define
> > `CONDITIONAL_MIDDLEWARE_CLASSES` at a project level. At the project
> level,
> > developers would know which middleware classes they are using, and when
> > they should be executed or skipped. This would give very fine grained
> > control at a project level to match middleware conditionally with
> > `HttpResponse` subclasses, without requiring any changes to existing or
> 3rd
> > party middleware. This could look something like this:
> >
> > MIDDLEWARE_CLASSES = (
> >     'django.middleware.common',
> > )
> >
> > CONDITIONAL_MIDDLEWARE_CLASSES = {
> >     'exclude': {
> >         'django.http.HttpResponseStreaming': ['django.middleware.common',
> > 'otherapp.othermiddleware', ...],
> >     },
> >     'include': {
> >         'myapp.MyHttpResponse': ['myapp.mymiddleware', ...],
> >     },
> >
> > }
>
> My initial feeling is that anything accessing "large content" in
> Python is broken and we should not try to make sure these things
> continue to work. The problem is how we define if the content is too
> large to access. Assuming any generator passed to the __init__ is
> large maybe too drastic.
>
> We should either remove the content attribute altogether from
> generator based responses, or at least raise an error if the content
> is accessed a second time. The current behaviour is definitely broken,
> and we should not silently return corrupt data, instead we should
> complain loudly.
>
> To me option 5 feels like we are trying to work around the symptoms
> instead of solving the underlying problem.
>
>  - Anssi
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/django-developers?hl=en.
>
>
My inclination is that we should kill .content entirely, middleware that
wants to rewrite the response will have two choices:

1) explicitly evaluate the entire response, and return a new HttpResponse
object
2) return s anew HttpResponse object that has a generator that iteratively
evaluates the previous one and rewrite it incrementally.

Alex

-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.

Reply via email to