RE: The state of per-site/per-view middleware caching in Django

Kääriäinen Anssi Fri, 21 Oct 2011 08:05:46 -0700

I do not know nearly enough about caching to participate fully in this 
discussion. But it strikes me that trying to cache a CSRF-protected anonymous 
page is not that smart. If you have an anonymous submittable form, why 
bother with CSRF protection? I mean, what is it protecting against? Making 
complex arrangements in the caching layer for this use case seems like wasted 
effort. Or am I missing something obvious?


The following is from the stupid ideas department: Maybe there could be a 
"reverse cache" template tag, such that you would mark the places where you 
want changing content as non-cacheable. You would need two views for this, one 
which would construct the "base content" and then another which would construct 
the dynamic parts. Something like:

page_cached.html:
... expensive to generate content ...
{% block login_logout non_cacheable %}
{% endblock %}
... expensive to generate content ...

You would generate the base page by a cached render view:

def page_view_cached(request, id):
    if cached(id):
        return cached_content
    else:
        ... expensive queries ...
        return cached_render("page_cached.html", context, ...)

The above view would not be directly usable at all, you would need to use a 
wrapper view which would render the non-cacheable parts:

def page_view(request, id):
    # Below would return quickly from cache most of the time
    cached_portions = page_view_cached(request, id)
    return render_to_response("page.html",
        {"cached": cached_portions, "user": request.user})

where page.html would be:
{% extends cached %}
{% block login_logout %}
{% if user.is_authenticated %}
    Hello, user!
{% else %}
    <a href="login.html">login</a>
{% endif %}
{% endblock %}

That seems to be what is really wanted in this situation. The idea is quite 
simply to extend the block syntax to caching. A whole other issue is how to 
make this easy enough to be actually usable, and fast enough to be actually 
worth it.

 - Anssi

________________________________________
From: [email protected] [[email protected]] 
On Behalf Of Jim Dalton [[email protected]]
Sent: Friday, October 21, 2011 16:02
To: [email protected]
Subject: Re: The state of per-site/per-view middleware caching in Django

On Oct 20, 2011, at 6:02 PM, Carl Meyer wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Jim,
>
> This is a really useful summary of the current state of things, thanks
> for putting it together.
>
> Re the anonymous/authenticated issue, CSRF token, and Google Analytics
> cookies, it all boils down to the same root issue. And Niran is right,
> what we currently do re setting Vary: Cookie is what we have to do in
> order to be correct with respect to HTTP and upstream caches. For
> instance, we can't just remove Vary: Cookie from unauthenticated
> responses, because then upstream caches could serve that unauthenticated
> response to anyone, even if they are actually authenticated.
>
> Currently the Django page caching middleware behaves pretty much just
> like an upstream cache in terms of the Vary header. Apart from the
> CACHE_MIDDLEWARE_ANONYMOUS_ONLY setting, it just looks at the response,
> it doesn't make use of any additional "inside information" about what
> your Django site did to generate that response in order to decide what
> to cache and how to cache it.
>
> This approach is pretty attractive, because it's conceptually simple,
> consistent with upstream HTTP caching, and conservative (quite unlikely
> to serve the wrong cached content).
>
> It might be possible to make it "smarter" in certain cases, and allow it
> to cache more aggressively than an upstream cache can. #9249 is one
> proposal to do this for cookies that aren't used on the server, either
> via explicit setting or (in a recently-added proposal) via tracking
> which cookie values are accessed. If we did that, plus special-cased the
> session cookie if the user is unauthenticated and the session isn't used
> outside of contrib.auth, I think that could possibly solve the
> unauthenticated-users and GA issues.
>
> However, this (especially the latter) would come with the cost of making
> the cache middleware implementation more fragile and coupled to other
> parts of the framework. And it still doesn't help with CSRF, which is a
> much tougher nut to crack, because every response for pages using CSRF
> comes with a Set-Cookie header and probably with a CSRF token embedded in
> the response content; and those both mean that response really can't be
> re-used for anyone else. (Getting rid of the token embedded in the HTML
> means forms couldn't ever POST without JS help, which is not an option
> as the documented default approach). You can mark some form-using views
> that are available to anonymous users as csrf-exempt, which exposes you
> potentially to CSRF-based spam, but isn't a security issue if you aren't
> treating authenticated submissions any differently from
> non-authenticated ones.
>
> Generally, I come down on the side of skepticism that introducing these
> special cases into the cache middleware really buys enough to be worth
> the added complexity (though I could be convinced that #9249 is worth it).

Thanks Carl. This is definitely a good, clarifying response to what I was 
mulling around about.

A few thoughts of my own to add here:

* You and Niran are certainly right about upstream caches. Regardless of what 
we do here, we'll have to vary by cookie in the response header. This makes 
sense for a site that offers authentication: Django needs to check on every 
page view to see if the user is authenticated, so we can't have the upstream 
cache holding on to a page for us.

* Agreed about how the "smartness" comes at the cost of brittleness if the 
implementations are too tightly coupled. That said, I can squint and sort of 
see an implementation that could thread the needle here. It would require 
something like:

- An API in the cache middleware instructing it to ignore certain cookies for 
the purposes of caching (i.e. something along the lines of #9249).

- Some kind of "pre-fetch" hook in the cache middleware. Whether it's a flag in 
the request object, a signal or something else, give other systems the ability 
to look at a request before it hits the FetchFromCacheMiddleware and either 
allow or prevent the response from being pulled from the cache. E.g. if there 
were a flag request.invalidate_cache that defaults to False, the contrib.auth 
app could, in combination with the above, pull the session id from 
consideration in the cache key and do an authentication check on its own, 
invalidating the cache on its own if the user is authenticated. The core idea 
is what you already suggested, I'm more illustrating here that this can 
conceivably be implemented as an API, making it less brittle.

- Some kind of "post-fetch" hook in the cache middleware, combined with a 
retooling of the CSRF middleware. This is getting in the clouds here a bit, but 
a hook on the opposite end of the fetch operation could allow the CSRF app to 
add its token after the response was pulled from the cache. I say we're in the 
clouds here because for something like this to work the CSRF would have to do a 
little two-step dance. Before the UpdateCache step the CSRF would have to insert 
something that looked like a server-side template tag, which gets cached, and 
then after that step the CSRF would have to insert its actual value. On the 
fetch side, the CSRF would have to make use of the post-fetch hook to pull the 
cached page rendered with the server-side template tag thingy and then add the 
correct value on its way out the door. Essentially, we're talking about a poor 
man's two phase rendering system.

This barely qualifies as a thought exercise let alone a proposal, but my main 
underlying suggestion here is that if the cache middleware correctly 
implemented hooks of some kind in the right locations, it might well be 
possible for systems like auth and CSRF to do what they would need to do 
without coupling all these systems together in a giant ball of twine.
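
A rough sketch of how the three pieces above could fit together, in plain Python 
rather than real middleware. None of these names exist in Django: HookedCache, 
the pre_fetch/post_fetch lists, the "<!--csrf-token-->" placeholder and the 
dict-shaped request are all made up for illustration, and the ignored cookie 
names are only an assumption about which cookies the server never reads.

```python
import hashlib

# Assumed: cookies the cache may ignore (analytics cookies, plus the session
# cookie, which the auth hook below checks on its own).
IGNORED_COOKIES = frozenset({"__utma", "__utmz", "sessionid"})

# Hypothetical placeholder cached in place of the real CSRF token.
CSRF_PLACEHOLDER = "<!--csrf-token-->"


def cache_key(path, cookies, ignored=IGNORED_COOKIES):
    """Build a cache key from the path and the cookies that still matter."""
    relevant = sorted((k, v) for k, v in cookies.items() if k not in ignored)
    raw = path + "|" + "&".join("%s=%s" % kv for kv in relevant)
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


class HookedCache:
    """Toy page cache with pre-fetch and post-fetch hooks."""

    def __init__(self):
        self.store = {}
        self.pre_fetch = []   # callables(request) -> False vetoes a cache hit
        self.post_fetch = []  # callables(request, body) -> new body

    def update(self, request, body):
        """Store a response body (still containing any placeholders)."""
        self.store[cache_key(request["path"], request["cookies"])] = body

    def fetch(self, request):
        """Return a cached body, or None on a miss or a veto."""
        if not all(hook(request) for hook in self.pre_fetch):
            return None
        body = self.store.get(cache_key(request["path"], request["cookies"]))
        if body is None:
            return None
        for hook in self.post_fetch:
            body = hook(request, body)
        return body


def skip_authenticated(request):
    """Pre-fetch hook an auth app might register: no cache for logged-in users."""
    return request.get("user") is None


def fill_csrf(request, body):
    """Post-fetch hook a CSRF app might register: swap in the real token."""
    return body.replace(CSRF_PLACEHOLDER, request["csrf_token"])
```

With skip_authenticated appended to pre_fetch and fill_csrf to post_fetch, two 
anonymous visitors with different analytics and session cookies share one cached 
page but each get their own token, and an authenticated request bypasses the 
cache. The cache class itself knows nothing about auth or CSRF, which is the 
decoupling I was after.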


> I do think we should improve the cache middleware documentation so its
> limitations are outlined more clearly upfront, and point people towards
> existing solutions for caching mostly-but-not-entirely-anonymous pages:
> edge-side-includes, two-phase-render, and JS/AJAX fetch.
>
> #15855, on the other hand, is a bug that really does need to be fixed. I
> still don't see a better fix than the one I outlined in the ticket
> description: requiring some middleware to be in MIDDLEWARE_CLASSES for
> the cache_page decorator to work, and not doing the actual caching until
> we hit that middleware. Or alternatively, adding an implicit "cache any
> responses that had cache_page used on them" phase to response
> processing, after all middleware. I think those are both ugly fixes,
> though; maybe someone has a better idea. The last time I know of that
> this was discussed in-depth was in
> http://groups.google.com/group/django-developers/browse_frm/thread/f96e982254fbe5c3/2b02361fd6e706f4
>
> Carl

My thinking right now as far as moving forward:

1. Fixing #9249 and #15855. I hear your philosophical concerns about #9249 but 
the ubiquity of Google Analytics means we must find some way to fix it 
(IMO). Addressing these two tickets would at least ensure page caching wasn't 
actually broken. I'll try to jump in on those if I have time later next week. 
#9249 in particular seems quite close.

2. Clarifying the documentation. I think an admonition in the page caching 
section of the docs which outlined the present challenges a developer might 
face implementing it would probably have done the trick for me when I was first 
glancing at it. I can open a ticket on that next week, again if I have time.

It'd be great if these two got in for 1.4.

3. Addressing the other stuff is I guess for now a sort of "some day" goal. I 
continue to feel strongly that it's a worthy goal, particularly given that CSRF 
and contrib.auth are such fundamental parts of most projects and that they 
really are the only two things that stand in the way of page caching being a 
viable option in many projects. If anyone else gets inspired by this goal let 
me know, otherwise I'm content for the time being to let it stew.

Thanks all for listening.

--
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.

