Hi Aymeric,

I'm thinking of proposing an alternative to the cached loader. This new approach makes Django template loading faster in general.

To start, I put together some benchmarks here:

https://github.com/prestontimmons/templatebench

The goal was to identify where Django spends its time. Is it the loaders that are slow? The parsing? The rendering? Something else?

Here are some basic timings from my MacBook Air. Each figure is the cumulative time to run 1000 iterations.

Instantiating a basic template, i.e. Template("hello"): 0.0344369411
Parsing a complex template with extends and includes: 0.3044617176

Unsurprising so far, but the parsing time grows measurably as the template has more to parse.

Running get_template with a simple template, like "hello": 0.1308078766
Running get_template on a complex template: 0.4068300724

With a simple template, more time is spent finding the template than parsing it. As template contents grow, though, the parsing time far outgrows the template loading time.

Running get_template on a template with 200 includes: 12.2357971668

Here's a classic case where Django bombs. The parsing time really adds up.

Time to render a basic template: 0.0240666866
Time to render a complex template: 0.1018106937

In this case, rendering a complex template takes about four times as long as rendering a simple one. Compare that to the nearly 9 times increase in parsing time in the first benchmark (0.034 to 0.304). A chunk of this render time is also parsing, though, due to extends and include nodes. All in all, the parsing time grows much more quickly than the render time does.

Based on these benchmarks, I've come to believe most of the time in Django templates is spent on parsing, not on loading templates or rendering. The cached loader is effective because it removes the need to parse a template more than once.

Interestingly enough, Jinja2 has different results:

Running get_template with a simple template, like "hello": 0.0112802982
Running get_template with a complex template: 0.0122888088

Even complex templates make little difference in parsing time for Jinja2.

Running get_template on a template with 200 includes: 0.0110247135

Many includes don't make a difference.

Time to render a basic template: 0.0134618282
Time to render a complex template: 0.0217206478

For a complex template, Jinja2 rendering takes roughly a fifth of the time Django does (0.0217 vs. 0.1018). Even so, the overall time difference is small since rendering is quick anyway.

After digging into Jinja2, I think this is because the Jinja2 environment keeps an internal cache of templates. If a template is in the cache, it calls the template "uptodate" method. If "uptodate" is true, the cached template is used. For filesystem loaders, this incurs a filesystem hit each time, but that's fine. File system calls aren't the bottleneck. Parsing is.

With that, I wondered if we couldn't do something similar in Django. I made an experimental commit here, based on my branch:

https://github.com/prestontimmons/django/commit/4683300c60033ba0db472c81244f01ff932c6fb3

This adds internal caching to django.template.engine.Engine and to the extends node. It also adds an "uptodate" method to the template origin so templates are reparsed when modified. Unlike the cached loader, which never checks whether templates have changed, this approach is also viable in development.

Running get_template with a simple template, like "hello":
    Before: 0.1308078766
    After:  0.0192785263
    Jinja2: 0.0112802982

Running get_template on a complex template:
    Before: 0.4068300724
    After:  0.0204186440
    Jinja2: 0.0122888088

By parsing only when necessary, these benchmarks see roughly a 7x to 20x speedup.

Running get_template on a template with 200 includes:
    Before: 12.2357971668
    After:  0.0179648399

Using include many times is now an option.

So far, all the tests pass, and I've been testing with other templates. The implementation seems almost too easy for the increase in speed.

Provided there's no dealbreaker I haven't noticed yet, I'd like to propose that we follow Jinja2's example by adding internal caching in place of the cached loader.

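To illustrate the idea of caching parsed templates behind an "uptodate" check, here is a minimal standalone sketch. It is not the code from the commit above and not Jinja2's implementation. The names TemplateCache, CachedTemplate and parse_count are made up for this example, and comparing file modification times is just one assumed way to answer "uptodate".

```python
import os


class CachedTemplate:
    """Hypothetical cache entry: the parsed result plus the mtime it was parsed at."""

    def __init__(self, path, parsed, mtime_ns):
        self.path = path
        self.parsed = parsed
        self.mtime_ns = mtime_ns

    def uptodate(self):
        # One stat call per lookup. This is the filesystem hit mentioned
        # above; it is cheap compared with reparsing the template.
        try:
            return os.stat(self.path).st_mtime_ns == self.mtime_ns
        except OSError:
            return False


class TemplateCache:
    """Hypothetical engine-level cache keyed by template path."""

    def __init__(self, parse):
        self.parse = parse      # any callable that turns source into a parsed object
        self.entries = {}
        self.parse_count = 0    # only here to make the effect visible

    def get_template(self, path):
        entry = self.entries.get(path)
        if entry is not None and entry.uptodate():
            return entry.parsed  # cache hit: no read, no parse

        # Cache miss or stale entry: read, parse, and remember the mtime.
        mtime_ns = os.stat(path).st_mtime_ns
        with open(path, encoding="utf-8") as f:
            source = f.read()
        self.parse_count += 1
        parsed = self.parse(source)
        self.entries[path] = CachedTemplate(path, parsed, mtime_ns)
        return parsed
```

Calling get_template twice on an unchanged file parses once. Touching the file invalidates the entry, so the next call reparses. That is what would make this kind of cache usable in development.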
This change gives a nice speed increase and simplifies things for my recursive loader branch as well.

There is one risk I can think of. External template tags can store state on the Node instance rather than in context.render_context. This is warned against in the docs and is not thread-safe. In practice, though, if the cached loader isn't used, a developer could be unaware that they have a problem at all. Switching to an internal cache would reveal those problems. Even so, I think that can be handled with documentation.

Do you think it's worth making an attempt to formalize this?

Preston
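
To make the Node state risk described above concrete, here is a minimal sketch using stand-in classes. Context, StatefulCycleNode, SafeCycleNode and render_twice are hypothetical names for this example, not Django's classes. The only thing assumed from Django is the idea of a per-render render_context mapping.

```python
class Context:
    """Stand-in for a template context with per-render scratch space."""

    def __init__(self):
        self.render_context = {}


class StatefulCycleNode:
    """Keeps its position on the node, so every render of a cached node shares it."""

    def __init__(self, values):
        self.values = values
        self.index = 0  # lives as long as the cached template does

    def render(self, context):
        value = self.values[self.index % len(self.values)]
        self.index += 1
        return value


class SafeCycleNode:
    """Keeps its position in render_context, which is fresh for each render."""

    def __init__(self, values):
        self.values = values

    def render(self, context):
        index = context.render_context.get(self, 0)
        context.render_context[self] = index + 1
        return self.values[index % len(self.values)]


def render_twice(node):
    """Call the node twice within one fresh context, as a single template render might."""
    context = Context()
    return [node.render(context), node.render(context)]
```

With values ["a", "b", "c"], the stateful node returns ["a", "b"] on the first render and ["c", "a"] on the second, because the cached node carries its counter over. The safe node returns ["a", "b"] both times. Without any caching the node is rebuilt on every parse, which is why the problem can go unnoticed.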
