> Recently I found out Django doesn't support multiple databases. That's
> quite surprising.
>
> Given that limitation, how do you scale out a Django app?
Depends on where your bottleneck(s) is/are. It also depends heavily on
your read/write usage pattern. If you're truly experiencing a stenosis
of the database connection, you have several options, but most of them
reside in domain-specific tuning.

> Without multi-DB support, most of the usual techniques for scaling out
> such as:
> - DB sharding
> - functional partitioning - eg. separate DB servers for user
>   profiles, orders, and products
> would be infeasible with django.

Sharding and functional partitioning don't yet exist in stock Django.
There's a GSoC project that may make some headway on "multiple database
support", but I've not heard anything further on the Django Developers
list regarding it.

> I know replication is still available. But that still means all data
> must fit in 1 server.

Well, with bountiful storage using things like AoE, SAS, SAN, FC, etc.,
having "all the data fit in one server" isn't a horrible issue. And with
1TB drives on the market, fitting multiple TB in a single machine isn't
a disastrous idea. If you have more data than will fit in a single
machine, you have a lot of other issues and will likely have to get very
specific (and likely expensive ;-) help.

> Also replication isn't going to help update performance.

This goes back to my "read/write usage pattern" quip...if you have a
high volume of reads and a low volume of writes, replication is one of
the first tools you reach for. However, with a high volume of writes,
you've entered the realm of "hard problems". Usually if your app reaches
this volume of DB traffic, you need a solution specialized to your
domain, so stock Django may not be much help.

Given that you've not detailed the problem you're actually having (this
is where profiling comes in), it's hard to point much beyond the generic
here. So answers to some questions might help:

- are you bringing back huge datasets or just small sub-slices of your
  data?
- are you updating large swaths of data at a time, or are you just
  updating single records most of the time?
- are just a few select users doing the updating, while all the rest of
  your users are doing piles of reads?
- how big is this hypothetical DB of yours?
- can you partition by things that totally do not relate, such as by
  customer, so each customer can have their own instance that then gets
  put wherever your admins define, letting DNS balance the load? (a la
  BaseCamp's customername.basecamp.com; see the middleware sketch after
  my sig)
- can you tolerate replication delays? what time-frame? (sub-second?
  async taking up to 30 minutes? a whole day?)
- how readily can you cache things to prevent touching the database to
  begin with? Can you cache with an HTTP proxy front-end for repeated
  pages? Can you cache datasets or other fragments with memcached? If
  your web-app follows good design, any GET can be cached based on a
  subset of its headers. (again, sketches below)

Lastly, read over David Cramer's blog[1] as he's done some nice work
scaling Django to big deployments and has some helpful tips.

-tim

[1] http://www.davidcramer.net/category/code/django
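
P.S. A few sketches to make some of the above concrete. First, the
partition-by-customer idea: a hypothetical old-style middleware that
pulls the customer out of the subdomain so the rest of the app can
route on it. The names (CustomerFromSubdomainMiddleware, customer_slug)
are mine, not anything in stock Django:

    class CustomerFromSubdomainMiddleware(object):
        """Map acme.example.com -> 'acme', a la BaseCamp-style
        per-customer instances."""

        def process_request(self, request):
            host = request.META.get('HTTP_HOST', '')
            parts = host.split(':')[0].split('.')
            if len(parts) > 2:
                # 'acme.example.com' -> 'acme'
                request.customer_slug = parts[0]
            else:
                request.customer_slug = None
            return None  # hand off to normal view processing

Hang that in MIDDLEWARE_CLASSES and each view can filter its queries on
request.customer_slug (or DNS may already have sent the request to that
customer's own instance, in which case this just confirms who you are).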
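
Second, fragment caching with memcached through Django's low-level
cache API. This assumes CACHE_BACKEND points at memcached; the Product
model and the key scheme are made up for illustration:

    from django.core.cache import cache
    from myapp.models import Product  # hypothetical app/model

    def product_summary(category_id):
        """Hit the DB only on a cache miss; serve from memcached
        otherwise."""
        key = 'product-summary-%d' % category_id
        summary = cache.get(key)
        if summary is None:
            # the expensive query runs at most once per 10 minutes
            summary = list(Product.objects.filter(category__id=category_id))
            cache.set(key, summary, 60 * 10)
        return summary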
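
And third, caching whole GETs on a subset of their headers. Django's
stock decorators cover both the server-side page cache and the
Cache-Control/Vary headers an upstream HTTP proxy keys on; the view
body here is just a placeholder:

    from django.http import HttpResponse
    from django.views.decorators.cache import cache_page, cache_control
    from django.views.decorators.vary import vary_on_headers

    @cache_page(60 * 5)                       # server-side cache: 5 minutes
    @cache_control(public=True, max_age=300)  # an HTTP proxy may cache it too
    @vary_on_headers('Accept-Language')       # cache key includes this header
    def product_list(request):
        return HttpResponse('...render the page here...')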