On 29 Dec 2005, at 20:29, Jacob Kaplan-Moss wrote:

I've always thought that this particular -- and common -- use case should be delegated to the DB level using one of the many excellent replication/distribution tools for your database. For example, you could easily do read distribution with pg_pool or sqlrelay, and it would be transparent to Django. I don't see a good reason to tackle replication in Django itself as that's more or less a solved problem.

I disagree. There's a lot more to separate databases than just replication - when you scale big, there are all kinds of ways things might need to be partitioned. You might want to keep "cheap" data (like traffic logs for users' weblogs) on a different DB cluster from expensive data (like their blog entries themselves). Some data gets read all the time, while other data is written constantly but rarely read - and so on.
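To make the partitioning idea concrete, here's a rough sketch of the kind of per-table routing I have in mind. The connection strings, table names, and the routing function are all made up for illustration - nothing like this exists in Django today, which is exactly the point:

```python
# Hypothetical sketch: route each table to the cluster that should hold it.
# None of these names are real Django APIs - Django currently assumes a
# single database connection for everything.

CONNECTIONS = {
    "cheap": "postgresql://logs-cluster/weblogs",      # high-volume, low-value data
    "expensive": "postgresql://main-cluster/weblogs",  # the data you can't lose
}

# Map tables to clusters; anything unmapped falls back to the safe default.
TABLE_CLUSTER = {
    "traffic_log": "cheap",
    "blog_entry": "expensive",
}

def connection_for(table):
    """Pick the connection DSN for a table, defaulting to the expensive cluster."""
    cluster = TABLE_CLUSTER.get(table, "expensive")
    return CONNECTIONS[cluster]
```

The interesting design question is where that lookup lives - in the ORM, so every query gets routed automatically, rather than in application code where people will forget it.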

I'd love Django to have a reputation as the web framework that scales. As far as I can tell, big LAMP sites that scale are mostly PHP plus a whole load of custom scary stuff - connections to multiple databases, memcached, even XML-RPC calls to backend services written in Java. We already have caching, and we can make calls to backend services easily enough, but the single-database-connection assumption is baked right into the framework.

Unfortunately, I don't have the experience of scaling big to say much more than that. This is where input from people like Scott becomes invaluable :)

Cheers,

Simon
