On Sun, Jun 8, 2008 at 2:11 AM, lgr888999 <[EMAIL PROTECTED]> wrote:
> how you would build a huge decentralized system. Now of course it
> would depend on what the purpose of the system is, so let's just take
> Twitter as an example. :)
It's easy: all you have to do is avoid every single point of failure and every possible bottleneck. Just that ;-) In practice, though, this is *very* complicated.

Take as an example a pretty simple website with the three classic tiers: web servers, application logic, and a database backend.

The first bottleneck and point of failure is the path to your internet-facing servers. This is relatively easy to avoid by acquiring your own provider-independent ("portable") block of IP addresses and having multiple paths to the wider net through independent ISPs. Provided you have datacenters in multiple locations, most of the internet will get pretty reliable access to your service.

Another major bottleneck is, indeed, the database. Unless you operate at the scale of Google or Yahoo, with their custom replicated/redundant datastore solutions, you'll probably end up with some sort of SQL backend. You shouldn't aim for every connected client seeing every DB update immediately, in no time. It helps a lot if you can identify "clouds" of objects that must appear to update synchronously, while the rest may be updated when their time comes. For instance, a Twitter user who posts a message must see it on his own page immediately, otherwise he'll get confused. Whether his friends see it in 1 second or 1 minute is, in most cases, not that important. Objects directly related to one user's session are obviously in the "synchronous cloud", the others are in the "async cloud", and it's not critical that one session has immediate access to other sessions' clouds.

The importance of this separation comes up once you have to deal with multiple geographically distant datacenters (DCs). You can run a DB cluster in each of them (Oracle RAC, MySQL NDB Cluster, or something similar) and then design replication strategies between the datacenters. This is probably one of the most difficult parts of the application design.
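The sync/async cloud split can be sketched in a few lines of Python. This is purely illustrative (all the names and the in-memory dict "datastore" are made up): the author's own timeline is updated synchronously so he gets read-your-writes, while followers' timelines are filled in later by a background worker, standing in for a cross-DC replication queue.

```python
# Hypothetical sketch of the "synchronous cloud" vs "async cloud" split.
# The poster's own timeline is updated synchronously (read-your-writes);
# followers' timelines are updated later by a background worker.
import queue
import threading

timelines = {}                            # user -> list of messages (datastore stand-in)
followers = {"alice": ["bob", "carol"]}   # who follows whom (hypothetical data)
fanout = queue.Queue()                    # stand-in for an async replication queue

def post(user, message):
    # Synchronous cloud: the author must see his post immediately.
    timelines.setdefault(user, []).append(message)
    # Async cloud: everyone else can wait; just enqueue the work.
    fanout.put((user, message))

def fanout_worker():
    while True:
        user, message = fanout.get()
        if user is None:
            break                         # shutdown sentinel
        for f in followers.get(user, []):
            timelines.setdefault(f, []).append(message)
        fanout.task_done()

worker = threading.Thread(target=fanout_worker)
worker.start()

post("alice", "hello world")              # alice sees it at once...
assert "hello world" in timelines["alice"]
fanout.join()                             # ...bob and carol only once the worker has run
assert "hello world" in timelines["bob"]
fanout.put((None, None))
worker.join()
```

In a real deployment the queue would be a persistent message broker and the worker would live in another DC, but the contract is the same: the write the user cares about commits locally, everything else is eventually consistent.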
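One of the simplest inter-DC replication strategies is timestamp-based last-write-wins. The sketch below is purely illustrative (row format and DC names are invented) and is only safe for data where losing one of two concurrent updates is acceptable; anything needing global uniqueness has to be handled differently.

```python
# Illustrative last-write-wins merge for rows replicated between DCs.
# Each row carries a timestamp; on conflict, the newer write wins.
# Ties are broken deterministically by DC id so all DCs converge
# to the same value regardless of replication order.

def merge_row(local, remote):
    """Return the winning version of a row replicated from another DC."""
    if local is None:
        return remote
    if remote is None:
        return local
    return max(local, remote, key=lambda row: (row["ts"], row["dc"]))

# Two DCs holding conflicting versions of the same row (hypothetical data):
dc1 = {"user:42": {"bio": "hello", "ts": 100, "dc": "DC1"}}
dc2 = {"user:42": {"bio": "hi there", "ts": 105, "dc": "DC2"}}

# Replicate DC2's changes into DC1; the newer write (ts=105) wins.
for key, remote_row in dc2.items():
    dc1[key] = merge_row(dc1.get(key), remote_row)
```

Note the deterministic tie-break: without it, two DCs that each saw the "other" row last could end up disagreeing forever.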
You must ensure the replication is resilient against things like conflicting updates (since transactions won't span multiple DCs), which lead e.g. to duplicate keys. And there's much more. Some things will require a "global ack" from all DCs worldwide before they can be committed; e.g. registering a new user must ensure that the same name is not being registered at the same time somewhere else. On the other hand, things like currently logged-in users and their session information may not need to be replicated elsewhere at all. These tend to be high-volume data and are often better treated differently from "real content".

Luckily for you, most user sessions will send all their requests to just one DC, because routing paths in the internet are quite stable. However, it may happen that a user starts a session talking to DC1 and after a while is transferred to DC2. In that case you can require a re-login or, better and more user-friendly, fetch his session data from his "home" DC.

As you can see, building a distributed, scalable website is not so much about Django or the web application; the core part is the datastore management. All of the above comes from my experience with the operational management of a major news site with three distinct datacenters on two continents and millions of page views a day. Our setup is indeed much more complex, with different subsystems having their own specific requirements, but the simplification above is sufficient for sharing some hints.

Hope that helps ;-)

JDJ

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/django-users?hl=en