Suppose I have a database that partitions naturally into non-overlapping datasets, with no references between them, and each dataset is large enough to merit its own cluster purely in terms of data volume. Should I set up one cluster per dataset, or one large cluster for everything together?
As I see it, the primary advantage of separate clusters is total isolation: if I have a problem with one dataset, my application will continue working normally for all the other datasets. The primary advantage of one big cluster is usage pooling: when one server goes down in a large cluster, a much smaller fraction of total capacity is lost than when one server goes down in a small cluster. Also, the datasets have different temporal usage patterns (i.e. different peak hours), so their loads can be combined to reduce the total capacity required. Any thoughts?
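To make the pooling argument concrete, here is a rough sketch of the arithmetic I have in mind. All of the numbers, the dataset names, and the helper functions are made up for illustration; the load is expressed in "servers needed" per time slot, and real sizing would obviously need measured traffic.

```python
# Hypothetical load per time slot (in "servers needed") for three datasets
# whose peaks fall in different slots. All numbers are invented.
loads = {
    "dataset_a": [2, 2, 8, 3],
    "dataset_b": [3, 8, 2, 2],
    "dataset_c": [8, 3, 2, 2],
}


def separate_capacity(loads):
    """Separate clusters: each one must be sized for its own peak."""
    return sum(max(slots) for slots in loads.values())


def pooled_capacity(loads):
    """One big cluster: sized for the peak of the combined load."""
    combined = [sum(slot) for slot in zip(*loads.values())]
    return max(combined)


def capacity_lost_fraction(servers):
    """Fraction of a cluster's capacity lost when one server fails."""
    return 1 / servers


separate = separate_capacity(loads)  # 8 + 8 + 8 = 24
pooled = pooled_capacity(loads)      # combined load is [13, 13, 12, 7] -> 13

print(f"separate clusters need {separate} servers in total")
print(f"one pooled cluster needs {pooled} servers")
print(f"one failure in an 8-server cluster loses {capacity_lost_fraction(8):.1%}")
print(f"one failure in a 24-server cluster loses {capacity_lost_fraction(24):.1%}")
```

With these invented figures the pooled cluster needs 13 servers against 24 for separate clusters, and a single failure costs a smaller share of capacity in the larger cluster. The sketch ignores the isolation side of the trade-off entirely, which is the part I don't know how to weigh.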