I'd also add that one of the biggest complications to arise from having
multiple clusters is that read-biased client applications would need to be
aware of all clusters and either aggregate result sets or include logic to
choose the right cluster for a particular query.
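To make that concrete, here's a minimal sketch of the two client-side patterns mentioned above: per-key routing and fan-out aggregation. The cluster names, the CRC-based routing rule, and the query interface are all hypothetical illustrations, not anything from an actual driver.

```python
import zlib

# Illustrative only: with multiple clusters, a read-biased client must
# either pick one cluster per query or query them all and merge.
# "clusters" is a dict mapping a cluster name to some client handle.

def choose_cluster(clusters, partition_key):
    """Deterministically route a partition key to a single cluster."""
    names = sorted(clusters)  # stable ordering across calls
    # crc32 is stable across processes, unlike Python's built-in hash()
    idx = zlib.crc32(partition_key.encode()) % len(names)
    return clusters[names[idx]]

def fan_out(clusters, query):
    """Run the same query against every cluster and merge the rows."""
    rows = []
    for name in sorted(clusters):
        rows.extend(query(clusters[name]))
    return rows
```

Either way, the client carries complexity that simply doesn't exist with a single cluster, which is the point being made here.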

And from a more operational perspective, I think you'd have a tough time
finding monitoring applications (like OpsCenter) that support multiple
clusters within the same viewport.  Having used multiple clusters in the
past, I can definitely tell you that from an administrative, operational,
and development standpoint, one cluster is almost always better than
many.

Oh, and I'm positive that there are other Cassandra deployments out there
well beyond 100 nodes, so I don't think you're really treading on
dangerous ground here.

I'd definitely say that you should try to use a single cluster if possible.

On Fri, Jan 20, 2012 at 9:34 PM, Maxim Potekhin <potek...@bnl.gov> wrote:

> You can also scale not "horizontally" but "diagonally",
> i.e. RAID SSDs and use multicore CPUs. This means that
> you'll get the same performance with fewer nodes, making
> it far easier to manage.
>
> SSDs by themselves will give you an order of magnitude
> improvement on I/O.
>
>
>
> On 1/19/2012 9:17 PM, Thorsten von Eicken wrote:
>
>> We're embarking on a project where we estimate we will need on the order
>> of 100 cassandra nodes. The data set is perfectly partitionable, meaning
>> we have no queries that need to have access to all the data at once. We
>> expect to run with RF=2 or =3. Is there some notion of ideal cluster
>> size? Or perhaps asked differently, would it be easier to run one large
>> cluster or would it be easier to run a bunch of, say, 16 node clusters?
>> Everything we've done to date has fit into 4-5 node clusters.
>>
>
>
