Hi Jean-Armel,

I am using the latest and greatest DSE 4.5.2 (4.5.3 in another cluster, but
there are no relevant changes between 4.5.2 and 4.5.3), and thus Cassandra
2.0.10.

I now have about 1.8 TB of data per node in total, which falls into that
range.

As I said, the problem is really the large amount of data in a single CF,
not the total amount of data. Quite often the nodes are idle yet have quite
a few pending compactions. I have discussed it with other members of the C*
community and with the DataStax guys, and they have confirmed my observation.

I believe that increasing the sstable size won't help at all and will
probably make things worse - everything else being equal, of course. But I
would like to hear from Andrei when he is done with his test.

Regarding the last statement - yes, C* clearly likes many small servers
more than fewer large ones. But it is all relative, and it can all be
converted to $$$ :) C* is all about partitioning everything - storage,
traffic... Less data per node and more nodes give you lower latency, lower
heap usage, etc. I think I have learned this with my project. The somewhat
hard way, but still, nothing is better than personal experience :)

On Tue, Nov 25, 2014 at 3:23 AM, Jean-Armel Luce <jaluc...@gmail.com> wrote:

> Hi Andrei, Hi Nicolai,
>
> Which version of C* are you using?
>
> There are some recommendations about the max storage per node:
> http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
>
> "For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
> handle 10x (3-5TB)".
>
> I have the feeling that those recommendations depend on many criteria,
> such as:
> - your hardware
> - the compaction strategy
> - ...
>
> It looks like LCS lowers those limits.
>
> Increasing the size of sstables might help if you have enough CPU and you
> can put more load on your I/O system (@Andrei, I am interested in the
> results of your experiment with large sstable files).
>
> From my point of view, there are some usage patterns where it is better to
> have many small servers than a few large servers. Probably, it is better to
> have many small servers if you need LCS for large tables.
>
> Just my 2 cents.
>
> Jean-Armel
>
> 2014-11-24 19:56 GMT+01:00 Robert Coli <rc...@eventbrite.com>:
>
>> On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev <ngrigor...@gmail.com>
>> wrote:
>>
>>> One of the obvious recommendations I have received was to run more than
>>> one instance of C* per host. Makes sense - it will reduce the amount of
>>> data per node and will make better use of the resources.
>>>
>>
>> This is usually a Bad Idea to do in production.
>>
>> =Rob
>>
>>
>
>


-- 
Nikolai Grigoriev
(514) 772-5178
