Hi Jean-Armel,

I am using the latest and greatest DSE 4.5.2 (4.5.3 in another cluster, but there are no relevant changes between 4.5.2 and 4.5.3), i.e. Cassandra 2.0.10.

I have about 1.8 TB of data per node now in total, which falls into that range. As I said, the problem really is a large amount of data in a single CF, not the total amount of data. Quite often the nodes are idle yet still have quite a few pending compactions. I have discussed this with other members of the C* community and with the DataStax guys, and they have confirmed my observation. I believe that increasing the sstable size won't help at all and will probably make things worse, everything else being equal, of course. But I would like to hear from Andrei when he is done with his test.

Regarding the last statement: yes, C* clearly prefers many small servers to fewer large ones. But it is all relative, and it can all be recalculated into $$$ :) C* is all about partitioning everything: storage, traffic... Less data per node and more nodes give you lower latency, lower heap usage, etc. I think I have learned this with my project. Somewhat the hard way, but still, nothing beats personal experience :)

On Tue, Nov 25, 2014 at 3:23 AM, Jean-Armel Luce <jaluc...@gmail.com> wrote:

> Hi Andrei, Hi Nicolai,
>
> Which version of C* are you using?
>
> There are some recommendations about the max storage per node:
> http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
>
> "For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
> handle 10x (3-5TB)".
>
> I have the feeling that those recommendations depend on many criteria,
> such as:
> - your hardware
> - the compaction strategy
> - ...
>
> It looks like LCS lowers those limits.
>
> Increasing the size of sstables might help if you have enough CPU and you
> can put more load on your I/O system (@Andrei, I am interested in the
> results of your experiments with large sstable files).
>
> From my point of view, there are some usage patterns where it is better to
> have many small servers than a few large servers. Probably, it is better to
> have many small servers if you need LCS for large tables.
>
> Just my 2 cents.
>
> Jean-Armel
>
> 2014-11-24 19:56 GMT+01:00 Robert Coli <rc...@eventbrite.com>:
>
>> On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev <ngrigor...@gmail.com>
>> wrote:
>>
>>> One of the obvious recommendations I have received was to run more than
>>> one instance of C* per host. Makes sense: it will reduce the amount of
>>> data per node and will make better use of the resources.
>>
>> This is usually a Bad Idea to do in production.
>>
>> =Rob

--
Nikolai Grigoriev
(514) 772-5178