Re: Compaction Strategy guidance

Andrei Ivanov Tue, 25 Nov 2014 11:41:22 -0800

Ah, clear then. SSD usage imposes a different bias in terms of costs;-)


On Tue, Nov 25, 2014 at 9:48 PM, Nikolai Grigoriev <[email protected]> wrote:
> Andrei,
>
> Oh, yes, I scanned the top of your previous email but overlooked the
> last part.
>
> I am using SSDs, so I prefer to put in extra work to keep my system
> performing well and save expensive disk space. So far I've been able to size
> the system more or less correctly, so these LCS limitations do not cause too
> much trouble. But I do keep the CF "sharding" option as a backup - for me it
> will be relatively easy to implement.
>
>
> On Tue, Nov 25, 2014 at 1:25 PM, Andrei Ivanov <[email protected]> wrote:
>>
>> Nikolai,
>>
>> Just in case you've missed my comment in the thread (guess you have) -
>> increasing sstable size does nothing (in our case at least). That is,
>> it's not worse but the load pattern is still the same - doing nothing
>> most of the time. So, I switched to STCS and we will have to live with
>> extra storage cost - storage is way cheaper than CPU etc. anyhow :-)
>>
>> On Tue, Nov 25, 2014 at 5:53 PM, Nikolai Grigoriev <[email protected]>
>> wrote:
>> > Hi Jean-Armel,
>> >
>> > I am using the latest and greatest DSE 4.5.2 (4.5.3 in another cluster,
>> > but there are no relevant changes between 4.5.2 and 4.5.3) - thus,
>> > Cassandra 2.0.10.
>> >
>> > I have about 1.8 TB of data per node now in total, which falls into
>> > that range.
>> >
>> > As I said, it is really a problem with a large amount of data in a
>> > single CF, not the total amount of data. Quite often the nodes are idle
>> > yet have quite a few pending compactions. I have discussed it with other
>> > members of the C* community and the DataStax guys, and they have
>> > confirmed my observation.
>> >
>> > I believe that increasing the sstable size won't help at all and will
>> > probably make things worse - everything else being equal, of course.
>> > But I would like to hear from Andrei when he is done with his test.
>> >
>> > Regarding the last statement - yes, C* clearly likes many small servers
>> > more than fewer large ones. But it is all relative - and it can all be
>> > converted to $$$ :) C* is all about partitioning everything - storage,
>> > traffic... Less data per node and more nodes give you lower latency,
>> > lower heap usage, etc. I think I have learned this with my project.
>> > Somewhat the hard way, but still, nothing is better than personal
>> > experience :)
>> >
>> > On Tue, Nov 25, 2014 at 3:23 AM, Jean-Armel Luce <[email protected]>
>> > wrote:
>> >>
>> >> Hi Andrei, Hi Nikolai,
>> >>
>> >> Which version of C* are you using?
>> >>
>> >> There are some recommendations about the max storage per node:
>> >>
>> >> http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
>> >>
>> >> "For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
>> >> handle 10x (3-5TB)".
>> >>
>> >> I have the feeling that those recommendations depend on many criteria,
>> >> such as:
>> >> - your hardware
>> >> - the compaction strategy
>> >> - ...
>> >>
>> >> It looks like LCS lowers those limits.
>> >>
>> >> Increasing the size of sstables might help if you have enough CPU and
>> >> you can put more load on your I/O system (@Andrei, I am interested in
>> >> the results of your experiment with large sstable files).
>> >>
>> >> From my point of view, there are some usage patterns where it is
>> >> better to have many small servers than a few large servers. It is
>> >> probably better to have many small servers if you need LCS for large
>> >> tables.
>> >>
>> >> Just my 2 cents.
>> >>
>> >> Jean-Armel
>> >>
>> >> 2014-11-24 19:56 GMT+01:00 Robert Coli <[email protected]>:
>> >>>
>> >>> On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev
>> >>> <[email protected]>
>> >>> wrote:
>> >>>>
>> >>>> One of the obvious recommendations I have received was to run more
>> >>>> than one instance of C* per host. Makes sense - it will reduce the
>> >>>> amount of data per node and will make better use of the resources.
>> >>>
>> >>>
>> >>> This is usually a Bad Idea to do in production.
>> >>>
>> >>> =Rob
>> >>>
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Nikolai Grigoriev
>> > (514) 772-5178
>
>
>
>
> --
> Nikolai Grigoriev
> (514) 772-5178
