Re: Really need some advices on large data considerations

DuyHai Doan Fri, 16 May 2014 12:40:14 -0700

You can watch this: https://www.youtube.com/watch?v=uoggWahmWYI

Aaron discusses support for big nodes in it.

On Wed, May 14, 2014 at 3:13 AM, Yatong Zhang <[email protected]> wrote:

> Thank you, Aaron, but we're planning about 20 TB per node. Is that feasible?
>
>
> On Mon, May 12, 2014 at 4:33 PM, Aaron Morton <[email protected]> wrote:
>
>> We've learned that compaction strategy would be an important point, because
>> we've run into 'no space' trouble with the 'size-tiered' compaction
>> strategy.
>>
>> If you want to get the most out of the raw disk space, LCS is the way to
>> go. Remember that it uses approximately twice the disk IO.
>>
>> In our experience, changing any settings/schema while a large cluster
>> is online and has been running for some time is really a pain.
>>
>> Which parts in particular?
>>
>> Updating the schema or config? OpsCenter has a rolling restart feature
>> which can be handy when Chef / Puppet is deploying the config changes.
>> Schema / gossip can take a little while to propagate with a high number
>> of nodes.
>>
>> On a modern version you should be able to run 2 to 3 TB per node, maybe
>> higher. The biggest concerns are going to be repair (the changes in 2.1
>> will help) and bootstrapping. I’d recommend testing a smaller cluster, say
>> 12 nodes, with a high load per node, e.g. 3 TB.
>>
>> cheers
>> Aaron
>>
>>     -----------------
>> Aaron Morton
>> New Zealand
>> @aaronmorton
>>
>> Co-Founder & Principal Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On 9/05/2014, at 12:09 pm, Yatong Zhang <[email protected]> wrote:
>>
>> Hi,
>>
>> We're going to deploy a large Cassandra cluster at the PB level. Our scenario
>> would be:
>>
>> 1. Lots of writes, about 150 writes/second on average, and about 300 KB
>> per write.
>> 2. A relatively small read load
>> 3. Our data will never be updated
>> 4. But we will delete old data periodically to free space for new data
>>
>> We've learned that compaction strategy would be an important point, because
>> we've run into 'no space' trouble with the 'size-tiered' compaction
>> strategy.
>>
>> We've read http://wiki.apache.org/cassandra/LargeDataSetConsiderations and
>> wonder whether it is enough and up to date. In our experience, changing any
>> settings/schema while a large cluster is online and has been running for
>> some time is really a pain. So we're gathering more info and
>> hoping for some more practical suggestions before we set up the Cassandra
>> cluster.
>>
>> Thanks, and any help is greatly appreciated.
>>
>>
>>
>
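
To put the numbers in this thread side by side, here is a rough back-of-the-envelope sketch in Python. The write rate (about 150 writes/second), write size (about 300K), and the per-node figures (3 TB versus 20T) come from the messages above. Everything else is an assumption made only for illustration: decimal units, a raw data set of exactly 1 PB, a replication factor of 3, and the helper names themselves. It ignores compaction headroom, compression, and deletes, so treat the results as an order-of-magnitude guide only.

```python
# Rough sizing sketch for the workload described in this thread.
# From the thread: ~150 writes/second, ~300K per write,
# 3 TB per node (suggested test load) versus 20T per node (the plan).
# Assumptions NOT from the thread: decimal units, a 1 PB raw data set,
# and a replication factor of 3. Compaction headroom is ignored.

KB = 1000
TB = 1000 ** 4
PB = 1000 ** 5
SECONDS_PER_DAY = 86400


def ingest_bytes_per_second(writes_per_second, bytes_per_write):
    """Raw ingest rate before replication."""
    return writes_per_second * bytes_per_write


def nodes_needed(raw_bytes, replication_factor, bytes_per_node):
    """Smallest node count that holds all replicas (ceiling division)."""
    total = raw_bytes * replication_factor
    return -(-total // bytes_per_node)


rate = ingest_bytes_per_second(150, 300 * KB)
per_day = rate * SECONDS_PER_DAY

print(f"ingest rate: {rate / 1e6:.0f} MB/s")
print(f"raw data per day: {per_day / TB:.2f} TB")
print(f"nodes at 3 TB each: {nodes_needed(PB, 3, 3 * TB)}")
print(f"nodes at 20 TB each: {nodes_needed(PB, 3, 20 * TB)}")
```

Under these assumptions the gap between the two per-node densities is roughly 1000 nodes versus 150 nodes, which is why the per-node limit matters so much for a cluster at this scale.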
