Re: Performance testing in Cassandra

2014-09-09 Thread Umang Shah
Hi Malay, you can do the following. The cassandra-stress tool is at /tools/bin/cassandra-stress; it performs inserts and reads against a keyspace to measure performance. cassandra-stress [options] [-o [operation name]] -o (--operation name) : INSERT,READ,ETC.. (default INSERT) -t (--threads) :

Performance testing in Cassandra

2014-09-09 Thread Malay Nilabh
Hi, can anyone please let me know the steps for performance testing in Cassandra using the stress tools? Regards, Malay Nilabh BIDW BU/ Big Data CoE L&T Infotech Ltd, Hinjewadi, Pune +91-20-66571746 +91-73-879-00727 Email: m

Re: Quickly loading C* dataset into memory (row cache)

2014-09-09 Thread DuyHai Doan
Rob Coli strikes again, you're Doing It Wrong, and he's right :D Using Cassandra as a distributed cache is a bad idea, seriously. Putting 6GB into row cache is another one. On Tue, Sep 9, 2014 at 9:21 PM, Robert Coli wrote: > On Tue, Sep 9, 2014 at 12:10 PM, Danny Chan wrote: > >> Is there a

Re: Questions about cleaning up/purging Hinted Handoffs

2014-09-09 Thread Rahul Menon
I use jmxterm. http://wiki.cyclopsgroup.org/jmxterm/ Attach it to your C* process, then use the org.apache.cassandra.db:HintedHandoffManager bean and run deleteHintsForEndpoint to drop hints for each IP. On Wed, Sep 10, 2014 at 3:37 AM, Rahul Neelakantan wrote: > RF=3, two DCs. (Going to

Re: hardware sizing for cassandra

2014-09-09 Thread James Briggs
Regarding what Netflix does, the last time I checked: 1) sure, they use AWS VMs, but they take the whole machine. So is that really using a VM? :) 2) they use SSD mainly to reduce compaction time. "We don't even notice it with SSD any more." When sizing nodes and clusters, the main factors I've

Re: cassandra on own distributed network

2014-09-09 Thread James Briggs
What you're describing depends on the load (data size) and latency. Doing a bootstrap or backup would require a fair amount of bandwidth if you want it done quickly with a lot of data. Also, latency would be very high going over some kind of office VPN. But there's no reason you can't do what you'

cassandra on own distributed network

2014-09-09 Thread David M
Hi everyone I am at a loss for locating use cases/examples/documentation/books/etc for deploying Cassandra where multi-dc nodes of a single cluster are on your own network at points around the world. In my example a Cassandra dc equates to a building. Of interest to me is how installations are in

Re: Atomic batch of counters in Cassandra 2.1

2014-09-09 Thread Robert Coli
On Tue, Sep 9, 2014 at 2:36 PM, Eugene Voytitsky wrote: > As I understand, atomic batch for counters can't work correctly > (atomically) prior to 2.1 because of counters implementation. > [Link: http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2] > > Cassandra 2.1 reimplements the

Re: Questions about cleaning up/purging Hinted Handoffs

2014-09-09 Thread Rahul Neelakantan
RF=3, two DCs. (Going to 1.2.x in a few weeks) What's the procedure to drop via JMX? - Rahul 1-678-451-4545 (US) +91 99018-06625 (India) On Sep 9, 2014, at 9:23 AM, Rahul Menon wrote: > Yep, the hinted handoff in 1.0.8 is abysmal at best. What is your replication > factor? I have had huge hi

Re: Cassandra JBOD disk configuration

2014-09-09 Thread James Briggs
I've used JBOD before and here are the operational problems I noticed: 1) each volume/disk fills at a different rate, so the min might be 100 GB data, and the max might be 200 GB. That means you cannot use anywhere near your real hard disk capacity. (Then on top of that compaction requires space.) 2

Recommended read/write consistency level for counters

2014-09-09 Thread Eugene Voytitsky
What is the recommended read/write consistency level (CL) for counters? Yes, I know that write_CL + read_CL > RF is recommended. But I got strange results when I ran my JUnit tests with different CLs against a 3-node cluster. I checked 9 combinations: (write=ONE,QUORUM,ALL) x (read=ONE,QUORUM,ALL) Ea
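The write_CL + read_CL > RF rule mentioned above can be checked mechanically for the nine combinations. The following is a minimal sketch, not taken from the thread: it assumes a replication factor of 3 (the post only says "3 nodes cluster"), and the function names are hypothetical. It only shows which combinations make the read and write replica sets overlap; it does not explain the counter-specific results the poster observed.

```python
# Sketch: which (write CL, read CL) pairs satisfy write_CL + read_CL > RF.
# Assumption: RF = 3 (not stated in the original post). Names are illustrative.

LEVELS = ("ONE", "QUORUM", "ALL")


def replicas_required(level: str, rf: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # majority of replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def overlapping(write_cl: str, read_cl: str, rf: int) -> bool:
    """True when every read replica set must intersect every write replica set."""
    return replicas_required(write_cl, rf) + replicas_required(read_cl, rf) > rf


def overlap_table(rf: int) -> dict:
    """Evaluate all write/read combinations for one replication factor."""
    return {(w, r): overlapping(w, r, rf) for w in LEVELS for r in LEVELS}


if __name__ == "__main__":
    for (w, r), ok in overlap_table(3).items():
        print(f"write={w:<6} read={r:<6} overlap={'yes' if ok else 'no'}")
```

Under the RF = 3 assumption, QUORUM/QUORUM and any pair that includes ALL overlap, while ONE/ONE, ONE/QUORUM and QUORUM/ONE do not.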

Re: hardware sizing for cassandra

2014-09-09 Thread Robert Coli
On Tue, Sep 9, 2014 at 2:16 PM, Russell Bradberry wrote: > Because RAM is expensive and the JVM heap is limited to 8gb. While you do >> get benefit out of using extra RAM as page cache, it's often not cost >> efficient to do so > > > Again, this is so use-case dependent. I have met several people

Atomic batch of counters in Cassandra 2.1

2014-09-09 Thread Eugene Voytitsky
As I understand, atomic batch for counters can't work correctly (atomically) prior to 2.1 because of counters implementation. [Link: http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2] Cassandra 2.1 reimplements the counters. Will atomic batch of counters work as expected (atomic

Re: Counters: consistency, atomic batch

2014-09-09 Thread Eugene Voytitsky
Thanks, good article. But some of my questions are still unanswered. I will reformulate and post them as short separate emails. On 05.09.14 01:01, Ken Hancock wrote: Counters are way more complicated than what you're illustrating. Datastax did a good blog post on this: http://www.datastax.com/

Re: hardware sizing for cassandra

2014-09-09 Thread Russell Bradberry
> > Because RAM is expensive and the JVM heap is limited to 8gb. While you do > get benefit out of using extra RAM as page cache, it's often not cost > efficient to do so Again, this is so use-case dependent. I have met several people that run small nodes with fat ram to get it all in memory to s

Re: hardware sizing for cassandra

2014-09-09 Thread Russell Bradberry
*TL;DR* There is no one recommended setup for Cassandra, everyone's use-case is different and it is up to you to figure out the best setup for your use-case. There are a lot of questions that need to be asked before making a decision on hardware layout. There is just so

Re: hardware sizing for cassandra

2014-09-09 Thread Robert Coli
On Tue, Sep 9, 2014 at 1:07 PM, Rahul Neelakantan wrote: > Why not more than 32gb of RAM/node? > Because RAM is expensive and the JVM heap is limited to 8gb. While you do get benefit out of using extra RAM as page cache, it's often not cost efficient to do so. =Rob

RE: hardware sizing for cassandra

2014-09-09 Thread Arindam Barua
From my experience, and from what I've read, the more RAM the better. Any excess memory can be used as disk cache, which should help with your reads a lot. -Arindam -Original Message- From: Paolo Crosato [mailto:paolo.cros...@targaubiest.com] Sent: Tuesday, September 09, 2014 12:53 PM To:

Re: hardware sizing for cassandra

2014-09-09 Thread Rahul Neelakantan
Why not more than 32gb of RAM/node? Rahul Neelakantan > On Sep 9, 2014, at 3:52 PM, Paolo Crosato > wrote: > > Every node should have at least 4 cores, with a maximum of 8. Memory > shouldn't be higher than 32g, 16gb is good for a start. Every node should be > a physical machine, not a virtu

RE: hardware sizing for cassandra

2014-09-09 Thread Paolo Crosato
Every node should have at least 4 cores, with a maximum of 8. Memory shouldn't be higher than 32 GB; 16 GB is good for a start. Every node should be a physical machine, not a virtual one, or at least a virtual machine with an SSD disk subsystem. The disk subsystem should be directly connected to the

Re: Moving Cassandra from EC2 Classic into VPC

2014-09-09 Thread Ben Bromhead
On 9 Sep 2014, at 7:33 am, Nate McCall wrote: > Other thoughts: > - Go slowly and verify that clients and gossip are talking to the new nodes > after each lift and shift > - Don't forget to change seeds afterwards > - This is not the time to upgrade/change *anything* else - match the version >

Re: hardware sizing for cassandra

2014-09-09 Thread Chris Lohfink
It depends. Ultimately your load is low enough a single node can probably handle it so you kinda want a "minimum" cluster. Different people have different thoughts on what this means - I would recommend 5-6 nodes with a 3 replication factor. (say m1.xlarge, or c3.2xlarge striped ephemerals, I

Re: Quickly loading C* dataset into memory (row cache)

2014-09-09 Thread Robert Coli
On Tue, Sep 9, 2014 at 12:10 PM, Danny Chan wrote: > Is there a method to quickly load a large dataset into the row cache? > I use row caching as I want the entire dataset to be in memory. > You're doing it wrong. Use a memory store. =Rob

Re: Cassandra JBOD disk configuration

2014-09-09 Thread Chris Lohfink
It can get really unbalanced with STCS. What's more, even if there was a disk that could fit the 600gb sstable, it doesn't pay attention to space (first), so it may pick the 75% full one over the 10% one. It's a better idea to use LCS with it unless data model really needs it in which case monitor

Quickly loading C* dataset into memory (row cache)

2014-09-09 Thread Danny Chan
Hello all, Is there a method to quickly load a large dataset into the row cache? I use row caching as I want the entire dataset to be in memory. I'm running a Cassandra-1.2 database server with a dataset of 555 records (6GB size) and a row cache of 6GB. Key caching is disabled and I am using

Re: hardware sizing for cassandra

2014-09-09 Thread Nate Payne
I would also love to see any resources that are shared describing best practices. If you find something Oleg, or others have some very useful resources outside of what I have found by searching online, I would be very grateful if these were shared in my direction. Cheers. Nate On Tue, Sep 9, 2014

hardware sizing for cassandra

2014-09-09 Thread Oleg Ruchovets
Hi, where can I find the document with best practices about sizing for a Cassandra deployment? We have 1000 writes / reads per second, record size 1k. Questions: 1) How many machines do we need? 2) How much RAM, and what disk size / type? 3) What should the network be? I understand that hardware
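As a starting point for the questions above, the stated load can be turned into rough storage numbers. This is a back-of-envelope sketch under loud assumptions that are not in the original post: the 1000 operations per second are treated as writes only, "1k" is read as 1024 bytes, and a replication factor of 3 is used. It ignores compression, compaction headroom, indexes and TTLs, so it is a rough estimate rather than a sizing recommendation.

```python
# Sketch: raw ingest arithmetic for the load described in the post.
# Assumptions (NOT from the post): writes only, 1k = 1024 bytes, RF = 3.

SECONDS_PER_DAY = 86_400


def daily_raw_bytes(writes_per_sec: int, record_bytes: int) -> int:
    """Bytes written per day before replication or any storage overhead."""
    return writes_per_sec * record_bytes * SECONDS_PER_DAY


def daily_replicated_bytes(writes_per_sec: int, record_bytes: int, rf: int) -> int:
    """Bytes stored per day across the whole cluster with `rf` copies of each row."""
    return daily_raw_bytes(writes_per_sec, record_bytes) * rf


if __name__ == "__main__":
    raw = daily_raw_bytes(1000, 1024)
    replicated = daily_replicated_bytes(1000, 1024, rf=3)
    print(f"ingest rate: {1000 * 1024 / 1e6:.2f} MB/s")
    print(f"raw per day: {raw / 1e9:.1f} GB")
    print(f"replicated per day (RF=3): {replicated / 1e9:.1f} GB")
```

Dividing the replicated daily volume by the number of nodes, and multiplying by the retention period, gives a first estimate of per-node disk; the replies in this thread stress that the real answer depends heavily on the use case.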

Re: Moving Cassandra from EC2 Classic into VPC

2014-09-09 Thread Janne Jalkanen
Alain Rodriguez outlined this procedure that he was going to try, but failed to mention whether this actually worked :-) https://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201406.mbox/%3cca+vsrlopop7th8nx20aoz3as75g2jrjm3ryx119deklynhq...@mail.gmail.com%3E /Janne On 8 Sep 2014,

Re: Moving Cassandra from EC2 Classic into VPC

2014-09-09 Thread Nate McCall
We've done this several times with clients - Ben's response will work and is pretty close to the approaches we took: > > Use the gossiping property file snitch in the VPC data centre. > Agree. I don't think you could even do this effectively with the EC2Snitch. Use a public elastic ip for each

Re: Questions about cleaning up/purging Hinted Handoffs

2014-09-09 Thread Rahul Menon
Yep, the hinted handoff in 1.0.8 is abysmal at best. What is your replication factor? I have had huge hints pile up, where I had to drop the entire column family and then run a repair. Either that or you can use the JMX HintedHandoffManager and delete hints per endpoint. Also it may be worthwhile t