Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Anurag Khandelwal
Hi Jack, > So, your 1GB input size means roughly 716 thousand rows of data and 128GB > means roughly 92 million rows, correct? Yes, that's correct. > Are your gets and searches returning single rows, or a significant number of > rows? Like I mentioned in my first email, get always returns a s

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Jack Krupansky
Thanks for that clarification. So, your 1GB input size means roughly 716 thousand rows of data and 128GB means roughly 92 million rows, correct? FWIW, a best practice recommendation is that you avoid using secondary indexes in favor of using "query tables" - store the same data in multiple tables
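As a sketch of the "query tables" idea (table and column names here are hypothetical, not from the thread): the same records are written to more than one table, each keyed by the attribute you want to query on, so no secondary index is needed.

    -- Base table, keyed by record id
    CREATE TABLE records_by_id (
        record_id text PRIMARY KEY,
        user      text,
        site      text,
        payload   text
    );

    -- "Query table": same data, keyed by the attribute you search on
    CREATE TABLE records_by_user (
        user      text,
        record_id text,
        site      text,
        payload   text,
        PRIMARY KEY (user, record_id)
    );

    -- The application writes to both tables; reads by user hit the second one:
    SELECT record_id, site, payload FROM records_by_user WHERE user = 'alice';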

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Anurag Khandelwal
To clarify: Input size is the size of the dataset as a CSV file, before loading it into Cassandra; for each input size, the number of columns is fixed but the number of rows is different. By 1.5KB record, I meant that each row, when represented as a CSV entry, occupies 1500 bytes. I've used the

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Jack Krupansky
What exactly is "input size" here (1GB to 128GB)? I mean, the test spec "The dataset used comprises of ~1.5KB records... there are 105 attributes in each record." Does each test run have exactly the same number of rows and columns and you're just making each column bigger, or what? Cassandra does

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Jonathan Haddad
I think you actually get a really useful metric by benchmarking 1 machine. You understand your cluster's theoretical maximum performance, which would be Nodes * number of queries. Yes, adding in replication and CL is important, but 1 machine lets you isolate certain performance metrics. On Thu, J

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Robert Wille
I disagree. I think that you can extrapolate very little information about RF>1 and CL>1 by benchmarking with RF=1 and CL=1. On Jan 13, 2016, at 8:41 PM, Anurag Khandelwal <anur...@berkeley.edu> wrote: Hi John, Thanks for responding! The aim of this benchmark was not to benchmark Cassa

Re: Cassandra Performance on a Single Machine

2016-01-13 Thread Anurag Khandelwal
Hi John, Thanks for responding! The aim of this benchmark was not to benchmark Cassandra as an end-to-end distributed system, but to understand a breakdown of the performance. For instance, if we understand the performance characteristics that we can expect from a single-machine Cassandra ins

Re: Cassandra Performance on a Single Machine

2016-01-06 Thread John Schulz
Anurag, Unless you are planning on continuing to use only one machine with RF=1, benchmarking a single system using RF=Consistency=1 is mostly a waste of time. If you are going to use RF=1 and a single host, then why use Cassandra at all? Plain old relational DBs should do the job just fine. Cassan

RE: Cassandra Performance Testing

2014-01-24 Thread Devin Pinkston
John, Yep, that makes perfect sense. Thank you for your time, I appreciate it! From: John Anderstedt [mailto:john.anderst...@svenskaspel.se] Sent: Friday, January 24, 2014 9:08 AM To: user@cassandra.apache.org Subject: Re: Cassandra Performance Testing It sounds to me that the limitation in this

Re: Cassandra Performance Testing

2014-01-24 Thread John Anderstedt
It sounds to me that the limitation in this setup is the disks. If it's in a mirror, the cost for writes is double. If you have the flat file and the DB on the same disk, there will be a lot of I/O wait. There is also a question of disk space and fragmentation: if the flat file occupies 1.2TB o

Re: cassandra performance problems

2013-12-19 Thread Alexander Shutyaev
Thanks all for your responses. We've downgraded from 2.0.3 to 2.0.0 and everything became normal. 2013/12/8 Nate McCall > If you are really set on using Cassandra as a cache, I would recommend > disabling durable writes for the keyspace(s)[0]. This will bypass the > commitlog (the flushing/rota

Re: cassandra performance problems

2013-12-07 Thread Nate McCall
If you are really set on using Cassandra as a cache, I would recommend disabling durable writes for the keyspace(s)[0]. This will bypass the commitlog (the flushing/rotation of which may be a good-sized portion of your performance problems given the number of tables). [0] http://www.datastax.com/do
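For reference, durable writes are a keyspace-level property; a minimal CQL sketch, assuming a hypothetical keyspace name and replication settings:

    -- Bypasses the commitlog for this keyspace; data not yet flushed to
    -- SSTables can be lost if a node goes down, so only use this for cache-like data.
    CREATE KEYSPACE my_cache_keyspace
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
        AND durable_writes = false;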

Re: cassandra performance problems

2013-12-06 Thread J. Ryan Earl
On Thu, Dec 5, 2013 at 6:33 AM, Alexander Shutyaev wrote: > We've plugged it into our production environment as a cache in front of > postgres. Everything worked fine, we even stressed it by explicitly > propagating about 30G (10G/node) data from postgres to cassandra. > If you just want a cachin

Re: cassandra performance problems

2013-12-05 Thread Alexander Shutyaev
Thanks for your answers. Jonathan, yes, it was load avg, and iowait was lower than 2% all that time - the only load was the user one. Robert, we had -Xmx4012m, which was automatically calculated by the default cassandra-env.sh (1/4 of total memory - 16G) - we didn't change that. 2013/12/5 Robert C

Re: cassandra performance problems

2013-12-05 Thread Robert Coli
On Thu, Dec 5, 2013 at 4:33 AM, Alexander Shutyaev wrote: > Cassandra version is 2.0.3. ... We've plugged it into our production > environment as a cache in front of postgres. > https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/ > What can be the reason? Can it be high n

Re: cassandra performance problems

2013-12-05 Thread Jonathan Haddad
Do you mean high CPU usage or high load avg? (20 indicates load avg to me). High load avg means the CPU is waiting on something. Run "iostat -dmx 1 100" to check your disk stats; you'll see the columns that indicate MB/s read & write as well as % utilization. Once you understand the bottlenec
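The command Jonathan mentions, as a minimal sketch of what to watch (device names will differ on your system):

    # -d device report, -m MB/s, -x extended stats; 1-second interval, 100 samples
    iostat -dmx 1 100
    # Watch rMB/s and wMB/s for throughput and %util for saturation;
    # sustained %util near 100 means the disk is the bottleneck.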

Re: Cassandra performance tuning...

2013-07-11 Thread Eric Stevens
You should be able to set the key_validation_class on the column family to use a different data type for the row keys. You may not be able to change this for a CF with existing data without some troubles due to a mismatch of data types; if that's a concern you'll have to create a separate CF and m
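A hedged sketch of what that change might look like in the old cassandra-cli (the column family name is hypothetical, and the exact syntax may differ by version):

    # cassandra-cli, inside a 'use <keyspace>;' session
    update column family Users with key_validation_class = UTF8Type;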

Re: Cassandra performance decreases drastically with increase in data size.

2013-06-03 Thread srmore
Thanks all for the help. I ran the traffic over the weekend; surprisingly, my heap was doing OK (around 5.7G of 8G), but GC activity went nuts and dropped the throughput. I will probably increase the number of nodes. The other interesting thing I noticed was that there were some objects with finaliz

Re: Cassandra performance decreases drastically with increase in data size.

2013-05-30 Thread Aiman Parvaiz
I believe you should roll out more nodes as a temporary fix to your problem; 400GB on all nodes means (as correctly mentioned in other mails of this thread) you are spending more time on GC. Check out the second comment in this link by Aaron Morton; he says that more than 300GB can be problematic

Re: Cassandra performance decreases drastically with increase in data size.

2013-05-30 Thread Bryan Talbot
One or more of these might be effective depending on your particular usage - remove data (rows especially) - add nodes - add ram (has limitations) - reduce bloom filter space used by increasing fp chance - reduce row and key cache sizes - increase index sample ratio - reduce compaction concurrency
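One of the items above - reducing bloom filter space by increasing the false-positive chance - can be set per table; a hedged CQL sketch, assuming a hypothetical keyspace/table and a CQL3-capable version:

    -- A higher fp chance shrinks the bloom filters (which live on the JVM heap
    -- in pre-1.2 versions) at the cost of more wasted disk reads.
    ALTER TABLE mykeyspace.events WITH bloom_filter_fp_chance = 0.1;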

Re: Cassandra performance decreases drastically with increase in data size.

2013-05-30 Thread srmore
You are right, it looks like I am doing a lot of GC. Is there any short-term solution for this other than bumping up the heap? Even if I increase the heap I will run into the same issue; only the time before I hit OOM will be lengthened. It will be a while before we go to the latest and greate

Re: Cassandra performance decreases drastically with increase in data size.

2013-05-29 Thread Jonathan Ellis
Sounds like you're spending all your time in GC, which you can verify by checking what GCInspector and StatusLogger say in the log. The fix is to increase your heap size or upgrade to 1.2: http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2 On Wed, May 29, 2013 at 11:32 PM, srmore
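A quick way to check what Jonathan describes, as a sketch (the log path varies by install):

    # Look for GC pauses and flush/pending-task summaries in the Cassandra log
    grep -E 'GCInspector|StatusLogger' /var/log/cassandra/system.log | tail -50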

Re: cassandra performance

2013-03-24 Thread aaron morton
> "select CPUTime,User,site from CF(or tablename) where user=xxx and > Jobtype=xxx" Even thought cassandra has tables and looks like a RDBMS it's not. Queries with multiple secondary index clauses will not perform as well as those with none. There is plenty of documentation here http://www.da

Re: cassandra performance

2013-03-24 Thread Derek Williams
The biggest advantage of Cassandra is its ability to scale linearly as more nodes are added and its ability to handle node failures. Also, to get the maximum performance from Cassandra you need to be making multiple requests in parallel. On Sun, Mar 24, 2013 at 3:15 AM, 张刚 wrote: > Hello, > I am

Re: cassandra performance

2013-03-24 Thread 张刚
For example, each row represents a job record; it has fields like "user", "site", "CPUTime", "datasize", "JobType"... The fields in the CF are fixed, just like a table. The query looks like this: "select CPUTime,User,site from CF(or tablename) where user=xxx and Jobtype=xxx" Best regards 2013/3/24 cem > Hi, > > Co

Re: cassandra performance

2013-03-24 Thread cem
Hi, Could you provide some other details about your schema design and queries? It is very hard to tell anything. Regards, Cem On Sun, Mar 24, 2013 at 12:40 PM, dong.yajun wrote: > Hello, > > I'd suggest you to take look at the difference between Nosql and RDMS. > > Best, > > On Sun, Mar 24, 2

Re: cassandra performance

2013-03-24 Thread dong.yajun
Hello, I'd suggest you take a look at the difference between NoSQL and an RDBMS. Best, On Sun, Mar 24, 2013 at 5:15 PM, 张刚 wrote: > Hello, > I am new to Cassandra. I do some tests on a single machine. I install > Cassandra with a binary tarball distribution. > I create a CF to store the data that

Re: Cassandra Performance Benchmarking.

2013-01-21 Thread Pradeep Kumar Mantha
Hi, Thanks for the information. I upgraded my Cassandra version to 1.2.0 and tried running the experiment again to gather the statistics. My application took nearly 529 seconds to query 76896 keys. Please find the statistics below for 32 threads (where each thread queries 76896 key

Re: Cassandra Performance Benchmarking.

2013-01-21 Thread aaron morton
You can also see what it looks like from the server side. nodetool proxyhistograms will show you the full request latency recorded by the coordinator. nodetool cfhistograms will show you the local read latency; this is just the time it takes to read data on a replica and does not include network o
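The two commands Aaron mentions, as a sketch (keyspace and column family names are placeholders):

    # Full request latency as seen by the coordinator
    nodetool -h 127.0.0.1 proxyhistograms

    # Local read/write latency for one column family on this replica
    nodetool -h 127.0.0.1 cfhistograms MyKeyspace MyColumnFamily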

Re: Cassandra Performance Benchmarking.

2013-01-18 Thread Tyler Hobbs
The fact that it's still exactly 521 seconds is very suspicious. I can't debug your script over the mailing list, but do some sanity checks to make sure there's not a bottleneck somewhere you don't expect. On Fri, Jan 18, 2013 at 12:44 PM, Pradeep Kumar Mantha wrote: > Hi, > > Thanks Tyler. >

Re: Cassandra Performance Benchmarking.

2013-01-18 Thread Pradeep Kumar Mantha
Hi, Thanks Tyler. Below is the *global* connection pool I am trying to use, where the server_list contains all the IPs of the 12 DataNodes I am using, pool_size is the number of threads, and I just set the timeout to 60 to avoid connection retry errors. pool = pycassa.ConnectionPool('Blast', serve

Re: Cassandra Performance Benchmarking.

2013-01-18 Thread Tyler Hobbs
You just need to increase the ConnectionPool size to handle the number of threads you have using it concurrently. Set the pool_size kwarg to at least the number of threads you're using. On Thu, Jan 17, 2013 at 6:46 PM, Pradeep Kumar Mantha wrote: > Thanks Tyler. > > I just moved the pool and cf
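A minimal pycassa sketch of what Tyler describes; the keyspace name 'Blast' is from the thread, while the host names and column family name are hypothetical. The pool is shared across threads and sized to match them:

    import pycassa

    NUM_THREADS = 32

    # One global pool shared by all threads; pool_size >= number of threads
    pool = pycassa.ConnectionPool(
        'Blast',
        server_list=['node1:9160', 'node2:9160', 'node3:9160'],
        pool_size=NUM_THREADS,
        timeout=60,
    )
    cf = pycassa.ColumnFamily(pool, 'my_cf')

    def worker(keys):
        # Each thread reuses the shared pool and ColumnFamily objects
        for key in keys:
            row = cf.get(key)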

Re: Cassandra Performance Benchmarking.

2013-01-17 Thread Pradeep Kumar Mantha
Thanks Tyler. I just moved the pool and cf, which store the connection pool and CF information, to have global scope. Increased the server_list values from 1 to 4 (I think I can increase them to a max of 12 since I have 12 data nodes). When I created 8 threads using the Python threading package, I see

Re: Cassandra Performance Benchmarking.

2013-01-17 Thread Tyler Hobbs
ConnectionPools and ColumnFamilies are thread-safe in pycassa, and it's best to share them across multiple threads. Of course, when you do that, make sure to make the ConnectionPool large enough to support all of the threads making queries concurrently. I'm also not sure if you're just omitting t

Re: Cassandra Performance Benchmarking.

2013-01-17 Thread Pradeep Kumar Mantha
Hi, Thanks. I would like to benchmark Cassandra with our application so that we understand the details of how the actual benchmarking is done. Not sure how easy it would be to integrate YCSB with our application, so I am trying different client interfaces to Cassandra. I found for 12 Data Nod

Re: Cassandra Performance Benchmarking.

2013-01-17 Thread Edward Capriolo
Wow, you managed to do a load test through the cassandra-cli. There should be a merit badge for that. You should use the built-in stress tool or YCSB. The CLI has to do much more string conversion than a normal client would, and it is not built for performance. You will definitely get better number
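A rough sketch of the built-in stress tool Edward mentions; the path and flag names are from the pre-2.1 tool and may differ by version, and the node names are placeholders:

    # Insert a million rows, then read them back, with 32 client threads
    tools/bin/cassandra-stress -d node1,node2 -o insert -n 1000000 -t 32
    tools/bin/cassandra-stress -d node1,node2 -o read -n 1000000 -t 32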

Re: cassandra performance looking great...

2012-09-07 Thread Hiller, Dean
Now that would be cool. Right now, though, too many other features need to be added; a GUI on top of the ad-hoc query tool is the next top priority, so one can run any S-SQL statement and ad-hoc query the heck out of a NoSQL store. We may even be able to optimize our queries to be even faster

Re: cassandra performance looking great...

2012-09-07 Thread Edward Capriolo
Try to get Cassandra running the TPC-H benchmarks and beat Oracle :) On Fri, Sep 7, 2012 at 10:01 AM, Hiller, Dean wrote: > So we wrote 1,000,000 rows into cassandra and ran a simple S-SQL(Scalable > SQL) query of > > > PARTITIONS n(:partition) SELECT n FROM TABLE as n WHERE n.numShares >= :low

Re: Cassandra performance question

2012-01-24 Thread Jonathan Ellis
No argument there. Thanks for explaining what you were doing to encrypt client traffic! On Mon, Jan 23, 2012 at 10:11 PM, Chris Marino wrote: > Hi Jonathan, yes, when I say 'node encryption' I mean inter-Cassandra node > encryption. When I say 'client encryption' I mean encrypted traffic from th

Re: Cassandra performance question

2012-01-23 Thread Chris Marino
Hi Jonathan, yes, when I say 'node encryption' I mean inter-Cassandra-node encryption. When I say 'client encryption' I mean encrypted traffic from the Cassandra nodes to the clients. For these benchmarks we used the stress test client load generator. We ran tests with no encryption, then with 'nod

Re: Cassandra performance question

2012-01-23 Thread Jonathan Ellis
Can you elaborate on what exactly you were testing on the Cassandra side? It sounds like what this post refers to as "node" encryption corresponds to enabling "internode_encryption: all", but I couldn't guess what your client encryption is since Cassandra doesn't support that out of the box yet
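For reference, a sketch of what "node" encryption maps to in cassandra.yaml of that era (circa 1.0.x); the keystore paths and passwords below are the shipped defaults or placeholders, not a verified production config:

    # cassandra.yaml: encrypt all inter-node traffic
    encryption_options:
        internode_encryption: all
        keystore: conf/.keystore
        keystore_password: cassandra
        truststore: conf/.truststore
        truststore_password: cassandra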

Re: Cassandra performance question

2011-12-31 Thread Dom Wong
sweet, that's pretty awesome :) On Fri, Dec 30, 2011 at 8:08 PM, Jeremy Hanna wrote: > This might be helpful: > http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html > > On Dec 30, 2011, at 1:59 PM, Dom Wong wrote: > > > Hi, could anyone tell me whether this is possible w

Re: Cassandra performance question

2011-12-30 Thread Chris Marino
We did some benchmarking as well. http://blog.vcider.com/2011/09/virtual-networks-can-run-cassandra-up-to-60-faster/ Although we were primarily interested in the networking issues CM On Fri, Dec 30, 2011 at 12:08 PM, Jeremy Hanna wrote: > This might be helpful: > http://techblog.netflix.c

Re: Cassandra performance question

2011-12-30 Thread Jeremy Hanna
This might be helpful: http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html On Dec 30, 2011, at 1:59 PM, Dom Wong wrote: > Hi, could anyone tell me whether this is possible with Cassandra using an > appropriately sized EC2 cluster. > > 100,000 clients writing 50k each

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Mohit Anchlia
On Mon, Oct 3, 2011 at 1:19 PM, Ramesh Natarajan wrote: > Thanks for the pointers.  I checked the system and the iostat showed that we > are saturating the disk to 100%. The disk is SCSI device exposed by ESXi and > it is running on a dedicated lun as RAID10 (4 600GB 15k drives) connected to > ESX

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Chris Goffinet
Yes, look at cassandra.yaml; there is a section about throttling compaction. You still *want* multi-threaded compaction; throttling will occur across all threads. The reason is that you don't want to get stuck compacting bigger files while the smaller ones build up waiting for bigger compactio
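A sketch of the two cassandra.yaml knobs being discussed; the setting names are from that era's config and the values are only illustrative:

    # Throttle total compaction I/O across all compaction threads (MB/s)
    compaction_throughput_mb_per_sec: 16

    # Allow multiple compactions to run concurrently so small SSTables
    # don't queue up behind one long-running large compaction
    multithreaded_compaction: true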

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Ramesh Natarajan
Thanks for the pointers. I checked the system and the iostat showed that we are saturating the disk to 100%. The disk is SCSI device exposed by ESXi and it is running on a dedicated lun as RAID10 (4 600GB 15k drives) connected to ESX host via iSCSI. When I run compactionstats I see we are compact

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Chris Goffinet
Most likely you are running single-threaded compaction. Look at cassandra.yaml for how to enable multi-threaded compaction. As more data comes into the system, bigger files get created during compaction. You could be in a situation where you might be compacting at a hi

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Mohit Anchlia
In order to understand what's going on, you might want to first do just a write test, look at the results, then do just the read tests, and then do combined read/write tests. Since you mentioned high update/deletes, I should also ask: what is your CL for writes/reads? With high updates/deletes + high CL I think

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Ramesh Natarajan
I will start another test run to collect these stats. Our test model is in the neighborhood of 4500 inserts, 8000 updates&deletes and 1500 reads every second across 6 servers. Can you elaborate more on reducing the heap space? Do you think it is a problem with 17G RSS? thanks Ramesh On Mon, Oc

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Mohit Anchlia
I am wondering if you are seeing issues because of more frequent compactions kicking in. Is this primarily write ops, or reads too? During the test period, gather data like: 1. cfstats 2. tpstats 3. compactionstats 4. netstats 5. iostat You have RSS memory close to 17GB. Maybe someone can give f
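The data-gathering commands in the list above, as a sketch (run on each node during the test window):

    nodetool -h localhost cfstats
    nodetool -h localhost tpstats
    nodetool -h localhost compactionstats
    nodetool -h localhost netstats
    iostat -x 5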

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Yang
Maybe try the row cache? Have you enabled mlock? (You need jna.jar and to set ulimit -l.) Using iostat -x would also give you more clues as to disk performance. On Mon, Oct 3, 2011 at 10:12 AM, Ramesh Natarajan wrote: > I am running a cassandra cluster of 6 nodes running RHEL6 virtualized by > ESX
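A sketch of the mlock prerequisites Yang mentions (paths are illustrative): JNA must be on Cassandra's classpath and the memlock limit raised so the JVM can lock memory.

    # Put jna.jar where Cassandra picks it up (e.g. the lib/ directory)
    cp jna.jar /path/to/cassandra/lib/

    # Allow the cassandra user to lock memory (or set this persistently in limits.conf)
    ulimit -l unlimited

    # Extended per-device stats every 5 seconds to watch disk behavior
    iostat -x 5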

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Ramesh Natarajan
We have 5 CF. Attached is the output from the describe command. We don't have row cache enabled. Thanks Ramesh Keyspace: MSA: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:3] Column Families: ColumnFamily: admin

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Mohit Anchlia
On Mon, Oct 3, 2011 at 10:12 AM, Ramesh Natarajan wrote: > I am running a cassandra cluster of  6 nodes running RHEL6 virtualized by > ESXi 5.0.  Each VM is configured with 20GB of ram and 12 cores. Our test > setup performs about 3000  inserts per second.  The cassandra data partition > is on a X

Re: Cassandra performance

2010-09-20 Thread Edward Capriolo
On Sat, Sep 18, 2010 at 9:26 AM, Peter Schuller wrote: >>  - performance (it should be not as much less than shard of MySQL and >> scale linearly, we want to have not more that 10K inserts per second >> of writes, and probably not more than 1K/s reads which will be mostly >> random) >>  - ability

Re: Cassandra performance

2010-09-18 Thread Peter Schuller
>  - performance (it should be not as much less than shard of MySQL and > scale linearly, we want to have not more that 10K inserts per second > of writes, and probably not more than 1K/s reads which will be mostly > random) >  - ability to store big amounts of data (now it looks that we will > hav

Re: Cassandra performance

2010-09-18 Thread Kamil Gorlo
Hi, first of all, I am not a Cassandra hater :) I don't expect miracles either :) I'm looking for a scalable solution that could be used instead of a sharding solution over MySQL or Tokyo Tyrant. Our system now runs OK on a single Tokyo Tyrant DB, but we expect a lot of traffic increase i

Re: Cassandra performance

2010-09-18 Thread Peter Schuller
> Disabling row cache in this case makes sense, but disabling key cache > is probably hurting your performance quite a bit.  If you wrote 20GB > of data per node, with narrow rows as you describe, and had default > memtable settings, you now have a huge number of sstables on disk. > You did not ind

Re: Cassandra performance

2010-09-17 Thread Benjamin Black
It appears you are doing several things that assure terrible performance, so I am not surprised you are getting it. On Tue, Sep 14, 2010 at 3:40 PM, Kamil Gorlo wrote: > My main tool was stress.py for benchmarks (or equivalent written in > C++ to deal with python2.5 lack of multiprocessing). I wi

Re: Cassandra performance

2010-09-17 Thread Peter Schuller
> durable and rich data model. It will not provide your high performance, > especially reading  performance is poor. Note that for several realistic work-loads, the above claim is most definitely wrong. For example, for large databases with a mix of insertions/deletions (so that the MySQL case doe

Re: Cassandra performance

2010-09-17 Thread Jeremy Hanna
http://www.quora.com/Is-Cassandra-to-blame-for-Digg-v4s-technical-failures On Sep 17, 2010, at 4:35 PM, Zhong Li wrote: > This is my personal experiences. MySQL is faster than Cassandra on most > normal use cases. > > You should understand why you choose Cassandra instead of MySQL. If one >

Re: Cassandra performance

2010-09-17 Thread Zhong Li
This is my personal experience. MySQL is faster than Cassandra in most normal use cases. You should understand why you would choose Cassandra instead of MySQL. If one central MySQL can handle your workload, MySQL is better than Cassandra. BUT if you are overloading one MySQL and want multiple boxes

Re: Cassandra performance

2010-09-15 Thread Wayne
If MySQL is faster, then use it. I struggled to do side-by-side comparisons with MySQL for months until finally realizing they are too different to compare side by side. MySQL is always faster out of the gate when you come at the problem thinking in terms of relational databases. Add in repli

Re: Cassandra performance

2010-09-15 Thread Peter Schuller
> But to be honest I'm pretty disappointed that Cassandra doesn't really > scale linearly (or "semi-linearly" :)) when adding new machines. It really should scale linearly for this workload unless I have missed something important (in which case I hope someone will chime in). But note that you a

Re: Cassandra performance

2010-09-14 Thread Oleg Anastasyev
Kamil Gorlo gmail.com> writes: > > So I've got more reads from single MySQL with 400GB of data than from > 8 machines storing about 266GB. This doesn't look good. What am I > doing wrong? :) The worst case for Cassandra is random reads. You should ask yourself a question: do you really have this

Re: Cassandra performance

2010-09-14 Thread Kamil Gorlo
Hello, On Wed, Sep 15, 2010 at 3:53 AM, Jonathan Ellis wrote: > The key is that while Cassandra may read less rows per second than > MySQL when you are i/o bound (as you are here) because of SSTable > merging (see http://wiki.apache.org/cassandra/MemtableSSTable), you > should be using your Cassa

Re: Cassandra performance

2010-09-14 Thread Kamil Gorlo
Hello, On Wed, Sep 15, 2010 at 3:45 AM, Chen Xinli wrote: [cut] >> > Disable row cache is ok, but key cache should be enabled. It use little > memory, but reading peformance will improve a lot. Hmm, I've tested with key cache enabled (100%) and I am pretty sure that this really doesn't help si

Re: Cassandra performance

2010-09-14 Thread Jonathan Ellis
The key is that while Cassandra may read less rows per second than MySQL when you are i/o bound (as you are here) because of SSTable merging (see http://wiki.apache.org/cassandra/MemtableSSTable), you should be using your Cassandra rows as materialized views so that each query is a single row looku
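A sketch of the "rows as materialized views" idea in pycassa terms; the keyspace, column family, and key below are hypothetical, not from the thread:

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', server_list=['node1:9160'])

    # One row per query "view": everything a query needs was written under
    # that key at insert time, so serving it is a single row lookup.
    user_view = pycassa.ColumnFamily(pool, 'UserView')
    row = user_view.get('user:12345')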

Re: Cassandra performance

2010-09-14 Thread Chen Xinli
2010/9/15 Kamil Gorlo > Hey, > > we are considering using Cassandra for quite large project and because > of that I made some tests with Cassandra. I was testing performance > and stability mainly. > > My main tool was stress.py for benchmarks (or equivalent written in > C++ to deal with python2.