Apache Cassandra performance tuning - call for contribution

2022-02-09 Thread Daniel Seybold
Dear Apache Cassandra community, we plan to run a large-scale performance study for Apache Cassandra and MongoDB where the focus is not to compare both systems directly but to answer the question: how much performance can you get out of each DBMS with an optimal configuration compared to the vani

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Anurag Khandelwal
Hi Jack, > So, your 1GB input size means roughly 716 thousand rows of data and 128GB > means roughly 92 million rows, correct? Yes, that's correct. > Are your gets and searches returning single rows, or a significant number of > rows? Like I mentioned in my first email, get always returns a s

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Jack Krupansky
Thanks for that clarification. So, your 1GB input size means roughly 716 thousand rows of data and 128GB means roughly 92 million rows, correct? FWIW, a best practice recommendation is that you avoid using secondary indexes in favor of using "query tables" - store the same data in multiple tables

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Anurag Khandelwal
To clarify: Input size is the size of the dataset as a CSV file, before loading it into Cassandra; for each input size, the number of columns is fixed but the number of rows is different. By 1.5KB record, I meant that each row, when represented as a CSV entry, occupies 1500 bytes. I've used the
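The row counts quoted in this thread (roughly 716 thousand rows for 1GB, roughly 92 million for 128GB) follow from the 1500-byte record size. A minimal sketch of that arithmetic, assuming the input sizes are binary gigabytes and every CSV row is exactly 1500 bytes (both are assumptions; the thread does not spell them out), with `rows_for_input` as a hypothetical helper name:

```python
RECORD_BYTES = 1500  # per-row CSV size stated in the thread
GIB = 1024 ** 3      # assumption: "GB" in the thread means binary gigabytes


def rows_for_input(size_gib, record_bytes=RECORD_BYTES):
    """Return how many whole rows fit in an input of size_gib gigabytes."""
    return (size_gib * GIB) // record_bytes


if __name__ == "__main__":
    for size in (1, 128):
        print(size, "GB ->", rows_for_input(size), "rows")
```

With decimal gigabytes instead, the counts would come out about 7% lower, so the binary reading is the one that matches the figures in the thread.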

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Jack Krupansky
What exactly is "input size" here (1GB to 128GB)? I mean, the test spec "The dataset used comprises of ~1.5KB records... there are 105 attributes in each record." Does each test run have exactly the same number of rows and columns and you're just making each column bigger, or what? Cassandra does

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Jonathan Haddad
I think you actually get a really useful metric by benchmarking 1 machine. You understand your cluster's theoretical maximum performance, which would be Nodes * number of queries. Yes, adding in replication and CL is important, but 1 machine lets you isolate certain performance metrics. On Thu, J
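The "Nodes * number of queries" estimate above can be written down as a tiny calculation. This is only a sketch of the idealized upper bound: the function name and the example numbers are hypothetical, and dividing by the replication factor for writes is a rough assumption of mine, not something stated in the thread. As the replies point out, real clusters with RF>1 and CL>1 will land below this bound.

```python
def cluster_upper_bound(nodes, single_node_qps, replication_factor=1):
    """Idealized cluster throughput extrapolated from a one-node benchmark.

    Assumes perfect linear scaling and no coordination overhead. For writes,
    each request is applied on `replication_factor` replicas, so the bound
    is divided by that factor (a rough model, not a measured result).
    """
    return nodes * single_node_qps / replication_factor


if __name__ == "__main__":
    # Hypothetical numbers purely for illustration.
    print(cluster_upper_bound(nodes=6, single_node_qps=10_000))
    print(cluster_upper_bound(nodes=6, single_node_qps=10_000, replication_factor=3))
```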

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Robert Wille
I disagree. I think that you can extrapolate very little information about RF>1 and CL>1 by benchmarking with RF=1 and CL=1. On Jan 13, 2016, at 8:41 PM, Anurag Khandelwal <anur...@berkeley.edu> wrote: Hi John, Thanks for responding! The aim of this benchmark was not to benchmark Cassa

Re: Cassandra Performance on a Single Machine

2016-01-13 Thread Anurag Khandelwal
Hi John, Thanks for responding! The aim of this benchmark was not to benchmark Cassandra as an end-to-end distributed system, but to understand a break down of the performance. For instance, if we understand the performance characteristics that we can expect from a single machine cassandra ins

Re: Cassandra Performance on a Single Machine

2016-01-06 Thread John Schulz
Anurag, Unless you are planning on continuing to use only one machine with RF=1, benchmarking a single system using RF=Consistency=1 is mostly a waste of time. If you are going to use RF=1 and a single host then why use Cassandra at all? Plain old relational dbs should do the job just fine. Cassan

Cassandra Performance on a Single Machine

2016-01-05 Thread Anurag Khandelwal
Hi, I’ve been benchmarking Cassandra to get an idea of how the performance scales with more data on a single machine. I just wanted to get some feedback on whether these are the numbers I should expect. The benchmarks are quite simple: I measure the latency and throughput for two kinds of queries: 1.

Re: Denormalization leads to terrible, rather than better, Cassandra performance -- I am really puzzled

2015-05-04 Thread dlu66061

Re: Denormalization leads to terrible, rather than better, Cassandra performance -- I am really puzzled

2015-05-04 Thread Steve Robenalt
lot more experienced than I am with Cassandra performance and may have additional advice. There are also quite a few good papers and videos on planet cassandra and the youtube channel regarding performance, storage, data models and the interactions between them. Hope that helps, Steve On Sun, Ma

Re: Denormalization leads to terrible, rather than better, Cassandra performance -- I am really puzzled

2015-05-03 Thread Erick Ramirez
n. > >- Why are there so many exceptions in the de-normalized case? I would >think Cassandra should be able to handle simultaneous accesses to the same >data. Why are there NO exceptions for the normalized case? I meant that the >environments for the two cases are b

Denormalization leads to terrible, rather than better, Cassandra performance -- I am really puzzled

2015-04-28 Thread dlu66061
e with Java Driver? Or did I do something wrong? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Denormalization-leads-to-terrible-rather-than-better-Cassandra-performance-I-am-really-puzzled-tp7600561.html Sent from the cassandra-u...@incubator.a

Suggestion for improving cassandra performance

2014-08-17 Thread Subodh Nijsure
I have a table with 128244 entries in it. I am running one Cassandra node on an AWS EC2 x.large instance. Cassandra is the only daemon running on this machine. It's SSD storage. It's taking the Cassandra Python driver running on another machine 7 seconds to retrieve that data. This is a pretty small table wit

Re: Embedded Cassandra Performance

2014-04-16 Thread Sávio Teles
Thanks Chris! 2014-04-16 12:53 GMT-03:00 Chris Lohfink : > There will be a small performance improvement from not having the > app->cluster latency. If ran on same system (which I wouldn’t recommend) > the latency would be pretty irrelevant anyway unless you are fighting for > sub millisecond l

Re: Embedded Cassandra Performance

2014-04-16 Thread Chris Lohfink
There will be a small performance improvement from not having the app->cluster latency. If ran on same system (which I wouldn’t recommend) the latency would be pretty irrelevant anyway unless you are fighting for sub millisecond latency (in which case get off the JVM). You would be able to acc

Re: Embedded Cassandra Performance

2014-04-16 Thread Sávio Teles
Is it advisable to run the embedded Cassandra in production? 2014-04-16 12:08 GMT-03:00 Sávio Teles : > I'm running a cluster with Cassandra and my app embedded. > > Regarding performance, it is better to run embedded Cassandra? > > What are the implications of running an embedded Cassandra ? >

Embedded Cassandra Performance

2014-04-16 Thread Sávio Teles
I'm running a cluster with Cassandra and my app embedded. Regarding performance, is it better to run embedded Cassandra? What are the implications of running an embedded Cassandra? Thanks.

RE: Cassandra Performance Testing

2014-01-24 Thread Devin Pinkston
John, Yep that makes perfect sense. Thank you for your time I appreciate it! From: John Anderstedt [mailto:john.anderst...@svenskaspel.se] Sent: Friday, January 24, 2014 9:08 AM To: user@cassandra.apache.org Subject: Re: Cassandra Performance Testing It sounds to me that the limitation in this

Re: Cassandra Performance Testing

2014-01-24 Thread John Anderstedt
It sounds to me that the limitation in this setup is the disks. If it's in a mirror, the cost for writes is double. If you have the flat file and the db on the same disk there will be a lot of io wait. There is also a question of disk space and fragmentation, if the flat file occupies 1.2TB o

Cassandra Performance Testing

2014-01-24 Thread Devin Pinkston
Hello, I am using a single node Cassandra setup with version 2.0.4 to do some simple performance testing. I generated a 1.2TB flat file from DBGEN (TPC-H), and I am loading that into Cassandra. I used the "COPY FROM" method from the CQLSH. My question/problem, the import has been running for

Re: various Cassandra performance problems when CQL3 is really used

2014-01-16 Thread Aaron Morton
> I don't know. How do I find out? The only mention about query plan in > Cassandra I found is your article on your site, from 2011 and considering > version 0.8. See the help for TRACE in cqlsh My general approach is to solve problems with the read path by making changes to the write path. So

Re: various Cassandra performance problems when CQL3 is really used

2014-01-15 Thread Ondřej Černoš
Hi, by the way, some of the issues are summarised here: https://issues.apache.org/jira/browse/CASSANDRA-6586 and here: https://issues.apache.org/jira/browse/CASSANDRA-6587. regards, ondrej cernos On Tue, Jan 14, 2014 at 9:48 PM, Ondřej Černoš wrote: > Hi, > > thanks for the answer and sorry

Re: various Cassandra performance problems when CQL3 is really used

2014-01-14 Thread Ondřej Černoš
Hi, thanks for the answer and sorry for the delay. Let me answer inline. On Wed, Dec 18, 2013 at 4:53 AM, Aaron Morton wrote: > > * select id from table where token(id) > token(some_value) and > secondary_index = other_val limit 2 allow filtering; > > > > Filtering absolutely kills the performa

Re: cassandra performance problems

2013-12-19 Thread Alexander Shutyaev
Thanks all for your responses. We've downgraded from 2.0.3 to 2.0.0 and everything became normal. 2013/12/8 Nate McCall > If you are really set on using Cassandra as a cache, I would recommend > disabling durable writes for the keyspace(s)[0]. This will bypass the > commitlog (the flushing/rota

Re: various Cassandra performance problems when CQL3 is really used

2013-12-17 Thread Aaron Morton
> * select id from table where token(id) > token(some_value) and > secondary_index = other_val limit 2 allow filtering; > > Filtering absolutely kills the performance. On a table populated with 130.000 > records, single node Cassandra server (on my i7 notebook, 2GB of JVM heap) > and secondary

various Cassandra performance problems when CQL3 is really used

2013-12-17 Thread Ondřej Černoš
Hi all, we are reimplementing a legacy interface of an inventory-like service (currently built on top of mysql) on Cassandra and I thought I would share some findings with the list. The interface semantics is given and cannot be changed. We chose Cassandra due to its multiple datacenter capabiliti

Re: cassandra performance problems

2013-12-07 Thread Nate McCall
If you are really set on using Cassandra as a cache, I would recommend disabling durable writes for the keyspace(s)[0]. This will bypass the commitlog (the flushing/rotation of which may be a good-sized portion of your performance problems given the number of tables). [0] http://www.datastax.com/do

Re: cassandra performance problems

2013-12-06 Thread J. Ryan Earl
On Thu, Dec 5, 2013 at 6:33 AM, Alexander Shutyaev wrote: > We've plugged it into our production environment as a cache in front of > postgres. Everything worked fine, we even stressed it by explicitly > propagating about 30G (10G/node) data from postgres to cassandra. > If you just want a cachin

Re: cassandra performance problems

2013-12-05 Thread Alexander Shutyaev
Thanks for your answers, Jonathan, yes it was load avg and iowait was lower than 2% all that time - the only load was the user one. Robert, we had -Xmx4012m which was automatically calculated by the default cassandra-env.sh (1/4 of total memory - 16G) - we didn't change that. 2013/12/5 Robert C
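The -Xmx4012m figure is consistent with the automatic heap sizing described in the comments of cassandra-env.sh for that era, which, as far as I recall, picks max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8GB)). A minimal sketch of that rule follows; the formula is reproduced from memory and the 16048 MB input is inferred from 4 * 4012, so treat both as assumptions rather than as the script's exact behavior. `default_max_heap_mb` is a hypothetical name.

```python
def default_max_heap_mb(system_memory_mb):
    """Approximate the automatic max heap size, in MB.

    Assumed rule: max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8192MB)).
    """
    half = system_memory_mb // 2
    quarter = system_memory_mb // 4
    return max(min(half, 1024), min(quarter, 8192))


if __name__ == "__main__":
    # 16048 MB is an assumed value for what a "16G" host reports.
    print(default_max_heap_mb(16048))
```

Under this rule a quarter of RAM wins for any host between 4GB and 32GB, which matches the "1/4 of total memory" description in the message.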

Re: cassandra performance problems

2013-12-05 Thread Robert Coli
On Thu, Dec 5, 2013 at 4:33 AM, Alexander Shutyaev wrote: > Cassandra version is 2.0.3. ... We've plugged it into our production > environment as a cache in front of postgres. > https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/ > What can be the reason? Can it be high n

Re: cassandra performance problems

2013-12-05 Thread Jonathan Haddad
Do you mean high CPU usage or high load avg? (20 indicates load avg to me). High load avg means the CPU is waiting on something. Check "iostat -dmx 1 100" to check your disk stats, you'll see the columns that indicate mb/s read & write as well as % utilization. Once you understand the bottlenec

cassandra performance problems

2013-12-05 Thread Alexander Shutyaev
Hi all, We have a 3 node cluster setup, single keyspace, about 500 tables. The hardware is 2 cores + 16 GB RAM (Cassandra chose to have 4GB). Cassandra version is 2.0.3. Our replication factor is 3, read/write consistency is QUORUM. We've plugged it into our production environment as a cache in fr

Re: Cassandra performance tuning...

2013-07-11 Thread Eric Stevens
You should be able to set the key_validation_class on the column family to use a different data type for the row keys. You may not be able to change this for a CF with existing data without some troubles due to a mismatch of data types; if that's a concern you'll have to create a separate CF and m

Cassandra performance tuning...

2013-07-10 Thread Tony Anecito
Hi All, I am trying to compare Cassandra to another relational database. I am getting around 2-3msec response time using Datastax driver, Java 1.7.0_05 64-bit jre and the other database is under 500 microseconds for the jdbc SQL preparedStatement execute.. One of the major differences is Cassan

Re: Cassandra performance decreases drastically with increase in data size.

2013-06-03 Thread srmore
Thanks all for the help. I ran the traffic over the weekend. Surprisingly, my heap was doing OK (around 5.7G of 8G) but GC activity went nuts and dropped the throughput. I will probably increase the number of nodes. The other interesting thing I noticed was that there were some objects with finaliz

Re: Cassandra performance decreases drastically with increase in data size.

2013-05-30 Thread Aiman Parvaiz
I believe you should roll out more nodes as a temporary fix to your problem. 400GB on all nodes means (as correctly mentioned in other mails of this thread) you are spending more time on GC. Check out the second comment in this link by Aaron Morton; he says that more than 300GB can be problematic

Re: Cassandra performance decreases drastically with increase in data size.

2013-05-30 Thread Bryan Talbot
One or more of these might be effective depending on your particular usage - remove data (rows especially) - add nodes - add ram (has limitations) - reduce bloom filter space used by increasing fp chance - reduce row and key cache sizes - increase index sample ratio - reduce compaction concurrency

Re: Cassandra performance decreases drastically with increase in data size.

2013-05-30 Thread srmore
You are right, it looks like I am doing a lot of GC. Is there any short-term solution for this other than bumping up the heap? Even if I increase the heap I will run into the same issue. Only the time before I hit OOM will be lengthened. It will be a while before we go to latest and greate

Re: Cassandra performance decreases drastically with increase in data size.

2013-05-29 Thread Jonathan Ellis
Sounds like you're spending all your time in GC, which you can verify by checking what GCInspector and StatusLogger say in the log. Fix is increase your heap size or upgrade to 1.2: http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2 On Wed, May 29, 2013 at 11:32 PM, srmore

Cassandra performance decreases drastically with increase in data size.

2013-05-29 Thread srmore
Hello, I am observing that my performance is drastically decreasing when my data size grows. I have a 3 node cluster with 64 GB of ram and my data size is around 400GB on all the nodes. I also see that when I re-start Cassandra the performance goes back to normal and then again starts decreasing af

Re: cassandra performance

2013-03-24 Thread aaron morton
> "select CPUTime,User,site from CF(or tablename) where user=xxx and > Jobtype=xxx" Even though Cassandra has tables and looks like an RDBMS, it's not. Queries with multiple secondary index clauses will not perform as well as those with none. There is plenty of documentation here http://www.da

Re: cassandra performance

2013-03-24 Thread Derek Williams
The biggest advantage of Cassandra is its ability to scale linearly as more nodes are added and its ability to handle node failures. Also to get the maximum performance from Cassandra you need to be making multiple requests in parallel. On Sun, Mar 24, 2013 at 3:15 AM, 张刚 wrote: > Hello, > I am
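The advice about making multiple requests in parallel can be sketched with the Python standard library alone. In this illustration `fetch_row` is a hypothetical stand-in for a single Cassandra read (no real driver call is made), and `fetch_many` and `max_workers=8` are likewise illustrative choices, not recommendations from the thread:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_row(key):
    """Hypothetical stand-in for one read; a real client call would go here."""
    return key, f"value-for-{key}"


def fetch_many(keys, max_workers=8):
    """Issue the reads concurrently instead of one after another."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order while the calls overlap in time.
        return dict(pool.map(fetch_row, keys))


if __name__ == "__main__":
    print(fetch_many(["job-1", "job-2", "job-3"]))
```

With a real client, the per-request network latency overlaps across threads, which is where the throughput gain comes from.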

Re: cassandra performance

2013-03-24 Thread 张刚
For example, each row represents a job record; it has fields like "user","site","CPUTime","datasize","JobType"... The fields in the CF are fixed, just like a table. The query is like this: "select CPUTime,User,site from CF(or tablename) where user=xxx and Jobtype=xxx" Best regards 2013/3/24 cem > Hi, > > Co

Re: cassandra performance

2013-03-24 Thread cem
Hi, Could you provide some other details about your schema design and queries? It is very hard to tell anything. Regards, Cem On Sun, Mar 24, 2013 at 12:40 PM, dong.yajun wrote: > Hello, > > I'd suggest you to take look at the difference between Nosql and RDMS. > > Best, > > On Sun, Mar 24, 2

Re: cassandra performance

2013-03-24 Thread dong.yajun
Hello, I'd suggest you take a look at the difference between NoSQL and RDBMS. Best, On Sun, Mar 24, 2013 at 5:15 PM, 张刚 wrote: > Hello, > I am new to Cassandra.I do some test on a single machine. I install > Cassandra with a binary tarball distribution. > I create a CF to store the data that

cassandra performance

2013-03-24 Thread 张刚
Hello, I am new to Cassandra. I ran some tests on a single machine. I installed Cassandra from a binary tarball distribution. I created a CF to store the data fetched from MySQL. The CF has the same fields as the table in MySQL, so it looks like a table. I ran the same select on the CF in Cassandra and

Re: Cassandra Performance Benchmarking.

2013-01-21 Thread Pradeep Kumar Mantha
Hi, Thanks for the information.. I upgraded my cassandra version to 1.2.0 and tried running the experiment again to find the statistics. My application took nearly 529 seconds for querying 76896 keys. Please find the statistic information below for 32 threads ( where each thread query 76896 key

Re: Cassandra Performance Benchmarking.

2013-01-21 Thread aaron morton
You can also see what it looks like from the server side. nodetool proxyhistograms will show you full request latency recorded by the coordinator. nodetool cfhistograms will show you the local read latency, this is just the time it takes to read data on a replica and does not include network o

Re: Cassandra Performance Benchmarking.

2013-01-18 Thread Tyler Hobbs
The fact that it's still exactly 521 seconds is very suspicious. I can't debug your script over the mailing list, but do some sanity checks to make sure there's not a bottleneck somewhere you don't expect. On Fri, Jan 18, 2013 at 12:44 PM, Pradeep Kumar Mantha wrote: > Hi, > > Thanks Tyler. >

Re: Cassandra Performance Benchmarking.

2013-01-18 Thread Pradeep Kumar Mantha
Hi, Thanks Tyler. Below is the *global* connection pool I am trying to use, where the server_list contains all the IPs of the 12 DataNodes I am using and pool_size is the number of threads. I set the timeout to 60 to avoid connection retry errors. pool = pycassa.ConnectionPool('Blast', serve

Re: Cassandra Performance Benchmarking.

2013-01-18 Thread Tyler Hobbs
You just need to increase the ConnectionPool size to handle the number of threads you have using it concurrently. Set the pool_size kwarg to at least the number of threads you're using. On Thu, Jan 17, 2013 at 6:46 PM, Pradeep Kumar Mantha wrote: > Thanks Tyler. > > I just moved the pool and cf

Re: Cassandra Performance Benchmarking.

2013-01-17 Thread Pradeep Kumar Mantha
Thanks Tyler. I just moved the pool and cf which store the connection pool and CF information to have global scope. Increased the server_list values from 1 to 4. ( i think i can increase them max to 12 since I have 12 data nodes ) when I created 8 threads using python threading package , I see

Re: Cassandra Performance Benchmarking.

2013-01-17 Thread Tyler Hobbs
ConnectionPools and ColumnFamilies are thread-safe in pycassa, and it's best to share them across multiple threads. Of course, when you do that, make sure to make the ConnectionPool large enough to support all of the threads making queries concurrently. I'm also not sure if you're just omitting t

Re: Cassandra Performance Benchmarking.

2013-01-17 Thread Pradeep Kumar Mantha
Hi, Thanks. I would like to benchmark cassandra with our application so that we understand the details of how the actual benchmarking is done. Not sure, how easy it would be to integrate YCSB with our application. So, i am trying different client interfaces to cassandra. I found for 12 Data Nod

Re: Cassandra Performance Benchmarking.

2013-01-17 Thread Edward Capriolo
Wow you managed to do a load test through the cassandra-cli. There should be a merit badge for that. You should use the built in stress tool or YCSB. The CLI has to do much more string conversion than a normal client would and it is not built for performance. You will definitely get better number

Cassandra Performance Benchmarking.

2013-01-17 Thread Pradeep Kumar Mantha
Hi, I am trying to maximize execution of the number of read queries/second. Here is my cluster configuration. Replication - Default 12 Data Nodes. 16 Client Nodes - used for querying. Each client node executes 32 threads - each thread executes 76896 read queries using cassandra-cli tool.

Re: cassandra performance looking great...

2012-09-07 Thread Hiller, Dean
Now that would be cool. Right now though, too many other features need to be added; a GUI on top of the ad-hoc query tool is the next top priority so one can do any S-SQL statement and ad-hoc query the heck out of a noSQL store. We may even be able to optimize our queries to be even faster

Re: cassandra performance looking great...

2012-09-07 Thread Edward Capriolo
Try to get Cassandra running the TPH-C benchmarks and beat oracle :) On Fri, Sep 7, 2012 at 10:01 AM, Hiller, Dean wrote: > So we wrote 1,000,000 rows into cassandra and ran a simple S-SQL(Scalable > SQL) query of > > > PARTITIONS n(:partition) SELECT n FROM TABLE as n WHERE n.numShares >= :low

cassandra performance looking great...

2012-09-07 Thread Hiller, Dean
So we wrote 1,000,000 rows into cassandra and ran a simple S-SQL(Scalable SQL) query of PARTITIONS n(:partition) SELECT n FROM TABLE as n WHERE n.numShares >= :low and n.pricePerShare >= :price It ran in 60ms So basically playOrm is going to support millions of rows per partition. This is g

Re: What are the basic steps to improve Cassandra performance

2012-08-14 Thread aaron morton
> optimize the Cassandra for performance in general It's a lot easier to answer specific questions. Cassandra is fast, and there are ways to make it faster in specific use cases. > improve the performance for "select * from X" type of queries Ah. Are you specifying a row key or are you trying to g

What are the basic steps to improve Cassandra performance

2012-08-13 Thread A Geek
hi all, I'm a bit new to Cassandra and was wondering what are the basic steps that we must follow to optimize the Cassandra for performance in general and how to improve the performance for "select * from X" type of queries. Any help would be much appreciated. Note that, we have huge data sitti

Re: Cassandra performance question

2012-01-24 Thread Jonathan Ellis
No argument there. Thanks for explaining what you were doing to encrypt client traffic! On Mon, Jan 23, 2012 at 10:11 PM, Chris Marino wrote: > Hi Jonathan, yes, when I say 'node encryption' I mean inter-Cassandra node > encryption. When I say 'client encryption' I mean encrypted traffic from th

Re: Cassandra performance question

2012-01-23 Thread Chris Marino
Hi Jonathan, yes, when I say 'node encryption' I mean inter-Cassandra node encryption. When I say 'client encryption' I mean encrypted traffic from the Cassandra nodes to the clients. For these benchmarks we used the stress test client load generator. We ran test with no encryption, then with 'nod

Re: Cassandra performance question

2012-01-23 Thread Jonathan Ellis
Can you elaborate on what exactly you were testing on the Cassandra side? It sounds like what this post refers to as "node" encryption corresponds to enabling "internode_encryption: all", but I couldn't guess what your client encryption is since Cassandra doesn't support that out of the box yet

Re: Cassandra performance question

2011-12-31 Thread Dom Wong
sweet, that's pretty awesome :) On Fri, Dec 30, 2011 at 8:08 PM, Jeremy Hanna wrote: > This might be helpful: > http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html > > On Dec 30, 2011, at 1:59 PM, Dom Wong wrote: > > > Hi, could anyone tell me whether this is possible w

Re: Cassandra performance question

2011-12-30 Thread Chris Marino
We did some benchmarking as well. http://blog.vcider.com/2011/09/virtual-networks-can-run-cassandra-up-to-60-faster/ Although we were primarily interested in the networking issues CM On Fri, Dec 30, 2011 at 12:08 PM, Jeremy Hanna wrote: > This might be helpful: > http://techblog.netflix.c

Re: Cassandra performance question

2011-12-30 Thread Jeremy Hanna
This might be helpful: http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html On Dec 30, 2011, at 1:59 PM, Dom Wong wrote: > Hi, could anyone tell me whether this is possible with Cassandra using an > appropriately sized EC2 cluster. > > 100,000 clients writing 50k each

Cassandra performance question

2011-12-30 Thread Dom Wong
Hi, could anyone tell me whether this is possible with Cassandra using an appropriately sized EC2 cluster. 100,000 clients writing 50k each to their own specific row at 5 second intervals?
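The question above reduces to a sustained write rate that can be estimated up front. A back-of-the-envelope sketch, assuming "50k" means 50 KiB per write (the message does not say whether it is bytes or kilobytes), with `write_load` as a hypothetical helper:

```python
def write_load(clients, payload_bytes, interval_s):
    """Return (writes per second, payload bytes per second) for periodic writers."""
    writes_per_s = clients / interval_s
    bytes_per_s = writes_per_s * payload_bytes
    return writes_per_s, bytes_per_s


if __name__ == "__main__":
    # Assumption: "50k" is read as 50 KiB per write.
    writes, volume = write_load(clients=100_000, payload_bytes=50 * 1024, interval_s=5)
    print(f"{writes:.0f} writes/s, {volume / 1024 ** 2:.1f} MiB/s before replication")
```

Under that reading the cluster must absorb about 20,000 writes per second, and the payload volume is multiplied again by the replication factor on the server side.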

Cassandra performance benchmark on a virtual network....

2011-10-14 Thread Chris Marino
Thanks CM -- Forwarded message -- From: Chris Marino Date: Mon, Sep 12, 2011 at 4:23 PM Subject: Cassandra performance on a virtual network To: user@cassandra.apache.org Hello everyone, I wanted to tell you about some performance benchmarking we have done with Cassandra running i

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Mohit Anchlia
On Mon, Oct 3, 2011 at 1:19 PM, Ramesh Natarajan wrote: > Thanks for the pointers.  I checked the system and the iostat showed that we > are saturating the disk to 100%. The disk is SCSI device exposed by ESXi and > it is running on a dedicated lun as RAID10 (4 600GB 15k drives) connected to > ESX

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Chris Goffinet
Yes look at cassandra.yaml there is a section about throttling compaction. You still *want* multi-threaded compaction. Throttling will occur across all threads. The reason being is that you don't want to get stuck compacting bigger files, while the smaller ones build up waiting for bigger compactio

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Ramesh Natarajan
Thanks for the pointers. I checked the system and the iostat showed that we are saturating the disk to 100%. The disk is SCSI device exposed by ESXi and it is running on a dedicated lun as RAID10 (4 600GB 15k drives) connected to ESX host via iSCSI. When I run compactionstats I see we are compact

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Chris Goffinet
Most likely what could be happening is you are running single threaded compaction. Look at the cassandra.yaml of how to enable multi-threaded compaction. As more data comes into the system, bigger files get created during compaction. You could be in a situation where you might be compacting at a hi

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Mohit Anchlia
In order to understand what's going on you might want to first just do write test, look at the results and then do just the read tests and then do both read / write tests. Since you mentioned high update/deletes I should also ask your CL for writes/reads? with high updates/delete + high CL I think

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Ramesh Natarajan
I will start another test run to collect these stats. Our test model is in the neighborhood of 4500 inserts, 8000 updates&deletes and 1500 reads every second across 6 servers. Can you elaborate more on reducing the heap space? Do you think it is a problem with 17G RSS? thanks Ramesh On Mon, Oc

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Mohit Anchlia
I am wondering if you are seeing issues because of more frequent compactions kicking in. Is this primarily write ops or reads too? During the period of test gather data like: 1. cfstats 2. tpstats 3. compactionstats 4. netstats 5. iostat You have RSS memory close to 17gb. Maybe someone can give f

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Yang
maybe try row cache ? have you enabled the mlock ? (need jna.jar , and set ulimit -l ) using iostat -x would also give you more clues as to disk performance On Mon, Oct 3, 2011 at 10:12 AM, Ramesh Natarajan wrote: > I am running a cassandra cluster of  6 nodes running RHEL6 virtualized by > ESX

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Ramesh Natarajan
We have 5 CF. Attached is the output from the describe command. We don't have row cache enabled. Thanks Ramesh Keyspace: MSA: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:3] Column Families: ColumnFamily: admin

Re: cassandra performance degrades after 12 hours

2011-10-03 Thread Mohit Anchlia
On Mon, Oct 3, 2011 at 10:12 AM, Ramesh Natarajan wrote: > I am running a cassandra cluster of  6 nodes running RHEL6 virtualized by > ESXi 5.0.  Each VM is configured with 20GB of ram and 12 cores. Our test > setup performs about 3000  inserts per second.  The cassandra data partition > is on a X

cassandra performance degrades after 12 hours

2011-10-03 Thread Ramesh Natarajan
I am running a cassandra cluster of 6 nodes running RHEL6 virtualized by ESXi 5.0. Each VM is configured with 20GB of ram and 12 cores. Our test setup performs about 3000 inserts per second. The cassandra data partition is on a XFS filesystem mounted with options (noatime,nodiratime,nobarrier,l

Cassandra performance on a virtual network....

2011-09-12 Thread Chris Marino
impact, relative to native interfaces. The summary results for running a 4 node cluster are:
Cassandra Performance on vCider Virtual Network, Replication Factor 1
Column size (bytes):   32      64      128     192     256
v. Unencrypted:       -8.2%    0.8%   -2.3%   -2.3%   -6.7%
v. Encrypted

Re: choose which Hector's serializer for Cassandra performance?

2011-04-14 Thread aaron morton
1. Order for RP is, well, random'ish for some large value of ish see http://wiki.apache.org/cassandra/FAQ#range_rp 2. Not really. Aaron On 14 Apr 2011, at 14:45, 박용욱 wrote: > Thanks very much Aaron! > > 4. Not sure how Hector handles it, try > https://github.com/zznate/cassandra-tutorial or

Re: choose which Hector's serializer for Cassandra performance?

2011-04-13 Thread 박용욱
Thanks very much Aaron! 4. Not sure how Hector handles it, try https://github.com/zznate/cassandra-tutorial or https://github.com/zznate/hector-examples Let me ask question 4 again. :) 1. RandomPartitioner and OrderPreservingPartitioner response same r

Re: choose which Hector's serializer for Cassandra performance?

2011-04-13 Thread Aaron Morton
1. Yes 2. Client 3. NA 4. Not sure how Hector handles it, try https://github.com/zznate/cassandra-tutorial or https://github.com/zznate/hector-examples Aaron On 14/04/2011, at 1:28 AM, 박용욱 wrote: > I have questions below. > > 1. In cassandra server, row key, column name and column value are

choose which Hector's serializer for Cassandra performance?

2011-04-13 Thread 박용욱
I have questions below. 1. In cassandra server, row key, column name and column value are saved in byte[], aren't they? 2. If I call hector's mutator.addInsertion(rowKey, cfName, HColumn), does the transformation from String/Integer to byte[] occur at client? or server? 3. If 2 occurs at server,

Re: Cassandra performance

2010-09-20 Thread Edward Capriolo
On Sat, Sep 18, 2010 at 9:26 AM, Peter Schuller wrote: >>  - performance (it should be not as much less than shard of MySQL and >> scale linearly, we want to have not more that 10K inserts per second >> of writes, and probably not more than 1K/s reads which will be mostly >> random) >>  - ability

Re: Cassandra performance

2010-09-18 Thread Peter Schuller
>  - performance (it should be not as much less than shard of MySQL and > scale linearly, we want to have not more that 10K inserts per second > of writes, and probably not more than 1K/s reads which will be mostly > random) >  - ability to store big amounts of data (now it looks that we will > hav

Re: Cassandra performance

2010-09-18 Thread Kamil Gorlo
Hi, first of all I am not Cassandra hater :) I do not expect miracles also :) I'm searching if there is any scalable solution which could have be used instead of sharding solution over MySQL or Tokyo Tyrant. Our system now runs OK on single Tokyo Tyrant DB but we expect a lot of traffic increase i

Re: Cassandra performance

2010-09-18 Thread Peter Schuller
> Disabling row cache in this case makes sense, but disabling key cache > is probably hurting your performance quite a bit.  If you wrote 20GB > of data per node, with narrow rows as you describe, and had default > memtable settings, you now have a huge number of sstables on disk. > You did not ind

Re: Cassandra performance

2010-09-17 Thread Benjamin Black
It appears you are doing several things that assure terrible performance, so I am not surprised you are getting it. On Tue, Sep 14, 2010 at 3:40 PM, Kamil Gorlo wrote: > My main tool was stress.py for benchmarks (or equivalent written in > C++ to deal with python2.5 lack of multiprocessing). I wi

Re: Cassandra performance

2010-09-17 Thread Peter Schuller
> durable and rich data model. It will not provide your high performance, > especially reading performance is poor. Note that for several realistic work-loads, the above claim is most definitely wrong. For example, for large databases with a mix of insertions/deletions (so that the MySQL case doe

Re: Cassandra performance

2010-09-17 Thread Jeremy Hanna
http://www.quora.com/Is-Cassandra-to-blame-for-Digg-v4s-technical-failures On Sep 17, 2010, at 4:35 PM, Zhong Li wrote: > This is my personal experiences. MySQL is faster than Cassandra on most > normal use cases. > > You should understand why you choose Cassandra instead of MySQL. If one >

Re: Cassandra performance

2010-09-17 Thread Zhong Li
This is my personal experiences. MySQL is faster than Cassandra on most normal use cases. You should understand why you choose Cassandra instead of MySQL. If one central MySQL can handle your workload, MySQL is better than Cassandra. BUT if you are overload one MySQL and want multiple boxes

Re: Cassandra performance

2010-09-15 Thread Wayne
If MySQL is faster then use it. I struggled to do side by side comparisons with Mysql for months until finally realizing they are too different to do side by side comparisons. Mysql is always faster out of the gate when you come at the problem thinking in terms of relational databases. Add in repli

Re: Cassandra performance

2010-09-15 Thread Peter Schuller
> But to be honest I'm pretty disappointed that Cassandra doesn't really > scale linearly (or "semi-linearly" :)) when adding new machines. I It really should scale linearly for this workload unless I have missed something important (in which case I hope someone will chime in). But note that you a

Re: Cassandra performance

2010-09-14 Thread Oleg Anastasyev
Kamil Gorlo writes: > > So I've got more reads from single MySQL with 400GB of data than from > 8 machines storing about 266GB. This doesn't look good. What am I > doing wrong? :) The worst case for cassandra is random reads. You should ask yourself a question, do you really have this

Re: Cassandra performance

2010-09-14 Thread Kamil Gorlo
Hello, On Wed, Sep 15, 2010 at 3:53 AM, Jonathan Ellis wrote: > The key is that while Cassandra may read less rows per second than > MySQL when you are i/o bound (as you are here) because of SSTable > merging (see http://wiki.apache.org/cassandra/MemtableSSTable), you > should be using your Cassa

Re: Cassandra performance

2010-09-14 Thread Kamil Gorlo
Hello, On Wed, Sep 15, 2010 at 3:45 AM, Chen Xinli wrote: [cut] >> > Disable row cache is ok, but key cache should be enabled. It uses little > memory, but reading performance will improve a lot. Hmm, I've tested with key cache enabled (100%) and I am pretty sure that this really doesn't help si
