CPU consumption may be affected by the cassandra-stress tool in the 2nd example as well. Running it on a separate system eliminates it as a possible cause. There is a little extra work, but nothing that I think would be that noticeable. Tracing (which can be enabled with nodetool) or profiling (e.g. with YourKit) can give more insight into the bottleneck. I'd run the test from a separate system first.
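
To make the first point concrete, here is a rough sketch (not output from any Cassandra tool) that takes the per-process %CPU figures from the top output quoted below and works out how much of the java CPU time went to cassandra-stress rather than to Cassandra itself. The run labels and the `stress_share` helper are made up for illustration; only the numbers come from the thread.

```python
# %CPU values copied from the top output quoted in this thread
# (in top, 100% corresponds to one fully used core).
# The dictionary keys and the helper name are illustrative assumptions.
RUNS = {
    "buckets, 900 threads": {"stress": 693.0, "cassandra": 1.3},
    "owner_to_buckets, 4 threads": {"stress": 102.5, "cassandra": 186.7},
    "owner_to_buckets, 271 threads": {"stress": 343.4, "cassandra": 430.0},
}


def stress_share(run):
    """Fraction of the two java processes' CPU used by cassandra-stress."""
    total = run["stress"] + run["cassandra"]
    return run["stress"] / total


if __name__ == "__main__":
    for label, run in RUNS.items():
        share = stress_share(run)
        print(f"{label}: cassandra-stress uses {share:.1%} of the java CPU")
```

With these figures the stress tool accounts for roughly 44% of the java CPU in the 271-thread run and nearly all of it in the 900-thread run, so the load generator is competing with the node under test for the same cores.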
---
Chris Lohfink

On Sep 23, 2014, at 12:48 PM, Leleu Eric <eric.le...@worldline.com> wrote:

> First of all, thanks for your help! :)
>
> Here are some details:
>
>> With RF=N=2 you're essentially testing a single machine locally, which isn't the
>> best indicator long term
> I will test with more nodes (4 with RF = 2), but for now I'm limited to 2
> nodes for non-technical reasons...
>
>> Well, first off you shouldn't run the stress tool on the node you're testing.
>> Give it its own box.
> I performed the test in a new keyspace in order to have a clean dataset.
>
>> the 2nd query since it's returning 10x the data and there will be more to go
>> through within the partition
> I configured cassandra-stress so that each user has only one bucket, so
> the amount of data is the same in both cases ("select * from buckets
> where name = ? and tenantid = ? limit 1" and "select * from owner_to_buckets
> where owner = ? and tenantid = ? limit 10").
> Does Cassandra perform extra reads when the limit is bigger than the available
> data (even if the partition key contains only a single value in the
> clustering column)?
> If the amount of data is the same, how can we explain the difference in CPU
> consumption?
>
> Regards,
> Eric
>
> ________________________________________
> From: Chris Lohfink [clohf...@blackbirdit.com]
> Sent: Tuesday, September 23, 2014 19:23
> To: user@cassandra.apache.org
> Subject: Re: CPU consumption of Cassandra
>
> Well, first off you shouldn't run the stress tool on the node you're testing.
> Give it its own box.
>
> With RF=N=2 you're essentially testing a single machine locally, which isn't the
> best indicator long term (optimizations are available when reading data that's
> local to the node). 80k/sec on a system is pretty good though; you're probably
> seeing slower on the 2nd query since it's returning 10x the data and there
> will be more to go through within the partition. 42k/sec is still acceptable
> imho since these are smaller boxes.
> You are probably seeing high CPU because the system is doing a lot :)
>
> If you want to get more out of these systems you can probably do some tuning;
> enable tracing to see what's actually the bottleneck.
>
> Collections will very likely hurt more than help.
>
> ---
> Chris Lohfink
>
> On Sep 23, 2014, at 9:39 AM, Leleu Eric <eric.le...@worldline.com> wrote:
>
> I tried to run “cassandra-stress” on some of my tables, as proposed by Jake
> Luciani.
>
> For a simple table, this tool is able to perform 80000 read op/s with little
> CPU consumption if I query the table by the PK (name, tenantid).
>
> Ex:
> TABLE:
>
> CREATE TABLE IF NOT EXISTS buckets (tenantid varchar,
>     name varchar,
>     owner varchar,
>     location varchar,
>     description varchar,
>     codeQuota varchar,
>     creationDate timestamp,
>     updateDate timestamp,
>     PRIMARY KEY (name, tenantid));
>
> QUERY: select * from buckets where name = ? and tenantid = ? limit 1;
>
> TOP output for 900 threads on cassandra-stress:
>
> top - 13:17:09 up 173 days, 21:54, 4 users, load average: 11.88, 4.30, 2.76
> Tasks: 272 total, 1 running, 270 sleeping, 0 stopped, 1 zombie
> Cpu(s): 71.4%us, 14.0%sy, 0.0%ni, 13.1%id, 0.0%wa, 0.0%hi, 1.5%si, 0.0%st
> Mem: 98894704k total, 96367436k used, 2527268k free, 15440k buffers
> Swap: 0k total, 0k used, 0k free, 88194556k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
> 25857 root      20   0 29.7g 1.5g  12m S 693.0  1.6  38:45.58 java  <== Cassandra-stress
> 29160 cassandr  20   0 16.3g 4.8g  10m S   1.3  5.0  44:46.89 java  <== Cassandra
>
> Now, if I run another query on a table that provides the list of buckets
> for a given owner, the number of op/s is divided by 2 (42000 op/s) and
> CPU consumption goes up.
>
> Ex:
> TABLE:
>
> CREATE TABLE IF NOT EXISTS owner_to_buckets (tenantid varchar,
>     name varchar,
>     owner varchar,
>     location varchar,
>     description varchar,
>     codeQuota varchar,
>     creationDate timestamp,
>     updateDate timestamp,
>     PRIMARY KEY ((owner, tenantid), name));
>
> QUERY: select * from owner_to_buckets where owner = ? and tenantid = ?
> limit 10;
>
> TOP output for 4 threads on cassandra-stress:
>
> top - 13:49:16 up 173 days, 22:26, 4 users, load average: 1.76, 1.48, 1.17
> Tasks: 273 total, 1 running, 271 sleeping, 0 stopped, 1 zombie
> Cpu(s): 26.3%us, 8.0%sy, 0.0%ni, 64.7%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
> Mem: 98894704k total, 97512156k used, 1382548k free, 14580k buffers
> Swap: 0k total, 0k used, 0k free, 90413772k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
> 29160 cassandr  20   0 13.6g 4.8g  37m S 186.7  5.1  62:26.77 java  <== Cassandra
> 50622 root      20   0 28.8g 469m  12m S 102.5  0.5   0:45.84 java  <== Cassandra-stress
>
> TOP output for 271 threads on cassandra-stress:
>
> top - 13:57:03 up 173 days, 22:34, 4 users, load average: 4.67, 1.76, 1.25
> Tasks: 272 total, 1 running, 270 sleeping, 0 stopped, 1 zombie
> Cpu(s): 81.5%us, 14.0%sy, 0.0%ni, 3.1%id, 0.0%wa, 0.0%hi, 1.3%si, 0.0%st
> Mem: 98894704k total, 94955936k used, 3938768k free, 15892k buffers
> Swap: 0k total, 0k used, 0k free, 85993676k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
> 29160 cassandr  20   0 13.6g 4.8g  38m S 430.0  5.1  82:31.80 java  <== Cassandra
> 50622 root      20   0 29.1g 2.3g  12m S 343.4  2.4  17:51.22 java  <== Cassandra-stress
>
> I have 4 tables with a composite PRIMARY KEY (two of them have 4 entries: 2
> for the partition key, one for the clustering column and one for the sort column).
> Two of these tables are frequently read by partition key because we
> want to list the data of a given user; this should explain my CPU load, according
> to the simple test done with cassandra-stress…
>
> How can I avoid this?
> Collections could be an option, but the number of entries per user is not limited
> and can easily exceed 200. According to the Cassandra documentation,
> collections have a size limited to 64KB, so they are probably not a solution in
> my case. ☹
>
> Regards,
> Eric
>
> From: Chris Lohfink [mailto:clohf...@blackbirdit.com]
> Sent: Monday, September 22, 2014 22:03
> To: user@cassandra.apache.org
> Subject: Re: CPU consumption of Cassandra
>
> It's going to depend a lot on your data model, but 5-6k is on the low end of
> what I would expect. N=RF=2 is not really something I would recommend. That
> said, 93GB is not much data, so the bottleneck may lie more in your data
> model, queries, or client.
>
> What profiler are you using? The CPU on the select/read is marked as
> RUNNABLE, but it's really more of a wait state that may throw some profilers
> off; it may be a red herring.
>
> ---
> Chris Lohfink
>
> On Sep 22, 2014, at 11:39 AM, Leleu Eric <eric.le...@worldline.com> wrote:
>
> Hi,
>
> I’m currently testing Cassandra 2.0.9 (and, since last week, 2.1) under
> some read-heavy load…
>
> I have 2 Cassandra nodes (RF: 2) running under CentOS 6 with 16GB of RAM and
> 8 cores.
> I have around 93GB of data per node (one disk of 300GB with a SAS interface and
> a rotational speed of 10500).
>
> I have 300 active client threads and they query the C* nodes with a
> consistency level set to ONE (I’m using the DataStax CQL driver).
>
> During my tests I saw a lot of CPU consumption (70% user / 6% sys / 4% iowait
> / 20% idle).
> The C* nodes respond to around 5000 op/s (sometimes up to 6000 op/s).
>
> I tried to profile a node and, at first look, 60% of the CPU is spent in
> the “sun.nio.ch” package.
> (SelectorImpl.select or Channel.read)
>
> I know that benchmark results are highly dependent on the dataset and use
> cases, but from my point of view this CPU consumption is normal
> given the load.
> Can someone confirm that point?
> Given my hardware configuration, can I expect to get more than 6000
> read op/s?
>
> Regards,
> Eric
>
> ________________________________
>
> This e-mail and the documents attached are confidential and intended solely
> for the addressee; it may also be privileged. If you receive this e-mail in
> error, please notify the sender immediately and destroy it. As its integrity
> cannot be secured on the Internet, the Worldline liability cannot be
> triggered for the message content. Although the sender endeavours to maintain
> a computer virus-free network, the sender does not warrant that this
> transmission is virus-free and will not be liable for any damages resulting
> from any virus transmitted.