Re: horizontal query scaling issues follow on

2014-07-23 Thread Benedict Elliott Smith
that is spread over multiple partitions, and so extra work needs to be done >>> cross-cluster to service your requests as more nodes are added. >>> >>> I would also consider what effect the file cache may be having on your >>> workload, as it sounds small enough to fit i

Re: horizontal query scaling issues follow on

2014-07-23 Thread Diane Griffith
ou try different >> client levels for the smaller cluster you may see improved performance as >> the data is pulled into file cache across test runs, and then when you >> build your larger cluster this is lost so performance appears to degrade >> (for instance). >> >

Re: horizontal query scaling issues follow on

2014-07-21 Thread Diane Griffith
So I appreciate all the help so far. Upfront, it is possible the schema and data query pattern could be contributing to the problem. The schema was born out of certain design requirements. If it proves to be part of what makes the scalability crumble, then I hope it will help shape the design re

Re: horizontal query scaling issues follow on

2014-07-21 Thread Robert Coli
On Sun, Jul 20, 2014 at 6:12 PM, Diane Griffith wrote: > I am running tests again across different number of client threads and > number of nodes but this time I tweaked some of the timeouts configured for > the nodes in the cluster. I was able to get better performance on the > nodes at 10 clie

Re: horizontal query scaling issues follow on

2014-07-21 Thread Jonathan Lacefield
Hello, Here is the documentation for cfhistograms, which reports its latencies in microseconds. http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsCFhisto.html Your question about setting timeouts is subjective, but you have set your timeout limits to 4 mins, which seems excessive. The

Re: horizontal query scaling issues follow on

2014-07-20 Thread Diane Griffith
I am running tests again across different numbers of client threads and nodes, but this time I tweaked some of the timeouts configured for the nodes in the cluster. I was able to get better performance on the nodes at 10 client threads by upping 4 timeout values in cassandra.yaml to 24

Re: horizontal query scaling issues follow on

2014-07-18 Thread Diane Griffith
PM, Diane Griffith > wrote: > >> The column family schema is: >> >> CREATE TABLE IF NOT EXISTS foo (key text, col_name text, col_value text, >> PRIMARY KEY(key, col_name)) >> >> where the key is a generated uuid and all keys were inserted in random >> o
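For illustration, here is a minimal Python sketch of how the quoted schema groups rows, assuming the usual reading of PRIMARY KEY(key, col_name): key is the partition key and col_name is a clustering column. The function name and the sample rows are hypothetical, and this models only the logical layout, not Cassandra's storage engine.

```python
from collections import defaultdict


def group_into_partitions(rows):
    """Group (key, col_name, col_value) rows the way the schema does logically.

    All rows sharing the same `key` land in one partition; inside a
    partition, rows are ordered by the clustering column `col_name`.
    """
    partitions = defaultdict(dict)
    for key, col_name, col_value in rows:
        partitions[key][col_name] = col_value
    # Sort each partition by clustering column to mimic clustering order.
    return {key: dict(sorted(cols.items())) for key, cols in partitions.items()}


# Hypothetical sample data: two partition keys, three rows in total.
rows = [
    ("uuid-1", "b", "2"),
    ("uuid-1", "a", "1"),
    ("uuid-2", "a", "9"),
]
parts = group_into_partitions(rows)
```

With this model, a query for one key touches exactly one partition, however many col_name entries it holds.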

Re: horizontal query scaling issues follow on

2014-07-18 Thread Tyler Hobbs
On Fri, Jul 18, 2014 at 8:01 AM, Diane Griffith wrote:
>
> Partition Size (bytes)
> 1109 bytes: 1800
>
> Cell Count per Partition
> 8 cells: 1800
>
> meaning I can't glean anything about how it partitioned or if it broke a
> key across partitions from this right? Does it mean for 180

Re: horizontal query scaling issues follow on

2014-07-18 Thread Diane Griffith
stering >>> columns, or does each row have a unique partition key and no clustering >>> columns. >>> >>> -- Jack Krupansky >>> >>> *From:* Diane Griffith >>> *Sent:* Thursday, July 17, 2014 6:21 PM >>> *To:* user >>> *Subjec

Re: horizontal query scaling issues follow on

2014-07-18 Thread Benedict Elliott Smith
your primary key and whether you >> are using a small number of partition keys and a large number of clustering >> columns, or does each row have a unique partition key and no clustering >> columns. >> >> -- Jack Krupansky >> >> *From:* Diane Griffith >

Re: horizontal query scaling issues follow on

2014-07-18 Thread Diane Griffith
g > columns. > > -- Jack Krupansky > > *From:* Diane Griffith > *Sent:* Thursday, July 17, 2014 6:21 PM > *To:* user > *Subject:* Re: horizontal query scaling issues follow on > > So do partitions equate to tokens/vnodes? > > If so we had configured all

Re: horizontal query scaling issues follow on

2014-07-17 Thread Jonathan Haddad
The problem with starting without vnodes is moving to them is a bit hairy. In particular, nodetool shuffle has been reported to take an extremely long time (days, weeks). I would start with vnodes if you have any intent on using them. On Thu, Jul 17, 2014 at 6:03 PM, Robert Coli wrote: > On Thu

Re: horizontal query scaling issues follow on

2014-07-17 Thread Jack Krupansky
whether you are using a small number of partition keys and a large number of clustering columns, or does each row have a unique partition key and no clustering columns. -- Jack Krupansky From: Diane Griffith Sent: Thursday, July 17, 2014 6:21 PM To: user Subject: Re: horizontal query scaling

Re: horizontal query scaling issues follow on

2014-07-17 Thread Robert Coli
On Thu, Jul 17, 2014 at 5:16 PM, Diane Griffith wrote: > I did tests comparing 1, 2, 10, 20, 50, 100 clients spawned all querying. > Performance on 2 nodes starts to degrade from 10 clients on. I saw > similar behavior on 4 nodes but haven't done the official runs on that yet. > > Ok, if you'v

Re: horizontal query scaling issues follow on

2014-07-17 Thread Diane Griffith
So I stripped out the number of clients experiment path information. It is unclear if I can only show horizontal scaling by also spawning many client requests all working at once. So that is why I stripped that information out to distill what our original attempt was at how to show horizontal sca

Re: horizontal query scaling issues follow on

2014-07-17 Thread Robert Coli
On Thu, Jul 17, 2014 at 3:21 PM, Diane Griffith wrote: > So do partitions equate to tokens/vnodes? > A partition is what used to be called a "row". Each individual token in the token ring can contain a partition, which you request using the token as the key. A "token range" is the space betwee
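As a rough illustration of the partition / token / token range relationship described above, here is a toy Python sketch. It assumes the common convention that a node owns the range ending at each of its tokens. The hash function is an MD5-based stand-in, not Cassandra's actual partitioner, and the names toy_token, build_ring and owner are hypothetical.

```python
import bisect
import hashlib


def toy_token(key: str) -> int:
    """Map a string to a signed 64-bit token (stand-in hash, not Murmur3)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def build_ring(nodes, tokens_per_node):
    """Give every node `tokens_per_node` tokens (vnodes) and sort the ring."""
    ring = []
    for node in nodes:
        for i in range(tokens_per_node):
            ring.append((toy_token(f"{node}-{i}"), node))
    ring.sort()
    return ring


def owner(ring, partition_key):
    """Return the node whose token range contains the key's token.

    A range is (previous token, this token]; a key hashing past the
    largest token wraps around to the first entry in the ring.
    """
    tokens = [token for token, _ in ring]
    idx = bisect.bisect_left(tokens, toy_token(partition_key))
    return ring[idx % len(ring)][1]


ring = build_ring(["node1", "node2"], tokens_per_node=4)
chosen = owner(ring, "some-generated-uuid")
```

In this picture, a setting such as num_tokens: 256 just means each node contributes 256 entries to the ring instead of one.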

Re: horizontal query scaling issues follow on

2014-07-17 Thread Diane Griffith
So do partitions equate to tokens/vnodes? If so we had configured all cluster nodes/VMs with num_tokens: 256 instead of setting initial_token and assigning ranges. I am still not getting why, in Cassandra 2.0, I would assign my own ranges via initial_token, and this was based on the documentation and eve

Re: horizontal query scaling issues follow on

2014-07-17 Thread Jack Krupansky
How many partitions are you spreading those 18 million rows over? That many rows in a single partition will not be a sweet spot for Cassandra. It’s not exceeding any hard limit (2 billion), but some internal operations may cache the partition rather than the logical row. And all those rows in a