I appreciate all the help so far. Up front: it is possible the schema and the data query pattern are contributing to the problem. The schema was born out of certain design requirements, and if it proves to be part of what makes scalability crumble, then I hope these results will help reshape those requirements.
Anyway, the premise of the question was my struggle with scalability metrics falling apart going from 2 nodes to 4 nodes for the current schema and query access pattern being modeled:

- 1 node produced acceptable response times; that seemed to be the consensus.
- 2 nodes showed a marked improvement in response times for the query scenario being modeled, which was welcome news.
- 4 nodes showed a decrease in performance, and it was not clear why going from 2 to 4 nodes triggered the decrease.

Two more items also contributed to the question:

- cassandra-env.sh, where the comments for the HEAP_NEWSIZE example state that it assumes a modern 8-core machine for pause times.
- A wiki article I had found (and am trying to relocate) where someone set up very small nodes for the developers on their team and walked through all the parameters that had to be changed from the defaults to get good throughput. It sort of implied the defaults might be based on a certain-sized VM.

That was the main driver for those questions. I agree it does not seem correct to boost the values, let alone boost them so high, just to minimize impact in some respects (i.e., to keep reads from timing out and starting over given the retry policy). So the question really was: are the defaults sized with the assumption of a certain minimal VM size (i.e., the comment in cassandra-env.sh)?

Does that explain where I am coming from better? My question, despite being naive and ignoring other impacts, still stands: is there a minimal VM size that is more of a sweet spot for Cassandra and its defaults? I get the point that a column family schema, as it relates to the desired queries, can and does impact that answer. I guess what bothered me was that it didn't impact that answer going from 1 node to 2 nodes but started showing up going from 2 nodes to 4 nodes. I'm building whatever facts I can to support whether the schema and query pattern scale or not.
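For context on the "are the defaults sized for a certain VM?" question, here is a rough sketch in Python of the heap auto-sizing logic that cassandra-env.sh ships with in the 2.0 line. It is paraphrased from memory, so treat the exact constants as assumptions and check your own copy of the script; the point is just that HEAP_NEWSIZE scales with core count, which is where the "modern 8 core machine" comment comes from.

```python
# Rough sketch (not the actual shell script) of calculate_heap_sizes()
# from cassandra-env.sh in Cassandra 2.0.x; constants paraphrased from
# memory -- verify against your own copy before relying on them.

def calculate_heap_sizes(system_memory_mb, cpu_cores):
    half = min(system_memory_mb // 2, 1024)     # half of RAM, capped at 1 GB
    quarter = min(system_memory_mb // 4, 8192)  # quarter of RAM, capped at 8 GB
    max_heap_mb = max(half, quarter)

    # New generation: 100 MB per core, but never more than 1/4 of the heap.
    # On the "modern 8 core machine" the comments assume, that targets an
    # 800 MB new gen to keep GC pause times short.
    heap_newsize_mb = min(100 * cpu_cores, max_heap_mb // 4)
    return max_heap_mb, heap_newsize_mb

# An 8 GB RAM / 4 CPU VM like the one in the question:
print(calculate_heap_sizes(8192, 4))  # -> (2048, 400)
```

If those constants are right, a 4-core 8 GB VM gets a smaller new gen (400 MB) than the 8-core machine the comments assume, which is the kind of fact I am trying to collect.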
If it does not, then I am trying to pull information from metrics output by nodetool, or from statements in the Cassandra log files, to support a case for changing the design requirements.

Thanks,
Diane

On Mon, Jul 21, 2014 at 8:15 PM, Robert Coli <rc...@eventbrite.com> wrote:
> On Sun, Jul 20, 2014 at 6:12 PM, Diane Griffith <dfgriff...@gmail.com> wrote:
>
>> I am running tests again across different numbers of client threads and
>> numbers of nodes, but this time I tweaked some of the timeouts configured
>> for the nodes in the cluster. I was able to get better performance on the
>> nodes at 10 client threads by upping 4 timeout values in cassandra.yaml
>> to 240000:
>
> If you have to tune these timeout values, you have probably modeled data
> in such a way that each of your requests is "quite large" or "quite slow".
>
> This is usually, but not always, an indicator that you are Doing It Wrong.
> Massively multithreaded things don't generally like their threads to be
> long-lived, for what should hopefully be obvious reasons.
>
>> I did this because of my interpretation of the cfhistograms output on one
>> of the nodes.
>
> Could you be more specific?
>
>> So 3 questions come to mind:
>>
>> 1. Did I interpret the histogram information correctly in the Cassandra
>>    2.0.6 nodetool output? That is, in the 2-column read latency output,
>>    the offset or left column is the time in milliseconds and the right
>>    column is the number of requests that fell into that bucket range.
>> 2. Was it reasonable for me to boost those 4 timeouts, and just those?
>
> Not really. In 5 years of operating Cassandra, I've never had a problem
> whose solution was to increase these timeouts from their default.
>
>> 3. What are reasonable timeout values for smaller VM sizes (i.e. 8 GB
>>    RAM, 4 CPUs)?
>
> As above, I question the premise of this question.
>
> =Rob
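On interpreting the cfhistograms output: the two-column latency output is indeed an offset column (bucket boundary) and a count column (requests that fell into that bucket), and a percentile can be read off by accumulating counts. The sketch below uses made-up sample data just to illustrate the arithmetic; one caveat worth double-checking is the unit of the offset column, which I believe is microseconds (not milliseconds) for the latency histograms in 2.0.x.

```python
# Sketch of reading a percentile out of cfhistograms-style (offset, count)
# pairs. The sample buckets below are hypothetical, not real nodetool
# output; offsets mimic nodetool's bucket boundaries. Verify the offset
# units (I believe microseconds for latency in 2.0.x) on your own output.

def percentile(buckets, pct):
    """buckets: list of (offset, count) pairs in ascending offset order.
    Returns the offset of the bucket at or below which `pct` percent of
    the requests fall."""
    total = sum(count for _, count in buckets)
    threshold = total * pct / 100.0
    running = 0
    for offset, count in buckets:
        running += count
        if running >= threshold:
            return offset
    return buckets[-1][0]

# Hypothetical read-latency histogram: (offset, reads in that bucket)
sample = [(35, 0), (42, 10), (50, 120), (60, 300), (72, 50), (86, 15), (103, 5)]
print(percentile(sample, 50))  # median bucket -> 60
print(percentile(sample, 99))  # tail-latency bucket -> 86
```

Comparing a tail percentile computed this way against the configured request timeouts is the kind of evidence I am hoping to use to argue whether the schema and query pattern scale.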