> Regarding 2), I may be running into this since data updates are very
> localized by design. I've distributed the keys per storage load, but I'm
> going to have to distribute them by read/write load since the workload is
> all but random and I'm using BOP. However, I never see an I/O bottleneck
> when using iostat, see below.
Ah, I keep assuming the random partitioner since it is a very common case
(just to be sure: unless you specifically want the ordering despite the
downsides, you generally want to default to the random partitioner).

> I've got two processes doing writes in parallel. The one we are currently
> discussing ("Process A") only writes, while the other one ("Process B") reads
> 2 to 4x more data than it writes.
>
> Process A typically looks like this (numbers come from Hector). Each line
> below is one Cassandra batch, i.e. one Hector Mutator.execute():
> 15:15:53 Wrote 86 cassandra mutations using host X.Y.Z.97(X.Y.Z.97):9160 (153 usecs)
> 15:15:53 Wrote 90 cassandra mutations using host X.Y.Z.96(X.Y.Z.96):9160 (97 usecs)
> 15:15:54 Wrote 85 cassandra mutations using host X.Y.Z.95(X.Y.Z.95):9160 (754 usecs)
> 15:15:54 Wrote 81 cassandra mutations using host X.Y.Z.109(X.Y.Z.109):9160 (561 usecs)
> 15:15:54 Wrote 86 cassandra mutations using host 176.31.226.128(176.31.226.128):9160 (130 usecs)
> 15:15:54 Wrote 73 cassandra mutations using host X.Y.Z.97(X.Y.Z.97):9160 (97 usecs)
> 15:15:54 Wrote 82 cassandra mutations using host X.Y.Z.96(X.Y.Z.96):9160 (48 usecs)
> 15:15:56 Wrote 108 cassandra mutations using host X.Y.Z.95(X.Y.Z.95):9160 (1653 usecs)
> 15:15:56 Wrote 84 cassandra mutations using host X.Y.Z.109(X.Y.Z.109):9160 (23 usecs)
>
> I'm pretty sure those are milliseconds and not microseconds as per the Hector
> docs (see the last two lines & timestamps), which would amount to 500 to 1000
> mutations per second, with a min at 65 and a max at 3652.
>
> Clusterwide, OpsCenter is reporting 10 write requests per second in the
> 20-minute graph, but that can't be right.

I'm not familiar with OpsCenter, but if the numbers seem low I suspect it is
because it is counting requests to the StorageProxy. A batch of multiple reads
or writes is still a single request to the StorageProxy, so that stat won't be
a reflection of the number of columns (or rows) affected. (Again, to clarify:
I do not know whether OpsCenter is using the StorageProxy stat; that is my
speculation.)

> When I get a log like these, there is always a "cluster-freeze" during the
> preceding minute. By "cluster-freeze", I mean that a couple of nodes go to
> 0% utilization (no cpu, no system, no io).

A hypothesis here is that your workload is causing problems for a node (for
example, sudden spikes in memory allocation causing full GC fallbacks that
take time), and both the readers and the writers get "stuck" on requests to
those nodes (once a sufficient number of requests happen to be destined to
them). The result would be that all other nodes are no longer seeing traffic
because the clients aren't making progress.

> I may be overloading the cluster when Process A runs, but I would like to
> understand why so I can do something about it. What I'm trying to figure out
> is:
> - why counter writes would time out at 28 or 38 s (5-node cluster)
> - what could cause the cluster to "freeze" during those timeouts
>
> If you have any answers or ideas on how I could find an answer, that would
> be great.

I would first eliminate or confirm any GC hypothesis by running all nodes with
-XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
(see the cassandra-env.sh sketch below). If you can see this happen often
enough to manually/interactively "wait for it", I suggest something as simple
as firing up top + iostat for each host, keeping them on screen at the same
time, and looking at what happens when you see this again.
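As a rough sketch only (the exact file layout and paths differ between
Cassandra versions and packages), the flags can be added to
conf/cassandra-env.sh and the node restarted; the gc.log path below is just
an example, not something your setup necessarily uses:

    # GC logging options (restart the node for these to take effect)
    JVM_OPTS="$JVM_OPTS -XX:+PrintGC"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCTimeStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"  # example path

With date stamps in the GC log you can then line up any long pauses against
the times at which the clients report the freezes.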
If the problem is a fallback to full GC, for example, the affected nodes
should be churning 100% CPU (one core) for several seconds (assuming a large
heap). If there is a sudden burst of disk I/O causing a hiccup (e.g. dirty
buffer flushing by Linux), this should be visibly correlated with
'iostat -x -k 1'.

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)