It looks like you're trying to use batches as a performance optimization. Don't do that; it makes your load bursty.
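A minimal sketch of one way to keep each request small, assuming the import currently collects many mutations and sends them in a single large batch. The class `SmallBatchWriter`, the method `writeInSmallBatches`, and the `sink` callback are hypothetical names for illustration and are not part of Hector. In the real import, the sink would be the place where one chunk's mutations go into a Mutator followed by a single execute() call. The right cap depends on row size and cluster capacity, so the value below is only a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class SmallBatchWriter {

    /**
     * Splits rows into consecutive chunks of at most maxBatchSize elements and
     * passes each chunk to the sink, so that no single request carries the
     * whole import. Returns the number of chunks handed to the sink.
     *
     * The sink is a hypothetical stand-in for "send one small batch".
     */
    public static <T> int writeInSmallBatches(List<T> rows,
                                              int maxBatchSize,
                                              Consumer<List<T>> sink) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        int batches = 0;
        for (int start = 0; start < rows.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, rows.size());
            // Copy the slice so the sink cannot modify the caller's list.
            sink.accept(new ArrayList<>(rows.subList(start, end)));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            rows.add("row-" + i);
        }
        // Placeholder cap of 10 rows per request; tune it for the real workload.
        int sent = writeInSmallBatches(rows, 10,
                chunk -> System.out.println("sending " + chunk.size() + " rows"));
        System.out.println(sent + " requests");
    }
}
```

This spreads the same writes over more, smaller requests instead of a few large ones; it does not by itself address the dropped mutations reported by tpstats.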
On Sat, Oct 8, 2011 at 7:13 PM, Philippe <watche...@gmail.com> wrote:
> Dear all,
> I've just fired up our production cluster : 12 nodes, RF=3 and I've run into
> something I don't understand at all. Our test cluster was 3 nodes, RF=3
> Test cluster was AMD opteron CPUs (6x2.33) w/ 32GB RAM while the production
> cluster is core i5 (4x2.66) w/ 16 GB RAM.
>
> I'm running the same import process using Hector as I did in August on the
> test cluster, but this time, I get a lot of
> 211725 [pool-3-thread-1] WARN me.prettyprint.cassandra.connection.HConnectionManager - Exception:
> me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
>     at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:40)
>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:97)
>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:90)
>     at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
>     at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:219)
>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:102)
>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:108)
>     at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:222)
>     at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:219)
>     at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
>     at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
>     at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:219)
>     at com.sensorly.heatmap.rollups.cassandra.CassandraRollupWithCountersDao.executeMutator(CassandraRollupWithCountersDao.java:302)
>     at com.sensorly.heatmap.rollups.cassandra.LoaderCallable.loadRollup(LoaderCallable.java:112)
>     at com.sensorly.heatmap.rollups.cassandra.LoaderCallable.run(LoaderCallable.java:74)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: TimedOutException()
>     at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19061)
>     at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035)
>     at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009)
>     at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
>
> I've lowered the number of concurrent threads to one or running it locally
> on one of the nodes but it still doesn't improve.
>
> vmstat shows nothing going on on the servers
> the logs don't indicate anything
> network traffic is below 1Mbit/s (I guess that's just gossip)
> iostat shows no activity
> nearly all of the servers' memory is free
> tpstats shows that some mutations were dropped on a node.
>
> I'm stumped... what could I have missed ?
>
> Thanks
> PS: @aaron, Richard & co : your suggestions to my previous questions are
> being investigated, I'll report on my findings.
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com