Dear all,

I've just fired up our production cluster (12 nodes, RF=3) and I've run into something I don't understand at all.

Our test cluster was 3 nodes, RF=3. The test cluster had AMD Opteron CPUs (6x2.33) w/ 32 GB RAM, while the production cluster has Core i5 CPUs (4x2.66) w/ 16 GB RAM.

I'm running the same import process using Hector as I did in August on the test cluster, but this time I get a lot of these:

211725 [pool-3-thread-1] WARN me.prettyprint.cassandra.connection.HConnectionManager - Exception:
me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
    at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:40)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:97)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:90)
    at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
    at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:219)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:102)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:108)
    at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:222)
    at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:219)
    at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
    at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
    at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:219)
    at com.sensorly.heatmap.rollups.cassandra.CassandraRollupWithCountersDao.executeMutator(CassandraRollupWithCountersDao.java:302)
    at com.sensorly.heatmap.rollups.cassandra.LoaderCallable.loadRollup(LoaderCallable.java:112)
    at com.sensorly.heatmap.rollups.cassandra.LoaderCallable.run(LoaderCallable.java:74)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: TimedOutException()
    at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19061)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035)
    at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)

I've tried lowering the number of concurrent threads to one and running the import locally on one of the nodes, but it still doesn't improve.

- vmstat shows nothing going on on the servers
- the logs don't indicate anything
- network traffic is below 1 Mbit/s (I guess that's just gossip)
- iostat shows no activity
- nearly all of the servers' memory is free
- tpstats shows that some mutations were dropped on a node

I'm stumped... what could I have missed?

Thanks

PS: @aaron, Richard & co: your suggestions to my previous questions are being investigated; I'll report on my findings.