> 5. the problematic Data file contains only 5 to 10 keys but is large (2.4 GB)
So, very large rows? What does nodetool cfstats or cfhistograms say about the row sizes?
> 1. what is happening?
I think this is partly large rows and partly the query pattern. This is covered, if only roughly, in http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ and in my talk here: http://www.datastax.com/events/cassandrasummit2012/presentations

> 3. any more info required to proceed?
Do some tests with different query techniques…
Get a single named column.
Get the first 10 columns using the natural column order.
Get the last 10 columns using the reversed order.
(A sketch of these three queries follows the quoted message below.)

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 31/01/2013, at 7:20 PM, Takenori Sato <ts...@cloudian.com> wrote:

> Hi all,
>
> We have a situation where the CPU load on some of our nodes in a cluster
> has spiked occasionally since last November, triggered by requests for
> rows that reside on two specific sstables.
>
> We confirmed the following (when spiked):
>
> version: 1.0.7 (current) <- 0.8.6 <- 0.8.5 <- 0.7.8
> jdk: Oracle 1.6.0
>
> 1. profiling showed that BloomFilterSerializer#deserialize was the
> hotspot (70% of the total load across running threads)
>
> * the stack trace looked like this (simplified):
> 90.4% - org.apache.cassandra.db.ReadVerbHandler.doVerb
> 90.4% - org.apache.cassandra.db.SliceByNamesReadCommand.getRow
> ...
> 90.4% - org.apache.cassandra.db.CollationController.collectTimeOrderedData
> ...
> 89.5% - org.apache.cassandra.db.columniterator.SSTableNamesIterator.read
> ...
> 79.9% - org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter
> 68.9% - org.apache.cassandra.io.sstable.BloomFilterSerializer.deserialize
> 66.7% - java.io.DataInputStream.readLong
>
> 2. usually, 1 should be so fast that profiling by sampling cannot detect it
>
> 3. no pressure on Cassandra's JVM heap nor on the machine overall
>
> 4. little I/O traffic on our 8 disks/node (up to 100 tps/disk by "iostat 1 1000")
>
> 5. the problematic Data file contains only 5 to 10 keys but is large (2.4 GB)
>
> 6. the problematic Filter file is only 256 B (could be normal)
>
> So now I am trying to read the Filter file the same way
> BloomFilterSerializer#deserialize does, as faithfully as I can, in order
> to see if there is something wrong with the file.
>
> Could you give me some advice on:
>
> 1. what is happening?
> 2. the best way to simulate BloomFilterSerializer#deserialize
> 3. any more info required to proceed?
>
> Thanks,
> Takenori
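For the query tests above, here is a minimal sketch using the raw Thrift client that exercises the three access patterns. The keyspace, column family, row key, and column name are hypothetical placeholders; Hector or any other client works just as well:

import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;
import org.apache.cassandra.thrift.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class SliceTests
{
    public static void main(String[] args) throws Exception
    {
        TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("MyKeyspace"); // placeholder keyspace

        ByteBuffer key = ByteBuffer.wrap("problem-row".getBytes("UTF-8")); // a key from the 2.4 GB sstable
        ColumnParent parent = new ColumnParent("MyColumnFamily");          // placeholder column family
        ByteBuffer empty = ByteBuffer.wrap(new byte[0]);                   // open-ended slice bound

        // 1. get a single named column
        SlicePredicate byName = new SlicePredicate();
        byName.setColumn_names(Arrays.asList(ByteBuffer.wrap("col1".getBytes("UTF-8"))));
        time(client, key, parent, byName, "single named column");

        // 2. get the first 10 columns in the natural column order
        SlicePredicate first10 = new SlicePredicate();
        first10.setSlice_range(new SliceRange(empty, empty, false, 10));
        time(client, key, parent, first10, "first 10, natural order");

        // 3. get the last 10 columns using the reversed order
        SlicePredicate last10 = new SlicePredicate();
        last10.setSlice_range(new SliceRange(empty, empty, true, 10));
        time(client, key, parent, last10, "last 10, reversed");

        transport.close();
    }

    static void time(Cassandra.Client client, ByteBuffer key, ColumnParent parent,
                     SlicePredicate predicate, String label) throws Exception
    {
        long start = System.nanoTime();
        List<ColumnOrSuperColumn> cols = client.get_slice(key, parent, predicate, ConsistencyLevel.ONE);
        System.out.printf("%s: %d columns in %.1f ms%n", label, cols.size(), (System.nanoTime() - start) / 1e6);
    }
}

If the named-column read is much slower than the first-10 slice on the problem rows, that would be consistent with the by-name read path having to deserialize the per-row bloom filter and column index for a very wide row, as described in the Query Plans post above.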
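And for question 2 in the quoted message (simulating BloomFilterSerializer#deserialize), a minimal sketch that reads a Filter file directly. It assumes the 1.0.x serialization layout — an int hash count, an int word count, then that many longs backing the bitset — so check it against the BloomFilterSerializer source for your exact version:

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;

public class FilterFileCheck
{
    public static void main(String[] args) throws Exception
    {
        // args[0]: path to a *-Filter.db file
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(args[0])));
        try
        {
            int hashCount = in.readInt(); // number of hash functions (assumed layout)
            int wordCount = in.readInt(); // number of longs in the bitset (assumed layout)
            System.out.println("hash count: " + hashCount + ", words: " + wordCount);

            // read the bitset the same way deserialize would, counting set bits
            long setBits = 0;
            for (int i = 0; i < wordCount; i++)
                setBits += Long.bitCount(in.readLong());
            System.out.println("set bits: " + setBits);
        }
        finally
        {
            in.close();
        }
    }
}

An implausible hash or word count, an EOFException mid-read, or a bitset with no set bits would suggest the file is damaged. Under this layout a 256 B file should decode as 8 bytes of header plus 31 longs.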