> Yes, it contains a big row that goes up to 2GB with more than a million
> columns.

I've run tests with 10 million small columns and seen reasonable performance. I've not looked at 1 million large columns.

>> - BloomFilterSerializer#deserialize does readLong iteratively at each page
>> of size 4K for a given row, which means it could be 500,000 loops (calls to
>> readLong) for a 2GB row (from the 1.0.7 source).

There is only one Bloom filter per row in an SSTable, not one per column index page. It could take a while if there are a lot of sstables in the read.
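To make that concrete, here is roughly the shape of the deserialize path (a simplified sketch, not the actual 1.0.7 source; the names and exact framing are approximate). The single per-row filter is read back one 64-bit word at a time, so the cost scales with the size of that one filter, not with the number of index pages:

    import java.io.DataInput;
    import java.io.IOException;

    // Simplified sketch of a 1.0.x-style bloom filter deserialize
    // (approximate, not the actual Cassandra source). There is one such
    // filter per row per SSTable; a filter sized for a very wide row means
    // a very long readLong loop, which is what shows up as the hotspot in
    // the profile quoted below.
    final class BloomFilterSketch
    {
        final int hashCount;
        final long[] bits; // backing words of the filter's bit set

        private BloomFilterSketch(int hashCount, long[] bits)
        {
            this.hashCount = hashCount;
            this.bits = bits;
        }

        static BloomFilterSketch deserialize(DataInput in) throws IOException
        {
            int hashCount = in.readInt(); // number of hash functions
            int wordCount = in.readInt(); // 64-bit words that follow
            long[] bits = new long[wordCount];
            for (int i = 0; i < wordCount; i++)
                bits[i] = in.readLong();  // the hot loop in the profile
            return new BloomFilterSketch(hashCount, bits);
        }
    }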
nodetool cfhistograms will let you know: run it once to reset the counts, then do your test, then run it again.
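For example (keyspace and column family names are placeholders):

    nodetool -h 127.0.0.1 cfhistograms MyKeyspace MyColumnFamily   # first run reads and resets the recent counts
    (run the test queries)
    nodetool -h 127.0.0.1 cfhistograms MyKeyspace MyColumnFamily   # second run shows SSTables-per-read and latencies for the test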
Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 4/02/2013, at 4:13 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> It is interesting, the press C* got about having 2 billion columns in a
> row. You *can* do it, but it brings to light some realities of what
> that means.
>
> On Sun, Feb 3, 2013 at 8:09 AM, Takenori Sato <ts...@cloudian.com> wrote:
>> Hi Aaron,
>>
>> Thanks for your answers. That helped me get the big picture.
>>
>> Yes, it contains a big row that goes up to 2GB with more than a million
>> columns.
>>
>> Let me confirm that I understand correctly.
>>
>> - The stack trace is from a Slice By Names query, and the deserialization
>> is at step 3, "Read the row level Bloom Filter", on your blog.
>>
>> - BloomFilterSerializer#deserialize does readLong iteratively at each page
>> of size 4K for a given row, which means it could be 500,000 loops (calls
>> to readLong) for a 2GB row (from the 1.0.7 source).
>>
>> Correct?
>>
>> That would explain why Slice By Names queries against such a wide row can
>> be a CPU bottleneck. In fact, in our test environment, a single
>> BloomFilterSerializer#deserialize in such a case takes more than 10ms, up
>> to 100ms.
>>
>>> Get a single named column.
>>> Get the first 10 columns using the natural column order.
>>> Get the last 10 columns using the reversed order.
>>
>> Interesting. Could the query pattern make a difference?
>>
>> We thought the only solution was to change the data structure (don't use
>> such a wide row if it is retrieved by Slice By Names queries).
>>
>> Anyway, we will give it a try!
>>
>> Best,
>> Takenori
>>
>> On Sat, Feb 2, 2013 at 2:55 AM, aaron morton <aa...@thelastpickle.com>
>> wrote:
>>>
>>>> 5. the problematic Data file contains only 5 to 10 keys' data, but it
>>>> is large (2.4G)
>>>
>>> So, very large rows?
>>> What does nodetool cfstats or cfhistograms say about the row sizes?
>>>
>>>> 1. what is happening?
>>>
>>> I think this is partially large rows and partially the query pattern.
>>> This is only roughly correct
>>> http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ and my talk
>>> here http://www.datastax.com/events/cassandrasummit2012/presentations
>>>
>>>> 3. any more info required to proceed?
>>>
>>> Do some tests with different query techniques…
>>>
>>> Get a single named column.
>>> Get the first 10 columns using the natural column order.
>>> Get the last 10 columns using the reversed order.
>>>
>>> Hope that helps.
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 31/01/2013, at 7:20 PM, Takenori Sato <ts...@cloudian.com> wrote:
>>>
>>> Hi all,
>>>
>>> We have a situation where CPU load on some of the nodes in our cluster
>>> has spiked occasionally since last November, triggered by requests for
>>> rows that reside on two specific sstables.
>>>
>>> We confirmed the following (when it spiked):
>>>
>>> version: 1.0.7 (current) <- 0.8.6 <- 0.8.5 <- 0.7.8
>>> jdk: Oracle 1.6.0
>>>
>>> 1. profiling showed that BloomFilterSerializer#deserialize was the
>>> hotspot (70% of the total load by running threads)
>>>
>>> * the stack trace looked like this (simplified):
>>> 90.4% - org.apache.cassandra.db.ReadVerbHandler.doVerb
>>> 90.4% - org.apache.cassandra.db.SliceByNamesReadCommand.getRow
>>> ...
>>> 90.4% - org.apache.cassandra.db.CollationController.collectTimeOrderedData
>>> ...
>>> 89.5% - org.apache.cassandra.db.columniterator.SSTableNamesIterator.read
>>> ...
>>> 79.9% - org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter
>>> 68.9% - org.apache.cassandra.io.sstable.BloomFilterSerializer.deserialize
>>> 66.7% - java.io.DataInputStream.readLong
>>>
>>> 2. usually, 1 should be so fast that sampling-based profiling cannot
>>> detect it
>>>
>>> 3. no pressure on Cassandra's JVM heap nor on the machine overall
>>>
>>> 4. little I/O traffic on our 8 disks/node (up to 100 tps/disk by "iostat
>>> 1 1000")
>>>
>>> 5. the problematic Data file contains only 5 to 10 keys' data, but it is
>>> large (2.4G)
>>>
>>> 6. the problematic Filter file size is only 256B (could be normal)
>>>
>>> So now, I am trying to read the Filter file the same way
>>> BloomFilterSerializer#deserialize does, as closely as I can, in order to
>>> see if there is something wrong with the file.
>>>
>>> Could you give me some advice on:
>>>
>>> 1. what is happening?
>>> 2. the best way to simulate BloomFilterSerializer#deserialize
>>> 3. any more info required to proceed?
>>>
>>> Thanks,
>>> Takenori
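For anyone who wants to try the three test queries suggested up-thread against a 1.0-era cluster, a minimal Thrift sketch (host, port, keyspace, column family, and key are all placeholders; error handling omitted):

    import java.nio.ByteBuffer;
    import java.util.Arrays;
    import org.apache.cassandra.thrift.*;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;

    public class SliceTests
    {
        public static void main(String[] args) throws Exception
        {
            TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
            transport.open();
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            client.set_keyspace("MyKeyspace");                          // placeholder keyspace

            ByteBuffer key = ByteBuffer.wrap("wide-row".getBytes("UTF-8")); // placeholder key
            ColumnParent parent = new ColumnParent("MyColumnFamily");   // placeholder CF
            ByteBuffer empty = ByteBuffer.allocate(0);                  // unbounded start/finish

            // 1. Get a single named column (the SliceByNames path in the stack trace).
            SlicePredicate byName = new SlicePredicate();
            byName.setColumn_names(Arrays.asList(ByteBuffer.wrap("col1".getBytes("UTF-8"))));
            client.get_slice(key, parent, byName, ConsistencyLevel.ONE);

            // 2. Get the first 10 columns in the natural column order.
            SlicePredicate first10 = new SlicePredicate();
            first10.setSlice_range(new SliceRange(empty, empty, false, 10));
            client.get_slice(key, parent, first10, ConsistencyLevel.ONE);

            // 3. Get the last 10 columns using the reversed order.
            SlicePredicate last10 = new SlicePredicate();
            last10.setSlice_range(new SliceRange(empty, empty, true, 10));
            client.get_slice(key, parent, last10, ConsistencyLevel.ONE);

            transport.close();
        }
    }

Comparing latencies across the three patterns should show whether the read path, rather than the bloom filter alone, is sensitive to where in the wide row the requested columns sit.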