Re: CPU hotspot at BloomFilterSerializer#deserialize

2013-02-05 Thread Takenori Sato (Cloudian)
Hi, We found that this issue is specific to 1.0.1 through 1.0.8 and was fixed in 1.0.9. https://issues.apache.org/jira/browse/CASSANDRA-4023 So by upgrading, we will see reasonable performance no matter how large a row we have! Thanks, Takenori (2013/02/05 2:29), aaron morton wrote: Yes, it

Re: CPU hotspot at BloomFilterSerializer#deserialize

2013-02-04 Thread aaron morton
> Yes, it contains a big row that goes up to 2GB with more than a million columns. I've run tests with 10 million small columns and seen reasonable performance. I've not looked at 1 million large columns. >> - BloomFilterSerializer#deserialize does readLong iteratively at each page of size
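To make the pattern named in the stack trace concrete, here is a minimal Java sketch of a bitset being rebuilt one readLong() at a time. This is only an illustration of that access pattern, not the actual Cassandra 1.0 BloomFilterSerializer source; the class and method names are hypothetical.

    import java.io.DataInput;
    import java.io.IOException;

    // Hypothetical illustration only: the whole bitset is rebuilt with one
    // readLong() call per 64-bit word, so a filter sized for millions of
    // columns means hundreds of thousands of readLong() calls on every
    // read that has to rebuild it.
    final class BitsetReader {
        static long[] readBitset(DataInput in) throws IOException {
            int wordCount = in.readInt();   // number of 64-bit words that follow
            long[] words = new long[wordCount];
            for (int i = 0; i < wordCount; i++) {
                words[i] = in.readLong();   // the hot call in the profile
            }
            return words;
        }
    }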

Re: CPU hotspot at BloomFilterSerializer#deserialize

2013-02-03 Thread Edward Capriolo
It is interesting, the press c* got about having 2 billion columns in a row. You *can* do it, but it brings to light some realities of what that means. On Sun, Feb 3, 2013 at 8:09 AM, Takenori Sato wrote: > Hi Aaron, > Thanks for your answers. That helped me get the big picture. > Yes, it contain

Re: CPU hotspot at BloomFilterSerializer#deserialize

2013-02-03 Thread Takenori Sato
Hi Aaron, Thanks for your answers. That helped me get the big picture. Yes, it contains a big row that goes up to 2GB with more than a million columns. Let me confirm that I understand correctly. - The stack trace is from a Slice By Names query, and the deserialization is at step 3, "Read the
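For what it's worth, a rough sketch of the read-path step being discussed, assuming the summary above is right: on a by-name read, the row-level bloom filter is deserialized from the data file and then consulted per requested column name. All interfaces and names below are hypothetical and for illustration only, not Cassandra internals.

    import java.io.DataInput;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch of the gate: the row-level filter is deserialized
    // from the data file during the read, then used to skip requested names
    // that are definitely absent. When the row is huge, rebuilding the filter
    // is where the CPU time goes, before any column is even read.
    final class ByNameRead {
        interface BloomFilter { boolean mightContain(byte[] name); }
        interface FilterCodec { BloomFilter deserialize(DataInput in) throws IOException; }

        static List<byte[]> namesToSeek(DataInput rowHeader,
                                        List<byte[]> requested,
                                        FilterCodec codec) throws IOException {
            BloomFilter filter = codec.deserialize(rowHeader);  // paid per read
            List<byte[]> maybePresent = new ArrayList<>();
            for (byte[] name : requested) {
                if (filter.mightContain(name)) {  // no false negatives, some false positives
                    maybePresent.add(name);
                }
            }
            return maybePresent;
        }
    }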

Re: CPU hotspot at BloomFilterSerializer#deserialize

2013-02-01 Thread aaron morton
> 5. the problematic Data file contains only 5 to 10 keys' data but is large (2.4G) So very large rows? What does nodetool cfstats or cfhistograms say about the row sizes? > 1. what is happening? I think this is partially large rows and partially the query pattern, this is only by roughly correc
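A back-of-the-envelope check (my own numbers, not from the thread) on why very large rows make this expensive: with textbook bloom filter sizing, a row-level filter covering about a million column names at a 1% false-positive rate comes to a bit over 1 MB, i.e. roughly 150,000 longs to read back on each deserialization.

    // Back-of-the-envelope sizing using the textbook formula
    // m = -n * ln(p) / (ln 2)^2; illustrative only, not Cassandra's
    // actual sizing logic.
    public final class FilterSizeEstimate {
        public static void main(String[] args) {
            long n = 1_000_000L;    // roughly a million column names in the row
            double p = 0.01;        // assumed 1% false-positive target
            double bits = -n * Math.log(p) / (Math.log(2) * Math.log(2));
            long longs = (long) Math.ceil(bits / 64.0);
            System.out.printf("~%.2f MB, ~%d readLong() calls per deserialization%n",
                              bits / 8 / 1024 / 1024, longs);
        }
    }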

CPU hotspot at BloomFilterSerializer#deserialize

2013-01-30 Thread Takenori Sato
Hi all, We have a situation where CPU load on some of the nodes in our cluster has spiked occasionally since last November, triggered by requests for rows that reside on two specific sstables. We confirmed the following (when the load spiked): version: 1.0.7 (current) <- 0.8.6 <- 0.8.5 <- 0.7.8