Re: Read performance

2015-05-11 Thread Alprema
According to the trace log, only one was read; the compaction strategy is size-tiered. I attached a more readable version of my trace for details. On Mon, May 11, 2015 at 11:35 AM, Anishek Agarwal wrote: > how many sst tables were there? what compaction are you using ? These > properties defi

Re: Read performance

2015-05-11 Thread Anishek Agarwal
how many sst tables were there? what compaction are you using ? These properties define how many possible disk reads cassandra has to do to get all the data you need depending on which SST Tables have data for your partition key. On Fri, May 8, 2015 at 6:25 PM, Alprema wrote: > I was planning

Re: Read performance

2015-05-08 Thread Alprema
I was planning on using a more "server-friendly" strategy anyway (by parallelizing my workload on multiple metrics) but my concern here is more about the raw numbers. According to the trace and my estimation of the data size, the read from disk was done at about 30MByte/s and the transfer between

Re: Read performance

2015-05-08 Thread Bryan Holladay
Try breaking it up into smaller chunks using multiple threads and token ranges. 86400 is pretty large. I found ~1000 results per query is good. This will spread the burden across all servers a little more evenly. On Thu, May 7, 2015 at 4:27 AM, Alprema wrote: > Hi, > > I am writing an applicatio
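Note on the suggestion above: a minimal sketch of splitting a read into token sub-ranges and fetching them on several threads. It assumes the Murmur3 partitioner's token range of -2^63 to 2^63-1 (the thread does not say which partitioner is in use), and `fetch_range`, the table name `metrics`, and the column name `pk` are hypothetical stand-ins. A real client would execute one query per sub-range through its driver instead of building a string.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: Murmur3Partitioner token bounds; not stated in the thread.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_range(n_chunks, lo=MIN_TOKEN, hi=MAX_TOKEN):
    """Split the inclusive range [lo, hi] into n_chunks contiguous inclusive sub-ranges."""
    total = hi - lo + 1
    if not 1 <= n_chunks <= total:
        raise ValueError("n_chunks must be between 1 and the size of the range")
    base, extra = divmod(total, n_chunks)
    ranges, start = [], lo
    for i in range(n_chunks):
        # The first `extra` chunks take one additional token so nothing is dropped.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def fetch_range(token_range):
    """Hypothetical stand-in: build the query text a driver call would run."""
    start, end = token_range
    return (
        "SELECT * FROM metrics "
        f"WHERE token(pk) >= {start} AND token(pk) <= {end}"
    )


def fetch_all(n_chunks, workers=4):
    """Run one fetch per sub-range on a small thread pool, preserving range order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_range, split_token_range(n_chunks)))
```

The chunk count is the tuning knob here: it would be chosen so that each sub-range returns roughly the ~1000 results per query mentioned above.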

Re: Read performance in map data type

2014-04-04 Thread Tyler Hobbs
http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/tracing_t.html On Fri, Apr 4, 2014 at 11:34 AM, Apoorva Gaurav wrote: > > > On Fri, Apr 4, 2014 at 9:37 PM, Tyler Hobbs wrote: > >> >> On Fri, Apr 4, 2014 at 12:41 AM, Apoorva Gaurav < >> apoorva.gau...@myntra.com> wrot

Re: Read performance in map data type

2014-04-04 Thread Apoorva Gaurav
On Fri, Apr 4, 2014 at 9:37 PM, Tyler Hobbs wrote: > > On Fri, Apr 4, 2014 at 12:41 AM, Apoorva Gaurav > wrote: > >> If we store the same data as a json using text data type i.e (studentID >> int, subjectMarksJson text) we are getting a latency of ~10ms from the same >> client for even bigger. I

Re: Read performance in map data type

2014-04-04 Thread Tyler Hobbs
On Fri, Apr 4, 2014 at 12:41 AM, Apoorva Gaurav wrote: > If we store the same data as a json using text data type i.e (studentID > int, subjectMarksJson text) we are getting a latency of ~10ms from the same > client for even bigger. I understand that json is not the preferred storage > for cassand

Re: Read performance in map data type

2014-04-03 Thread Apoorva Gaurav
On Fri, Apr 4, 2014 at 3:32 AM, Robert Coli wrote: > On Thu, Apr 3, 2014 at 12:20 AM, Apoorva Gaurav > wrote: > >> At the client side we are getting a latency of ~350ms, we are using >> datastax driver 2.0.0 and have kept the fetch size as 500. And these are >> coming while reading rows having ~

Re: Read performance in map data type

2014-04-03 Thread Robert Coli
On Thu, Apr 3, 2014 at 12:20 AM, Apoorva Gaurav wrote: > At the client side we are getting a latency of ~350ms, we are using > datastax driver 2.0.0 and have kept the fetch size as 500. And these are > coming while reading rows having ~200 columns. > And you're sure that the 300ms between what ca

Re: Read performance in map data type

2014-04-03 Thread Apoorva Gaurav
client side socket limit : 64K client side maximum connection per host : 8 read consistency level : Quorum On Thu, Apr 3, 2014 at 12:59 PM, Shrikar archak wrote: > How about the client side socket limits? Cassandra client side maximum > connection per host and read consistency level? > > ~Shrik

Re: Read performance in map data type

2014-04-03 Thread Shrikar archak
How about the client side socket limits? Cassandra client side maximum connection per host and read consistency level? ~Shrikar On Thu, Apr 3, 2014 at 12:20 AM, Apoorva Gaurav wrote: > At the client side we are getting a latency of ~350ms, we are using > datastax driver 2.0.0 and have kept the

Re: Read performance in map data type

2014-04-03 Thread Apoorva Gaurav
At the client side we are getting a latency of ~350ms, we are using datastax driver 2.0.0 and have kept the fetch size as 500. And these are coming while reading rows having ~200 columns. On Thu, Apr 3, 2014 at 12:45 PM, Shrikar archak wrote: > Hi Apoorva, > As per the cfhistogram there are som

Re: Read performance in map data type

2014-04-03 Thread Shrikar archak
Hi Apoorva, As per the cfhistogram there are some rows which have more than 75k columns and around 150k reads hit 2 SSTables. Are you sure that you are seeing more than 500ms latency? The cfhistogram shows the worst read performance was around 51ms which looks reasonable with many reads hitting

Re: Read performance in map data type

2014-04-02 Thread Apoorva Gaurav
Hello Shrikar, We are still facing read latency issue, here is the histogram http://pastebin.com/yEvMuHYh On Sat, Mar 29, 2014 at 8:11 AM, Apoorva Gaurav wrote: > Hello Shrikar, > > Yes primary key is (studentID, subjectID). I had dropped the test table, > recreating and populating it post whic

Re: Read performance in map data type

2014-04-01 Thread Apoorva Gaurav
I've observed that reducing fetch size results in better latency (isn't that obvious :-)). I tried fetch sizes varying from 100 to 1, and am seeing a lot of errors for 1. Haven't tried modifying the number of columns. Let me start a new thread focused on fetch size. On Wed, Apr 2, 2014 at 9:5

Re: Read performance in map data type

2014-04-01 Thread Sourabh Agrawal
From the doc: The fetch size controls how much resulting rows will be retrieved simultaneously. So, I guess it does not depend on the number of columns as such. As all the columns for a key reside on the same node, I think it wouldn't matter much whatever be the number of columns as long as we ha

Re: Read performance in map data type

2014-04-01 Thread Apoorva Gaurav
Thanks Sourabh, I've modelled my table as "studentID int, subjectID int, marks int, PRIMARY KEY(studentID, subjectID)" as primarily I'll be querying using studentID and sometimes using studentID and subjectID. I've tried driver 2.0.0 and it's giving good results. Also using its auto paging feature.

Re: Read performance in map data type

2014-04-01 Thread Robert Coli
On Mon, Mar 31, 2014 at 9:13 PM, Apoorva Gaurav wrote: > Thanks Robert, Is there a workaround, as in our test setups we keep > dropping and recreating tables. > Use unique keyspace (or table) names for each test? That's the approach they're taking in 5202... =Rob

Re: Read performance in map data type

2014-03-31 Thread Apoorva Gaurav
Thanks Robert, Is there a workaround, as in our test setups we keep dropping and recreating tables. On Mon, Mar 31, 2014 at 11:51 PM, Robert Coli wrote: > On Fri, Mar 28, 2014 at 7:41 PM, Apoorva Gaurav > wrote: > >> Yes primary key is (studentID, subjectID). I had dropped the test table, >> r

Re: Read performance in map data type

2014-03-31 Thread Robert Coli
On Fri, Mar 28, 2014 at 7:41 PM, Apoorva Gaurav wrote: > Yes primary key is (studentID, subjectID). I had dropped the test table, > recreating and populating it post which will share the cfhistogram. In such > case is there any practical limit on the rows I should fetch, for e.g. > should I do >

Re: Read performance in map data type

2014-03-29 Thread Sourabh Agrawal
Hi, I don't think there is a problem with the driver. Regarding the schema, you may want to choose between wide rows and skinny rows. http://stackoverflow.com/questions/19039123/cassandra-wide-vs-skinny-rows-for-large-columns http://thelastpickle.com/blog/2013/01/11/primary-keys-in-cql.html When

Re: Read performance in map data type

2014-03-29 Thread Apoorva Gaurav
Hello Sourabh, I'd prefer to run a query like select * from marks_table where studentID = ? and subjectID in (?, ?, ??) but if it's costly then I can happily delegate the responsibility to the application layer. Haven't tried 2.x java driver for this specific issue but tried it once earlier and fou

Re: Read performance in map data type

2014-03-29 Thread Sourabh Agrawal
Hi Apoorva, Do you always query on studentID only or do you need to query on both studentID and subjectID? Also, I think using the latest driver (2.x) can make querying large number of rows efficient. http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0 On Sat, Mar 29, 2

Re: Read performance in map data type

2014-03-28 Thread Apoorva Gaurav
Hello Shrikar, Yes primary key is (studentID, subjectID). I had dropped the test table, recreating and populating it post which will share the cfhistogram. In such case is there any practical limit on the rows I should fetch, for e.g. should I do select * from marks_table where studentID =

Re: Read performance in map data type

2014-03-28 Thread Shrikar archak
Hi Apoorva, I assume this is the table with studentId and subjectId as primary keys and not other like marks in that. create table marks_table(studentId int, subjectId int, marks int, PRIMARY KEY(studentId,subjectId)); Also could you give the cfhistogram stats? nodetool cfhistograms mark

Re: read performance plummeted

2012-10-12 Thread B. Todd Burruss
did the amount of data finally exceed your per machine RAM capacity? is it the same 20% each time you read? or do your periodic reads eventually work through the entire dataset? if you are essentially table scanning your data set, and the size exceeds available RAM, then a degradation like that i

Re: read performance problem

2011-11-21 Thread Kent Tong
Tong Sent: Monday, November 21, 2011 5:22 AM Subject: Re: read performance problem There is something wrong with the system. Your benchmarks are way off. How are you benchmarking? Are you using the stress lib included? On Nov 19, 2011 8:58 PM, "Kent Tong" wrote: Hi, > > >

Re: read performance problem

2011-11-20 Thread Jahangir Mohammed
There is something wrong with the system. Your benchmarks are way off. How are you benchmarking? Are you using the stress lib included? On Nov 19, 2011 8:58 PM, "Kent Tong" wrote: > Hi, > > On my computer with 2G RAM and a core 2 duo CPU E4600 @ 2.40GHz, I am > testing the > performance of Cassan

Re: read performance problem

2011-11-19 Thread Maxim Potekhin
Try to see if there is a lot of paging going on, and run some benchmarks on the disk itself. Are you running Windows or Linux? Do you think the disk may be fragmented? Maxim On 11/19/2011 8:58 PM, Kent Tong wrote: Hi, On my computer with 2G RAM and a core 2 duo CPU E4600 @ 2.40GHz, I am tes

Re: Read Performance / Schema Design

2011-10-26 Thread David Jeske
On Wed, Oct 26, 2011 at 7:35 PM, Ben Gambley wrote: > Our requirement is to store per user, many unique results (which is > basically an attempt at some questions ..) so I had thought of having the > userid as the row key and the result id as columns. > > The keys for the result ids are maintaine

Re: Read Performance / Schema Design

2011-10-26 Thread Tyler Hobbs
On Wed, Oct 26, 2011 at 9:35 PM, Ben Gambley wrote: > > Hi Everyone > > I have a question with regards read performance and schema design if > someone could help please. > > > Our requirement is to store per user, many unique results (which is > basically an attempt at some questions ..) so I had

Re: Read Performance

2010-04-02 Thread James Golick
Yes. On Fri, Apr 2, 2010 at 10:35 AM, Ryan King wrote: > On Thu, Apr 1, 2010 at 8:37 PM, James Golick > wrote: > > Well, folks, I'm feeling a little stupid right now (adding to the injury > > inflicted by one Mr. Stump :-P). > > So, here's the story. The cache hit rate is up around 97% now. The

Re: Read Performance

2010-04-02 Thread Ryan King
On Thu, Apr 1, 2010 at 8:37 PM, James Golick wrote: > Well, folks, I'm feeling a little stupid right now (adding to the injury > inflicted by one Mr. Stump :-P). > So, here's the story. The cache hit rate is up around 97% now. The ruby code > is down to around 20-25ms to multiget the 20 rows. I di

Re: Read Performance

2010-04-01 Thread James Golick
Yes. J. Sent from my iPhone. On 2010-04-01, at 9:21 PM, Brandon Williams wrote: On Thu, Apr 1, 2010 at 9:37 PM, James Golick wrote: Well, folks, I'm feeling a little stupid right now (adding to the injury inflicted by one Mr. Stump :-P). So, here's the story. The cache hit rate is up a

Re: Read Performance

2010-04-01 Thread Brandon Williams
On Thu, Apr 1, 2010 at 9:37 PM, James Golick wrote: > Well, folks, I'm feeling a little stupid right now (adding to the injury > inflicted by one Mr. Stump :-P). > > So, here's the story. The cache hit rate is up around 97% now. The ruby > code is down to around 20-25ms to multiget the 20 rows. I

Re: Read Performance

2010-04-01 Thread James Golick
Well, folks, I'm feeling a little stupid right now (adding to the injury inflicted by one Mr. Stump :-P). So, here's the story. The cache hit rate is up around 97% now. The ruby code is down to around 20-25ms to multiget the 20 rows. I did some profiling, though, and realized that a lot of time wa

Re: Read Performance

2010-04-01 Thread Peter Chang
pwned. On Thu, Apr 1, 2010 at 2:09 PM, James Golick wrote: > Damnit! > > > On Thu, Apr 1, 2010 at 2:05 PM, Jeremy Dunck wrote: > >> Or rackspace. ;) >> >> On Thu, Apr 1, 2010 at 2:49 PM, Joseph Stump wrote: >> > Taking our flamewar offline. :-D >> > >> > On Thu, Apr 1, 2010 at 1:36 PM, Ja

Re: Read Performance

2010-04-01 Thread James Golick
Damnit! On Thu, Apr 1, 2010 at 2:05 PM, Jeremy Dunck wrote: > Or rackspace. ;) > > On Thu, Apr 1, 2010 at 2:49 PM, Joseph Stump wrote: > > Taking our flamewar offline. :-D > > > > On Thu, Apr 1, 2010 at 1:36 PM, James Golick > wrote: > >> I don't have the additional hardware to try to iso

Re: Read Performance

2010-04-01 Thread Jeremy Dunck
Or rackspace. ;) On Thu, Apr 1, 2010 at 2:49 PM, Joseph Stump wrote: > Taking our flamewar offline. :-D > > On Thu, Apr 1, 2010 at 1:36 PM, James Golick wrote: >> I don't have the additional hardware to try to isolate this issue atm > > You'd be able to spin up hardware to isolate that issu

Re: Read Performance

2010-04-01 Thread Joseph Stump
Taking our flamewar offline. :-D On Thu, Apr 1, 2010 at 1:36 PM, James Golick wrote: > I don't have the additional hardware to try to isolate this issue atm You'd be able to spin up hardware to isolate that issue on AWS. ;) --Joe

Re: Read Performance

2010-04-01 Thread James Golick
I don't have the additional hardware to try to isolate this issue atm, so I decided to push some code that performs 20% of reads directly from cassandra. The cache hit rate has gone up to about 88% now and it's still climbing, albeit slowly. There remains plenty of free cache space. So far, the av
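Note on the approach above: a minimal sketch of sending a fixed fraction of reads straight to the backing store so that its own cache sees traffic. The function names, the dict-like `cache`, and the `load_from_cassandra` callable are hypothetical; the thread does not show the code that was actually pushed.

```python
import random


def read_from_cassandra_directly(fraction, rng=random):
    """Return True for roughly `fraction` of calls (0.2 means about 20% of reads)."""
    return rng.random() < fraction


def get(key, cache, load_from_cassandra, fraction=0.2, rng=random):
    """Serve from the front cache unless this read is selected to bypass it."""
    if not read_from_cassandra_directly(fraction, rng):
        value = cache.get(key)
        if value is not None:
            return value
    # Either a deliberate bypass or a front-cache miss: read from the store
    # and refresh the front cache with the result.
    value = load_from_cassandra(key)
    cache[key] = value
    return value
```

Passing a seeded `random.Random` as `rng` makes the selection reproducible when testing.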

Re: Read Performance

2010-04-01 Thread Cemal Dalar
Hi James, I don't know how to get the below statistics data and calculate the access times (read/write in ms) in your previous mails. Can you explain a little? I'd like to work on it also. CD On Thu, Apr 1, 2010 at 4:15 AM, Jonathan Ellis wrote: > On Wed, Mar 31, 2010 at 6:21 PM, James Golick > w

Re: Read Performance

2010-03-31 Thread Jonathan Ellis
On Wed, Mar 31, 2010 at 6:21 PM, James Golick wrote: > Keyspace: ActivityFeed >         Read Count: 699443 >         Read Latency: 16.11017477192566 ms. >                 Column Family: Events >                 Read Count: 232378 >                 Read Latency: 0.396 ms. >                 Row cac

Re: Read Performance

2010-03-31 Thread James Golick
Keyspace: ActivityFeed Read Count: 699443 Read Latency: 16.11017477192566 ms. Write Count: 69264920 Write Latency: 0.020393242755495856 ms. Pending Tasks: 0 ...snip Column Family: Events SSTable count: 5 Sp

Re: Read Performance

2010-03-31 Thread Jonathan Ellis
What does the CFS mbean think read latencies are? Possibly something else is introducing latency after the read. On Wed, Mar 31, 2010 at 5:37 PM, James Golick wrote: > Standard CF. 10 columns per row. Between about 800 bytes and 2k total per > row. > On Wed, Mar 31, 2010 at 3:06 PM, Chris Goffin

Re: Read Performance

2010-03-31 Thread James Golick
Standard CF. 10 columns per row. Between about 800 bytes and 2k total per row. On Wed, Mar 31, 2010 at 3:06 PM, Chris Goffinet wrote: > How many columns in each row? > > -Chris > > On Mar 31, 2010, at 2:54 PM, James Golick wrote: > > I just tried running the same multi_get against cassandra 1000

Re: Read Performance

2010-03-31 Thread Chris Goffinet
How many columns in each row? -Chris On Mar 31, 2010, at 2:54 PM, James Golick wrote: > I just tried running the same multi_get against cassandra 1000 times, > assuming that that'd force it in to cache. > > I'm definitely seeing a 5-10ms improvement, but it's still looking like > 20-30ms on a

Re: Read Performance

2010-03-31 Thread Jonathan Ellis
Yes, I would. How many columns are you reading per row? How large are they? Are they supercolumns? On Wed, Mar 31, 2010 at 4:54 PM, James Golick wrote: > I just tried running the same multi_get against cassandra 1000 times, > assuming that that'd force it in to cache. > I'm definitely seeing

Re: Read Performance

2010-03-31 Thread James Golick
I just tried running the same multi_get against cassandra 1000 times, assuming that that'd force it in to cache. I'm definitely seeing a 5-10ms improvement, but it's still looking like 20-30ms on average. Would you expect it to be faster than that? - James On Wed, Mar 31, 2010 at 11:44 AM, Jonat

Re: Read Performance

2010-03-31 Thread Jonathan Ellis
But then you'd still be caching the same things memcached is, so unless you have a lot more ram you'll presumably miss the same rows too. The only 2-layer approach that makes sense to me would be to have cassandra keys cache at 100% behind memcached for the actual rows, which will actually reduce

Re: Read Performance

2010-03-31 Thread David Strauss
Or, if faking memcached misses is too high a price to pay, queue some proportion of the reads to replay asynchronously against Cassandra. On Wed, 2010-03-31 at 11:04 -0500, Jonathan Ellis wrote: > Can you redirect some of the reads from memcache to cassandra? Sounds > like the cache isn't getting

Re: Read Performance

2010-03-31 Thread Ryan King
On Wed, Mar 31, 2010 at 9:04 AM, Jonathan Ellis wrote: > Can you redirect some of the reads from memcache to cassandra?  Sounds > like the cache isn't getting warmed up. Yeah, putting a cache in front of a cache can ruin the locality of the second cache. -ryan

Re: Read Performance

2010-03-31 Thread Jonathan Ellis
Can you redirect some of the reads from memcache to cassandra? Sounds like the cache isn't getting warmed up. On Wed, Mar 31, 2010 at 11:01 AM, James Golick wrote: > I'm testing on the live cluster, but most of the production reads are being > served by the cache. It's definitely the right CF. >

Re: Read Performance

2010-03-31 Thread James Golick
I'm testing on the live cluster, but most of the production reads are being served by the cache. It's definitely the right CF. On Wed, Mar 31, 2010 at 8:30 AM, Jonathan Ellis wrote: > On Wed, Mar 31, 2010 at 12:01 AM, James Golick > wrote: > > Okay, so now my row cache hit rate jumps between 1.

Re: Read Performance

2010-03-31 Thread Jonathan Ellis
On Wed, Mar 31, 2010 at 12:01 AM, James Golick wrote: > Okay, so now my row cache hit rate jumps between 1.0, 99.5, 95.6, and NaN. > Seems like that stat is a little broken. Sounds like you aren't getting enough requests for the getRecentHitRate to make sense. use getHits / getRequests. But if
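Note on the suggestion above: a minimal sketch of computing the hit rate from cumulative hit and request counters, as in "getHits / getRequests". The function names are hypothetical, and the counters are assumed to be plain integers already read from the cache mbean. Returning None when there were no requests avoids the 0/0 case, which is presumably what shows up as NaN in the recent hit rate when too few requests arrive.

```python
def hit_rate(hits, requests):
    """Hit rate as a fraction in [0, 1], or None when there were no requests."""
    if requests == 0:
        return None
    return hits / requests


def interval_hit_rate(prev, curr):
    """Hit rate between two (hits, requests) samples of the cumulative counters."""
    (hits0, requests0), (hits1, requests1) = prev, curr
    return hit_rate(hits1 - hits0, requests1 - requests0)
```

Sampling the counters twice and using `interval_hit_rate` gives a recent rate over a window of the caller's choosing.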

Re: Read Performance

2010-03-30 Thread James Golick
Okay, so now my row cache hit rate jumps between 1.0, 99.5, 95.6, and NaN. Seems like that stat is a little broken. Still seeing around 35ms to multiget 20 rows. - James On Tue, Mar 30, 2010 at 9:22 PM, Ryan King wrote: > On Tue, Mar 30, 2010 at 9:11 PM, James Golick > wrote: > > No change ob

Re: Read Performance

2010-03-30 Thread Ryan King
On Tue, Mar 30, 2010 at 9:11 PM, James Golick wrote: > No change observed. The hit rate fluctuates between 0.0, 0.3, and NaN every > time I run cfstats. > I just increased it by 10x. Hopefully that'll help. You should turn the caches up until you either run out of heap, or the hitrate stops going

Re: Read Performance

2010-03-30 Thread James Golick
No change observed. The hit rate fluctuates between 0.0, 0.3, and NaN every time I run cfstats. I just increased it by 10x. Hopefully that'll help. On Tue, Mar 30, 2010 at 8:59 PM, Jonathan Ellis wrote: > What is your row cache hit rate? > > By "still slow" do you mean "no change observed" or "

Re: Read Performance

2010-03-30 Thread Jonathan Ellis
What is your row cache hit rate? By "still slow" do you mean "no change observed" or "faster but not fast enough?" On Tue, Mar 30, 2010 at 10:47 PM, James Golick wrote: > We are starting to use cassandra to power our activity feed. The way we > organize our data is simple. "Event"s live in a CF