Hi Apoorva, As per the cfhistogram there are some rows which have more than 75k columns and around 150k reads hit 2 SStables.
Are you sure that you are seeing more than 500ms latency? The cfhistogram should the worst read performance was around 51ms which looks reasonable with many reads hitting 2 sstables. Thanks, Shrikar On Wed, Apr 2, 2014 at 11:30 PM, Apoorva Gaurav <apoorva.gau...@myntra.com>wrote: > Hello Shrikar, > > We are still facing read latency issue, here is the histogram > http://pastebin.com/yEvMuHYh > > > On Sat, Mar 29, 2014 at 8:11 AM, Apoorva Gaurav <apoorva.gau...@myntra.com > > wrote: > >> Hello Shrikar, >> >> Yes primary key is (studentID, subjectID). I had dropped the test table, >> recreating and populating it post which will share the cfhistogram. In such >> case is there any practical limit on the rows I should fetch, for e.g. >> should I do >> select * form marks_table where studentID = ? limit 500; >> instead of doing >> select * form marks_table where studentID = ?; >> >> >> On Sat, Mar 29, 2014 at 5:20 AM, Shrikar archak <shrika...@gmail.com>wrote: >> >>> Hi Apoorva, >>> >>> I assume this is the table with studentId and subjectId as primary keys >>> and not other like like marks in that. >>> >>> create table marks_table(studentId int, subjectId int, marks int, >>> PRIMARY KEY(studentId,subjectId)); >>> >>> Also could you give the cfhistogram stats? >>> >>> nodetool cfhistograms <your keyspace> marks_table; >>> >>> >>> >>> Thanks, >>> Shrikar >>> >>> >>> On Fri, Mar 28, 2014 at 3:53 PM, Apoorva Gaurav < >>> apoorva.gau...@myntra.com> wrote: >>> >>>> Hello All, >>>> >>>> We've a schema which can be modeled as (studentID, subjectID, marks) >>>> where combination of studentID and subjectID is unique. Number of studentID >>>> can go up to 100 million and for each studentID we can have up to 10k >>>> subjectIDs. >>>> >>>> We are using apahce cassandra 2.0.4 and datastax java driver 1.0.4. We >>>> are using a four node cluster, each having 24 cores and 32GB memory. I'm >>>> sure that the machines are not underperformant as on same test bed we've >>>> consistently received <5ms response times for ~1b documents when queried >>>> via primary key. >>>> >>>> I've tried three approaches, all of which result in significant >>>> deterioration (>500 ms response time) in read query performance once number >>>> of subjectIDs goes past ~100 for a studentID. Approaches are :- >>>> >>>> 1. model as (studentID int PRIMARY KEY, subjectID_marks_map map<int, >>>> int>) and query by subjectID >>>> >>>> 2. model as (studentID int, subjectID int, marks int, PRIMARY >>>> KEY(studentID, subjectID) and query as select * from marks_table where >>>> studentID = ? >>>> >>>> 3. model as (studentID int, subjectID int, marks int, PRIMARY >>>> KEY(studentID, subjectID) and query as select * from marks_table where >>>> studentID = ? and subjectID in (?, ?, ?....?) number of subjectIDs in >>>> query being ~1K. >>>> >>>> What can be the bottlenecks. Is it better if we model as (studentID >>>> int, subjct_marks_json text) and query by studentID. >>>> >>>> -- >>>> Thanks & Regards, >>>> Apoorva >>>> >>> >>> >> >> >> -- >> Thanks & Regards, >> Apoorva >> > > > > -- > Thanks & Regards, > Apoorva >