On the client side we are seeing a latency of ~350 ms. We are using DataStax Java driver 2.0.0 with the fetch size set to 500. This latency occurs when reading rows that have ~200 columns.
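
To put the fetch size in context, here is a rough sketch of the paging arithmetic. It assumes that each "column" in the counts discussed in this thread corresponds to one CQL row of (studentID, subjectID, marks), and that every page costs at least one client/server round trip. The class and method names (PageEstimate, pages) are made up for illustration and are not part of the driver; the result is only a lower bound on the number of requests.

```java
public class PageEstimate {
    // Lower bound on the number of pages needed to return `rows` rows
    // when the driver fetches at most `fetchSize` rows per page.
    // Hypothetical helper for illustration; not a driver API.
    static long pages(long rows, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        if (rows <= 0) {
            return 0;
        }
        return (rows + fetchSize - 1) / fetchSize; // ceiling division
    }

    public static void main(String[] args) {
        int fetchSize = 500; // the fetch size we configured
        // ~200 rows (our current reads), 10k (max subjectIDs per studentID),
        // 75k (largest rows reported by cfhistograms)
        long[] partitionSizes = {200, 10_000, 75_000};
        for (long rows : partitionSizes) {
            System.out.println(rows + " rows -> at least "
                    + pages(rows, fetchSize) + " page(s)");
        }
    }
}
```

Under these assumptions a ~200-row read fits in a single page, so paging alone would not explain the ~350 ms, while a 10k-row partition would need at least 20 pages and a 75k-row one at least 150.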

On Thu, Apr 3, 2014 at 12:45 PM, Shrikar archak <shrika...@gmail.com> wrote:

> Hi Apoorva,
> As per the cfhistogram there are some rows which have more than 75k
> columns, and around 150k reads hit 2 SSTables.
>
> Are you sure that you are seeing more than 500ms latency? The cfhistogram
> shows the worst read performance was around 51ms,
> which looks reasonable with many reads hitting 2 SSTables.
>
> Thanks,
> Shrikar
>
>
> On Wed, Apr 2, 2014 at 11:30 PM, Apoorva Gaurav <apoorva.gau...@myntra.com> wrote:
>
>> Hello Shrikar,
>>
>> We are still facing the read latency issue; here is the histogram:
>> http://pastebin.com/yEvMuHYh
>>
>>
>> On Sat, Mar 29, 2014 at 8:11 AM, Apoorva Gaurav <apoorva.gau...@myntra.com> wrote:
>>
>>> Hello Shrikar,
>>>
>>> Yes, the primary key is (studentID, subjectID). I had dropped the test table;
>>> I am recreating and populating it, after which I will share the cfhistogram.
>>> In such a case, is there any practical limit on the rows I should fetch, e.g.
>>> should I do
>>> select * from marks_table where studentID = ? limit 500;
>>> instead of doing
>>> select * from marks_table where studentID = ?;
>>>
>>>
>>> On Sat, Mar 29, 2014 at 5:20 AM, Shrikar archak <shrika...@gmail.com> wrote:
>>>
>>>> Hi Apoorva,
>>>>
>>>> I assume this is the table with studentId and subjectId as primary
>>>> keys and not others like marks in that.
>>>>
>>>> create table marks_table(studentId int, subjectId int, marks int,
>>>> PRIMARY KEY(studentId,subjectId));
>>>>
>>>> Also could you give the cfhistogram stats?
>>>>
>>>> nodetool cfhistograms <your keyspace> marks_table;
>>>>
>>>>
>>>> Thanks,
>>>> Shrikar
>>>>
>>>>
>>>> On Fri, Mar 28, 2014 at 3:53 PM, Apoorva Gaurav <apoorva.gau...@myntra.com> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> We've a schema which can be modeled as (studentID, subjectID, marks)
>>>>> where the combination of studentID and subjectID is unique. The number of
>>>>> studentIDs can go up to 100 million, and for each studentID we can have
>>>>> up to 10k subjectIDs.
>>>>>
>>>>> We are using Apache Cassandra 2.0.4 and DataStax Java driver 1.0.4. We
>>>>> are using a four-node cluster, each node having 24 cores and 32GB memory. I'm
>>>>> sure that the machines are not underperformant, as on the same test bed we've
>>>>> consistently received <5ms response times for ~1b documents when queried
>>>>> via primary key.
>>>>>
>>>>> I've tried three approaches, all of which result in significant
>>>>> deterioration (>500 ms response time) in read query performance once
>>>>> the number of subjectIDs goes past ~100 for a studentID. The approaches are:
>>>>>
>>>>> 1. model as (studentID int PRIMARY KEY, subjectID_marks_map map<int,
>>>>> int>) and query by subjectID
>>>>>
>>>>> 2. model as (studentID int, subjectID int, marks int, PRIMARY
>>>>> KEY(studentID, subjectID)) and query as select * from marks_table where
>>>>> studentID = ?
>>>>>
>>>>> 3. model as (studentID int, subjectID int, marks int, PRIMARY
>>>>> KEY(studentID, subjectID)) and query as select * from marks_table where
>>>>> studentID = ? and subjectID in (?, ?, ?....?), the number of subjectIDs in
>>>>> the query being ~1K.
>>>>>
>>>>> What can be the bottlenecks? Is it better if we model as (studentID
>>>>> int, subjct_marks_json text) and query by studentID?
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>> Apoorva
>>>>>
>>>>
>>>
>>> --
>>> Thanks & Regards,
>>> Apoorva
>>>
>>
>> --
>> Thanks & Regards,
>> Apoorva
>>
>

--
Thanks & Regards,
Apoorva