Hi Apoorva,

I assume this is the table with studentId and subjectId  as primary keys
and not other like like marks in that.

create table marks_table(studentId int, subjectId int, marks int, PRIMARY
KEY(studentId,subjectId));

Also could you give the cfhistogram stats?

nodetool cfhistograms <your keyspace> marks_table;



Thanks,
Shrikar


On Fri, Mar 28, 2014 at 3:53 PM, Apoorva Gaurav
<apoorva.gau...@myntra.com>wrote:

> Hello All,
>
> We've a schema which can be modeled as (studentID, subjectID, marks) where
> combination of studentID and subjectID is unique. Number of studentID can
> go up to 100 million and for each studentID we can have up to  10k
> subjectIDs.
>
> We are using apahce cassandra 2.0.4 and datastax java driver 1.0.4. We are
> using a four node cluster, each having 24 cores and 32GB memory. I'm sure
> that the machines are not underperformant as on same test bed we've
> consistently received <5ms response times for ~1b documents when queried
> via primary key.
>
> I've tried three approaches, all of which result in significant
> deterioration (>500 ms response time) in read query performance once number
> of subjectIDs goes past ~100 for a studentID. Approaches are :-
>
> 1. model as (studentID int PRIMARY KEY, subjectID_marks_map map<int, int>)
> and query by subjectID
>
> 2. model as (studentID int, subjectID int, marks int, PRIMARY
> KEY(studentID, subjectID) and query as select * from marks_table where
> studentID = ?
>
> 3. model as (studentID int, subjectID int, marks int, PRIMARY
> KEY(studentID, subjectID) and query as select * from marks_table where
> studentID = ? and subjectID in (?, ?, ?....?)  number of subjectIDs in
> query being ~1K.
>
> What can be the bottlenecks. Is it better if we model as (studentID int,
> subjct_marks_json text) and query by studentID.
>
> --
> Thanks & Regards,
> Apoorva
>

Reply via email to