Optimising the data model for reads

Thomas Julian Thu, 29 Sep 2016 02:31:35 -0700

Hello,



I have created a column family for User File Management.


CREATE TABLE "UserFile" (
    "USERID" bigint,
    "FILEID" text,
    "FILETYPE" int,
    "FOLDER_UID" text,
    "FILEPATHINFO" text,
    "JSONCOLUMN" text,
    PRIMARY KEY ("USERID", "FILEID")
);



Sample Entry



(4*************003, 3f9**************************6a1, null, 2 , 
[{"FOLDER_TYPE":"-1","UID":"1","FOLDER":"\"HOME\""}] 
,{"filename":"untitled","size":1,"kind":-1,"where":""})




Queries:



SELECT "USERID", "FILEID", "FILETYPE", "FOLDER_UID", "JSONCOLUMN"
FROM "UserFile"
WHERE "USERID" = <value> AND "FILEID" IN (<value>, <value>, ...);



SELECT "USERID", "FILEID", "FILEPATHINFO"
FROM "UserFile"
WHERE "USERID" = <value> AND "FILEID" IN (<value>, <value>, ...);



This column family worked perfectly in our lab, where both of the queries
above returned in under 10 ms. I then deployed it to production (Cassandra
2.1.13), and it worked well for a month or two. Now, however, the queries
sometimes take 5 s to 10 s. On further analysis, I found that a few users
delete files very frequently, which generates a large number of tombstones.
gc_grace_seconds is set to the default of 10 days, and the table uses
SizeTieredCompactionStrategy. I would like to optimise this data model for
read efficiency.
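
For reference, the two settings mentioned above correspond to table options
along these lines. This is only a sketch, not the exact statement used in
production; 864000 is 10 days expressed in seconds.

```cql
-- Sketch only: restates the settings described above as table options.
-- 864000 seconds = 10 days (the default gc_grace_seconds).
ALTER TABLE "UserFile"
WITH gc_grace_seconds = 864000
AND compaction = {'class': 'SizeTieredCompactionStrategy'};
```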



Any help is much appreciated.



Best Regards,

Julian.
