Hello,


I have created a column family for User File Management.


CREATE TABLE "UserFile" ("USERID" bigint,"FILEID" text,"FILETYPE" 
int,"FOLDER_UID" text,"FILEPATHINFO" text,"JSONCOLUMN" text,PRIMARY KEY 
("USERID","FILEID"));



Sample Entry



(4*************003, 3f9**************************6a1, null, 2 , 
[{"FOLDER_TYPE":"-1","UID":"1","FOLDER":"\"HOME\""}] 
,{"filename":"untitled","size":1,"kind":-1,"where":""})




Queries :



Select "USERID","FILEID","FILETYPE","FOLDER_UID","JSONCOLUMN" from "UserFile" 
where "USERID"=<value> and "FILEID" in (<value>,<value>,...)



Select "USERID","FILEID","FILEPATHINFO" from "UserFile" where 
"USERID"=<value> and "FILEID" in (<value>,<value>,...) 



This column family was perfectly working in our lab. I was able to fetch the 
results for the queries stated at less than 10ms. I deployed this in 
production(Cassandra 2.1.13), It was working perfectly for a month or two. But 
now at times the queries are taking 5s to 10s. On analysing further, I found 
that few users are deleting the files too frequently. This generates too many 
tombstones. I have set the gc_grace_seconds to the default 10 days and I have 
chosen SizeTieredCompactionStrategy. I want to optimise this Data Model for 
read efficiency. 



Any help is much appreciated.



Best Regards,

Julian.






Reply via email to