Have you considered using Solandra (Solr/Lucene + Cassandra)? See https://github.com/tjake/Lucandra#readme
There is also a #solandra channel on freenode if you have any questions.
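If you do stay with hand-built index rows like the ones in the quoted message below, here is a minimal Python sketch of the keystart/keyend listing. It is plain in-memory Python, not phpcassa or Cassandra API code: SortedIndex, index_key, prefix_slice and the sample data are hypothetical names made up for illustration. It assumes names are compared as strings, so the time part is zero-padded to a fixed width (otherwise '10' sorts before '9'), and that no key part contains the separator or the end marker.

```python
from bisect import bisect_left, bisect_right, insort

SEP = ":"
PAD = 12          # assumed fixed width for numeric parts
END = "\uffff"    # assumed to sort after any character used in a key part


def index_key(*parts):
    """Join key parts; integers are zero-padded so string order matches numeric order."""
    return SEP.join(str(p).zfill(PAD) if isinstance(p, int) else p for p in parts)


class SortedIndex:
    """Toy stand-in for one index row whose column names are kept sorted."""

    def __init__(self):
        self._names = []    # sorted column names
        self._values = {}   # column name -> id_doc

    def insert(self, name, id_doc):
        if name not in self._values:
            insort(self._names, name)
        self._values[name] = id_doc

    def slice(self, start, end):
        """Return the id_docs whose column name lies in [start, end], in name order."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, end)
        return [self._values[n] for n in self._names[lo:hi]]


def prefix_slice(index, *parts):
    """keystart/keyend scan over every name that begins with the given parts."""
    prefix = index_key(*parts) + SEP
    return index.slice(prefix, prefix + END)


# Two of the six orderings: one index per filter you want to support.
by_user = SortedIndex()        # id_user:time:c_type:id_doc
by_user_type = SortedIndex()   # id_user:c_type:time:id_doc

shares = [  # (time, c_type, id_user, id_doc), sample data only
    (300, "BOT", "123", "d3"),
    (100, "IMG", "123", "d1"),
    (200, "BOT", "1234", "d2"),
    (50, "BOT", "123", "d0"),
]
for time, c_type, id_user, id_doc in shares:
    # Every share is written once per index, which is the write cost mentioned below.
    by_user.insert(index_key(id_user, time, c_type, id_doc), id_doc)
    by_user_type.insert(index_key(id_user, c_type, time, id_doc), id_doc)

docs_of_user = prefix_slice(by_user, "123")                # filter on id_user
bot_docs_of_user = prefix_slice(by_user_type, "123", "BOT")  # id_user and c_type
```

The trailing separator in the prefix is what keeps user '1234' out of the scan for user '123'. Both result lists come back ordered by time because the padded time is the first part after the filtered prefix.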
On Mar 3, 2011, at 8:00 AM, Vodnok wrote:

> OK, it seems that I'll use Solr (with a dedicated Cassandra) for search.
>
> I've read this article on RP vs OPP:
> http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
>
> Here is my case:
>
> docs_shared{ //docs shared by users ordered by time
>     'time:id_user:id_doc'
>     {
>         'time':'123456' //index on it
>         'id_user':'123' //index on it
>         'c_type':'BOT' //index on it
>         'id_doc':'123' //index on it
>     }
> }
>
> So I can list all docs shared by id_user = 123 and type = 'BOT', ordered by
> time...
>
> Well, that is what I wanted, until I discovered the RP vs OPP issue. I'm on
> the default, so RP, and so row ids are not ordered!!! And as it's
> recommended, I would like to stay on RP.
>
> So another possibility is adding a dimension with a super column, as columns
> are ordered under RP:
>
> index{
>     docs_shared{ //docs shared by users ordered by time
>         'time:id_user:id_doc'
>         {
>             'time':'123456' //index on it
>             'id_user':'123' //index on it
>             'c_type':'BOT' //index on it
>             'id_doc':'123'
>         }
>     }
> }
>
> BUT... a secondary index is not possible on SC -> C.
>
> So the next possibility is:
>
> index{
>     docs_shared_time_c_type_id_user{ //docs shared by users ordered by time:c_type:id_user
>         'time:c_type:id_user:id_doc' : 'id_doc'
>     }
>     docs_shared_c_type_time_id_user{ //docs shared by users ordered by c_type:time:id_user
>         'c_type:time:id_user:id_doc' : 'id_doc'
>     }
>     ... (there are 6 orderings of time, c_type, id_user)
> }
>
> That way I can list with keystart and keyend, and filter.
>
> Example:
>
> No filter: index -> time:c_type:id_user
> Filter on c_type: index -> c_type:time:id_user
> Filter on id_user: index -> id_user:time:c_type
> Filter on c_type and id_user: index -> id_user:c_type:time
>
> Fortunately Cassandra likes writes!!! (irony inside)
>
> So I have a question: I've read that secondary indexes on SC -> C will maybe
> arrive in the next releases... Is this information true?
> And is it already planned?
>
> Thank you,
>
> Sébastien,
>
> 2011/3/2 Burc Sade <burcs...@gmail.com>
> You can use the PHP Solr Extension. It is a fully featured and light-weight
> client.
>
> http://www.php.net/manual/en/book.solr.php
>
> Without secondary indexes on columns in CFs within SCFs, the best approach at
> the moment is to create query-specific CFs. In the end it all comes down to
> how simple you can make your queries, so that the CF count stays at a minimum.
>
> Regards,
> Burc
>
> On Wed, Mar 2, 2011 at 9:06 AM, Vodnok <vod...@gmail.com> wrote:
> I also think it'll be easier via Solr. I just need to google it. (If you have
> links about Solr in PHP...)
>
> I realize that I have to remove some dimensions from my CF...
>
> I thought it was possible to have SCF -> CF -> SC -> C:value with a secondary
> index on C but, as I understood, a secondary index on C in a super column is
> not possible for now (but maybe will be in the next version).
> As I understand it, it's better to have more, less complex CFs than fewer,
> more complex CFs.
>
> Thank you for your reply,
>
> 2011/3/2 Burc Sade <burcs...@gmail.com>
>
> Hi Vodnok,
>
> For tag searches I would use a search engine like Solr (Lucene), as I think
> it would be more flexible to query. You can update the index as new data
> comes in and query it for queries #1, #2 and #4.
>
> For the "All doc of type='BOT' and c_bot_code='ABC'" query, I would create
> the CF below.
>
> doc_types
> {
>     'BOT:ABC':
>     {
>         <docid>: <creation_date?>
>     }
> }
>
> As the value of each docid column you can store whatever you are going to
> need after querying. The problem with this schema is that if there are not
> many type:c_bot_code combinations, there will be many columns under each key
> in this CF. If one combination has far more columns than the others, a hot
> spot problem may arise.
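A note on the doc_types layout just above: here is a minimal in-memory Python sketch of it, only to make the row key and the hot spot concern concrete. It is not phpcassa or Cassandra API code. add_doc, docs_of, widest_rows and the sample values are hypothetical, and the creation dates are invented for the example.

```python
from collections import defaultdict

# Row key 'type:c_bot_code' -> {docid: creation_date}, as in the doc_types CF.
doc_types = defaultdict(dict)


def add_doc(c_type, c_bot_code, docid, creation_date):
    """Write one column under the row for this type:c_bot_code combination."""
    doc_types[f"{c_type}:{c_bot_code}"][docid] = creation_date


def docs_of(c_type, c_bot_code):
    """Answer "all docs of type=X and c_bot_code=Y" with a single row lookup."""
    return sorted(doc_types.get(f"{c_type}:{c_bot_code}", {}))


def widest_rows(n=3):
    """List the n rows holding the most columns, the candidates for a hot spot."""
    sizes = ((key, len(cols)) for key, cols in doc_types.items())
    return sorted(sizes, key=lambda kv: kv[1], reverse=True)[:n]


# Sample data only.
add_doc("BOT", "ABC", "123456", "20110301")
add_doc("BOT", "ABC", "223456", "20110302")
add_doc("BOT", "XYZ", "323456", "20110302")

abc_docs = docs_of("BOT", "ABC")
biggest = widest_rows(1)
```

If widest_rows shows one combination dwarfing the rest, one way out (an assumption, not something suggested in this thread) is to add another part to the row key, for example a date bucket, so that the columns spread over several rows.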
>
> On Tue, Mar 1, 2011 at 11:39 PM, Vodnok <vod...@gmail.com> wrote:
> Hi,
>
> I'm a total newbie on Cassandra (with phpcassa), with a big background in
> relational databases, and I would like to use Cassandra for a trivial case.
> I've been on it for 3 days. Sorry for my stupid question. I'm pretty sure I'm
> wrong, but I want to learn, so I'm here.
>
> I would like your advice on a design for Cassandra.
>
> Case:
>
> - Users create docs and can share docs with friends
> - Users can read and share docs of their friends with other friends
> - Docs can be of different types [text;picture;video;etc]
> - Docs can be tagged
>
> Typical queries:
>
> - Docs related to a tag
> - Docs related to multiple tags
> - Docs read by user x
> - Docs related to a tag with a readed_shared ratio greater than x (see design)
> - All docs of type='IMG' favorited by my friend
> - All docs of type='BOT' and c_bot_code='ABC'
> - All docs of type='BOT' favorited by my friend and related (by tag) to
>   'fire' and 'belgium'?
>
> Design:
>
> docs // all docs
> {
>     '123456': //id_docs
>     {
>         't_info':
>         {
>             'c_type':'BOT'
>             'b_del':'y'
>             'b_publish':'y'
>         }
>         't_info_type':
>         {
>             'l_title':'Hello World!'
>             'c_bot_code':'ABC'
>         }
>         't_read_user': //read by user x
>         {
>             //time + id_user
>             '123456789_123':'123'
>             '123456789_155':'155'
>         }
>         't_shared_user': //shared by user x
>         {
>             //time + id_user
>             '123456789_123':'123'
>             '123456789_155':'155'
>         }
>         't_tags':
>         {
>             'fire':'fire'
>             'belgium':'belgium'
>         }
>         't_stats':
>         {
>             'n_readed':'60'
>             'n_shared':'6'
>             'n_ratio_readed_shared':'0.1'
>         }
>     }
> }
>
> tags_docs // all tags linked to docs
> {
>     'fire': //tag
>     {
>         //creation_time + id_docs
>         '456789_123456':
>         {
>             'id_doc':'123456'
>             'time':'456789'
>         }
>         '456789_223456':
>         {
>             'id_doc':'223456'
>             'time':'456789'
>         }
>         '456789_323456':
>         {
>             'id_doc':'323456'
>             'time':'456789'
>         }
>     }
>     'belgium':
>     {
>         ...
>     }
> }
>
> users // all users
> {
>     '123': //id_user
>     {
>         't_info':
>         {
>             'l_name':'Boris'
>             'c_lang':'fr'
>         }
>         't_readed_docs':
>         {
>             //time + id_doc
>             '123456789_123456':'123456'
>             '123458789_136456':'136456'
>         }
>         't_shared_docs':
>         {
>             //time + id_doc
>             '123456789_123456':'123456'
>             '123458789_136456':'136456'
>         }
>     }
> }
>
> users_docs // all actions by users on docs
> {
>     '123_123456': // id_user + id_doc
>     {
>         'id_doc':'123456'
>         'id_user':'123'
>         'd_readed':'20110301'
>         'd_shared':'20110301'
>     }
> }
>
> user_friends_act // all activity of a user's friends
> {
>     '123': // id_user
>     {
>         't_readed_docs': //all docs read by my friends
>         {
>             '223456_224_123456': // time + id_friend + id_docs
>             {
>                 'id_friend':'224'
>                 'id_docs':'123456'
>                 'time':'223456'
>                 'c_type':'BOT'
>             }
>         }
>         't_shared_docs': //all docs shared by my friends
>         {
>             '223456_224_123456': // time + id_friend + id_docs
>             {
>                 'id_friend':'224'
>                 'id_docs':'123456'
>                 'time':'223456'
>                 'c_type':'BOT'
>             }
>         }
>     }
> }
>
> I know that certain queries are not possible for now, like:
> - All docs of type='BOT' favorited by my friend and related (by tag) to
>   'fire' and 'belgium'?
>
> What do you think?
>
> Thank you,
>
> Vodnok,
>
> (Please remember I've only been on Cassandra for 3 days)
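
As a footnote to the user_friends_act layout at the end of the quoted design, here is a minimal Python sketch of how such a row could be filled and read. It assumes the row is populated by fan-out on write, meaning each action by a user is copied into the row of every friend; the thread does not say how the row gets populated. It is plain in-memory Python, not phpcassa code, and friends, record_action, friend_docs and the sample ids are hypothetical.

```python
from collections import defaultdict

# Hypothetical friend lists: id_user -> ids of the users who should see their activity.
friends = {"224": ["123", "155"]}

# id_user -> super columns -> {'time_idfriend_iddoc': entry}
user_friends_act = defaultdict(lambda: {"t_readed_docs": {}, "t_shared_docs": {}})


def record_action(kind, id_friend, id_doc, time, c_type):
    """Copy one read/share by id_friend into the activity row of each of their friends."""
    name = f"{time}_{id_friend}_{id_doc}"
    entry = {"id_friend": id_friend, "id_docs": id_doc, "time": time, "c_type": c_type}
    for id_user in friends.get(id_friend, []):
        user_friends_act[id_user][kind][name] = dict(entry)


def friend_docs(id_user, kind, c_type):
    """List docs of one type from a user's friends, in column-name (time) order."""
    row = user_friends_act.get(id_user, {}).get(kind, {})
    # String order equals time order only while all time values have the same width.
    return [row[name]["id_docs"] for name in sorted(row) if row[name]["c_type"] == c_type]


# Sample data only: user 224 shares two docs.
record_action("t_shared_docs", "224", "123456", "223456", "BOT")
record_action("t_shared_docs", "224", "136456", "223457", "IMG")

bot_shared_for_123 = friend_docs("123", "t_shared_docs", "BOT")
img_shared_for_155 = friend_docs("155", "t_shared_docs", "IMG")
```

Because c_type is stored inside each entry, a query such as "all docs of type='IMG' from my friends" becomes one row read plus a filter on the client side. The tag conditions in the last query would still need something else, for example the Solr index discussed at the top of the thread.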