I came to a similar conclusion: once you have more than a few tags, the problem is no longer simple "tagging" but closer to regular "document search" with indexed words. There are too many possible tag subsets to precompute matching documents for each one, so you need to index documents individually and compute intersections dynamically. For acceptable performance, the indexes must be held fully in memory, in data structures that allow computing intersections fast. This is not something regular databases implement (though they can serve as backing storage for indexes that are loaded into memory).
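To make the idea concrete, here is a minimal sketch of such an in-memory inverted index: one posting set of doc ids per tag, with queries answered by intersecting the sets, smallest first. All names (TagIndex, add_doc, search) are illustrative, not from any library.

```python
class TagIndex:
    """Toy in-memory inverted index: tag -> set of doc ids."""

    def __init__(self):
        self.postings = {}

    def add_doc(self, doc_id, tags):
        for tag in tags:
            self.postings.setdefault(tag, set()).add(doc_id)

    def search(self, tags):
        # Intersect the smallest posting sets first; bail out early
        # if the running result becomes empty.
        sets = sorted((self.postings.get(t, set()) for t in tags), key=len)
        if not sets:
            return set()
        result = set(sets[0])
        for s in sets[1:]:
            result &= s
            if not result:
                break
        return result

idx = TagIndex()
idx.add_doc(1, ["python", "db"])
idx.add_doc(2, ["python", "search"])
idx.add_doc(3, ["python", "db", "search"])
print(idx.search(["python", "db"]))  # {1, 3}
```

Real engines replace the plain sets with sorted or compressed posting lists (skip lists, bitmaps) so intersections stay fast at millions of documents, but the query shape is the same.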

So the solution is either to limit the number of tags per doc to 3-4 and fully denormalize (storing each doc under every subset of its tags, for a duplication factor of up to 8-16x), or to use a search engine.
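A hedged sketch of the full-denormalization option: precompute every tag subset of a doc as a lookup key, so any tag-combination query becomes a single exact-key read. The names (subset_keys, index_doc, lookup) are illustrative; in Cassandra the subset tuple would be the partition key.

```python
from itertools import combinations

def subset_keys(tags):
    """Yield every non-empty subset of the doc's tags as a sorted tuple."""
    tags = sorted(tags)
    for r in range(1, len(tags) + 1):
        yield from combinations(tags, r)

lookup = {}  # sorted tag tuple -> set of doc ids

def index_doc(doc_id, tags):
    for key in subset_keys(tags):
        lookup.setdefault(key, set()).add(doc_id)

index_doc(1, ["a", "b", "c"])
index_doc(2, ["b", "c"])

# One exact-key read per query, at the cost of up to 2^k - 1 rows
# per doc (7-15 rows for k = 3-4 tags, hence the ~8-16x factor).
print(lookup[tuple(sorted(["c", "b"]))])  # {1, 2}
```

The write amplification is why this only stays tolerable with a hard cap on tags per doc; at 100+ tags per doc, as in the use case quoted below, the subset count explodes and a search engine is the only realistic option.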

On 09/16/2015 11:29 AM, Naresh Yadav wrote:
We also had a similar use case. After a lot of trials with Cassandra, we
finally created a Solr schema with doc_id (unique key) and tags (indexed)
in Apache Solr to answer the search query "get me matching docs by any
given number of tags", and that solved our use case. We had millions of
docs, and a doc can have hundreds of tags.

Please share your final conclusion if you crack this problem within
Cassandra only; I would be interested to know your solution.