Hey Naresh,

Unfortunately I don't have any further advice. I keep feeling like you're looking at a search problem rather than a lookup problem. Perhaps Cassandra is not the right tool for your need in this case; something with a full-text indexing feature might help.
Or perhaps someone more experienced than I could come up with another design.

Good luck,
Thunder

On Tue, Jan 28, 2014 at 9:07 AM, Naresh Yadav <nyadav....@gmail.com> wrote:

> Please share any inputs on my last email.
>
> On Tue, Jan 28, 2014 at 7:18 AM, Naresh Yadav <nyadav....@gmail.com> wrote:
>
>> Yes Thunder, you are right. I had simplified that by moving the *tags* search (partial/exact) into a separate column family, tagcombination, which will act as the index for all searches based on tags. My original metricresult table will store tagcombinationid and time in columns; otherwise it was getting complicated and was not giving good results.
>>
>> Yes, I agree with you on duplicating the storage with the tagcombination column family. If I have billions of real tagcombinations with 8 tags in each, then I am duplicating 2^8 combinations for each one to support partial match for that tagcombination, which will make this a very heavy table. With individual keys I will not be able to support search with a set of tags. Please suggest an alternative solution.
>>
>> Also, one of my colleagues suggested a totally different approach, but I am not able to map it onto Cassandra. According to him, we store all possible tags in columns, and for each combination we just mark 0s and 1s for whichever tags appear in that combination. So the data (TC1 as India, Pencil AND TC2 as India, Pen) will be like:
>>
>>        India  Pencil  Pen
>> TC1    1      1       0
>> TC2    1      0       1
>>
>> I am not able to design an optimal column family for this in Cassandra. If I design it as is, then for a search of India, Pen I will select the India and Pen columns, but that will touch each and every row because I am not able to apply the criterion of matching 1s only. I believe there can be a better design that makes use of this good thought.
>>
>> Please help me on this.
>>
>> Thanks
>> Naresh
>>
>> On Mon, Jan 27, 2014 at 11:30 PM, Thunder Stumpges <thunder.stump...@gmail.com> wrote:
>>
>>> Hey Naresh,
>>>
>>> You asked a similar question a week or two ago. It looks like you have simplified your needs quite a bit. Were you able to adjust your requirements or separate the issue? You had a complicated time dimension before, as well as a single "query" for multiple AND cases on tags.
>>>
>>> ....
>>>> c) Give data for Metric=Sales AND Tag=U.S.A
>>>> O/P: 5 rows
>>>> d) Give data for Metric=Sales AND Period=Jan-10 AND Tag=U.S.A AND Tag=Pen
>>>> O/P: 1 row"
>>>
>>> I agree with Jonathan on the model for this simplified use case. However, looking at how you are storing each partial tag combination as well as individual tags in the partitioning key, you will be severely duplicating your storage. You might want to just store individual keys in the partitioning key.
>>>
>>> Good luck,
>>> Thunder
>>>
>>> On Mon, Jan 27, 2014 at 8:48 AM, Naresh Yadav <nyadav....@gmail.com> wrote:
>>>
>>>> Thanks Jonathan for guiding me. I just want to confirm my understanding:
>>>>
>>>> create columnfamily tagcombinations (
>>>>     partialtags text,
>>>>     tagcombinationid text,
>>>>     tagcombinationtags set<text>,
>>>>     primary key ((partialtags), tagcombinationid)
>>>> );
>>>>
>>>> If I need to store TWO tagcombinations, TC1 as India, Pencil AND TC2 as India, Pen, then the data will be stored as:
>>>>
>>>>               TC1            TC2
>>>> India         India,Pencil   India,Pen
>>>>
>>>>               TC1
>>>> Pencil        India,Pencil
>>>>
>>>>               TC2
>>>> Pen           India,Pen
>>>>
>>>>               TC1
>>>> India,Pencil  India,Pencil
>>>>
>>>>               TC2
>>>> India,Pen     India,Pen
>>>>
>>>> I hope I have understood the idea properly. Please confirm the design.
>>>>
>>>> Thanks
>>>> Naresh
>>>>
>>>> On Mon, Jan 27, 2014 at 7:05 PM, Jonathan Lacefield <jlacefi...@datastax.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> The trick with this data model is to get to partition based, and/or cluster based access pattern so C* returns results quickly. In C* you want to model your tables based on your query access patterns and remember that writes are cheap and fast in C*.
>>>>>
>>>>> So, try something like the following:
>>>>>
>>>>> 1 Table with a Partition Key = Tag String
>>>>>     Tag String = "Tag" or "set of Tags"
>>>>>     Cluster based on tag combination (probably desc order)
>>>>>     This will allow you to select any combination that includes Tag or "set of Tags"
>>>>>     This will duplicate data as you will store 1 tag combination in every Tag partition, i.e. if a tag combination has 2 parts, then you will have 2 rows
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Jonathan Lacefield
>>>>> Solutions Architect, DataStax
>>>>> (404) 822 3487
>>>>> <http://www.linkedin.com/in/jlacefield>
>>>>>
>>>>> <http://www.datastax.com/what-we-offer/products-services/training/virtual-training>
>>>>>
>>>>> On Mon, Jan 27, 2014 at 7:24 AM, Naresh Yadav <nyadav....@gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Urgently need help on modelling this usecase on Cassandra.
>>>>>>
>>>>>> I have concept of tags and tagcombinations.
>>>>>> For example U.S.A and Pen are two tags AND if they come together in some definition then register a tagcombination(U.S.A-Pen) for that..
>>>>>>
>>>>>> *tags* (U.S.A, Pen, Pencil, India, Shampoo)
>>>>>> *tagcombinations* (U.S.A-Pen, India-pencil, U.S.A-Pencil, India-Pen, India-Pen-Shampoo)
>>>>>>
>>>>>> - millions of tags
>>>>>> - billions of tagcombinations
>>>>>> - one tagcombination generally have 2-8 tags....
>>>>>> - Every day we get lakhs of new tagcombinations to write
>>>>>>
>>>>>> Queries need to support:
>>>>>> In how many tagcombinationids does one tag, or a set of tags, appear?
>>>>>> If I query for Pen,India then it should return two tagcombinations (India-Pen, India-Pen-Shampoo). The query will be fired by the application in realtime.
>>>>>>
>>>>>> I am new to Cassandra and need to deliver fast, so please give your inputs.
>>>>>>
>>>>>> Thanks
>>>>>> Naresh
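
For reference, here is a rough in-memory Python sketch (not Cassandra code) of the two index layouts discussed in this thread: one partition per partial tag set, as in Naresh's tagcombinations table, and one partition per individual tag, as Thunder suggested, with the set-of-tags query answered by intersecting partitions on the client side. All names here (`subset_index`, `tag_index`, `register`, and so on) are hypothetical and chosen only for illustration, and the client-side intersection is an assumption about how the per-tag layout could serve multi-tag queries; nobody in the thread spelled that step out.

```python
from collections import defaultdict
from itertools import combinations

# Layout A (hypothetical): partition key = a sorted tuple of partial tags.
subset_index = defaultdict(set)
# Layout B (hypothetical): partition key = a single tag.
tag_index = defaultdict(set)


def partial_tag_keys(tags):
    """Return every non-empty subset of `tags` as a sorted tuple."""
    ordered = sorted(tags)
    return [
        combo
        for size in range(1, len(ordered) + 1)
        for combo in combinations(ordered, size)
    ]


def register(tc_id, tags):
    """Write one tag combination into both layouts."""
    for key in partial_tag_keys(tags):  # 2^n - 1 writes in layout A
        subset_index[key].add(tc_id)
    for tag in tags:                    # n writes in layout B
        tag_index[tag].add(tc_id)


def lookup_by_subset(tags):
    """Layout A: a single partition read answers the query."""
    return set(subset_index.get(tuple(sorted(tags)), set()))


def lookup_by_intersection(tags):
    """Layout B: read one partition per tag, then intersect the results."""
    postings = sorted((tag_index.get(tag, set()) for tag in tags), key=len)
    if not postings:
        return set()
    result = set(postings[0])  # start from the smallest partition
    for other in postings[1:]:
        result &= other
    return result


register("India-Pen", {"India", "Pen"})
register("India-Pen-Shampoo", {"India", "Pen", "Shampoo"})
register("India-Pencil", {"India", "Pencil"})

print(sorted(lookup_by_subset({"Pen", "India"})))
print(sorted(lookup_by_intersection({"Pen", "India"})))
print(len(partial_tag_keys("abcdefgh")))  # keys written for one 8-tag combination
```

Both lookups return India-Pen and India-Pen-Shampoo for the Pen,India query from the original question. The trade-off is the one raised above: layout A writes 255 keys (every non-empty subset) for a single 8-tag combination but reads one partition, while layout B writes 8 keys but may have to pull and intersect large partitions for popular tags such as India.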