Any inputs on my last email, please?

On Tue, Jan 28, 2014 at 7:18 AM, Naresh Yadav <nyadav....@gmail.com> wrote:

> Yes Thunder, you are right. I simplified the problem by moving the *tags*
> search (partial/exact) into a separate column family, tagcombination, which
> acts as the index for all tag-based searches. My original metricresult
> table will store tagcombinationid and time in columns; otherwise it was
> getting complicated and I was not getting good results.
>
> Yes, I agree with you about duplicating storage in the tagcombination
> column family. If I have a billion real tagcombinations with 8 tags in
> each, then I am duplicating 2^8 combinations for each one to support
> partial matches, which will make this a very heavy table. With
> individual keys I will not be able to support search with a set of tags.
> Please suggest an alternative solution.
>
> Also, one of my colleagues suggested a totally different approach, but I
> am not able to map it onto Cassandra. According to him, we store all
> possible tags as columns, and for each combination we just mark 0 or 1
> depending on whether the tag appears in that combination. So the data
> (TC1 as India, Pencil AND TC2 as India, Pen) will look like:
>
>        India  Pencil  Pen
> TC1    1      1       0
> TC2    1      0       1
>
> I am not able to design an optimal column family for this in Cassandra.
> If I design it as is, then for a search of India, Pen I will select the
> India and Pen columns, but that will touch every row, because I cannot
> apply a criterion that matches only the 1s. I believe there is a better
> design that makes use of this good idea.
>
> Please help me with this.
>
> Thanks
> Naresh
>
> On Mon, Jan 27, 2014 at 11:30 PM, Thunder Stumpges <
> thunder.stump...@gmail.com> wrote:
>
>> Hey Naresh,
>>
>> You asked a similar question a week or two ago. It looks like you have
>> simplified your needs quite a bit. Were you able to adjust your
>> requirements or separate the issue?
>> You had a complicated time dimension
>> before, as well as a single "query" for multiple AND cases on tags.
>>
>> ....
>>> c) Give data for Metric=Sales AND Tag=U.S.A
>>>    O/P: 5 rows
>>> d) Give data for Metric=Sales AND Period=Jan-10 AND Tag=U.S.A AND Tag=Pen
>>>    O/P: 1 row"
>>
>> I agree with Jonathan on the model for this simplified use case. However,
>> looking at how you are storing each partial tag combination as well as
>> individual tags in the partitioning key, you will be severely duplicating
>> your storage. You might want to just store individual keys in the
>> partitioning key.
>>
>> Good luck,
>> Thunder
>>
>> On Mon, Jan 27, 2014 at 8:48 AM, Naresh Yadav <nyadav....@gmail.com> wrote:
>>
>>> Thanks, Jonathan, for guiding me. I just want to confirm my understanding:
>>>
>>> CREATE TABLE tagcombinations (
>>>     partialtags text,
>>>     tagcombinationid text,
>>>     tagcombinationtags set<text>,
>>>     PRIMARY KEY ((partialtags), tagcombinationid)
>>> );
>>>
>>> If I need to store two tagcombinations, TC1 as India, Pencil AND TC2 as
>>> India, Pen, then the data will be stored as:
>>>
>>>                 TC1            TC2
>>> India           India,Pencil   India,Pen
>>>
>>>                 TC1
>>> Pencil          India,Pencil
>>>
>>>                 TC2
>>> Pen             India,Pen
>>>
>>>                 TC1
>>> India,Pencil    India,Pencil
>>>
>>>                 TC2
>>> India,Pen       India,Pen
>>>
>>> I hope I have understood the idea properly; please confirm the design.
>>>
>>> Thanks
>>> Naresh
>>>
>>> On Mon, Jan 27, 2014 at 7:05 PM, Jonathan Lacefield <
>>> jlacefi...@datastax.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> The trick with this data model is to get to a partition-based and/or
>>>> cluster-based access pattern so C* returns results quickly. In C* you want
>>>> to model your tables based on your query access patterns, and remember that
>>>> writes are cheap and fast in C*.
>>>>
>>>> So, try something like the following:
>>>>
>>>> 1 Table with a Partition Key = Tag String
>>>>     Tag String = "Tag" or "set of Tags"
>>>>     Cluster based on tag combination (probably desc order)
>>>>     This will allow you to select any combination that includes
>>>>     Tag or "set of Tags"
>>>>     This will duplicate data, as you will store 1 tag combination
>>>>     in every Tag partition, i.e. if a tag combination has 2 parts, then
>>>>     you will have 2 rows
>>>>
>>>> Hope this helps.
>>>>
>>>> Jonathan Lacefield
>>>> Solutions Architect, DataStax
>>>> (404) 822 3487
>>>> <http://www.linkedin.com/in/jlacefield>
>>>>
>>>> <http://www.datastax.com/what-we-offer/products-services/training/virtual-training>
>>>>
>>>> On Mon, Jan 27, 2014 at 7:24 AM, Naresh Yadav <nyadav....@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I urgently need help modelling this use case on Cassandra.
>>>>>
>>>>> I have a concept of tags and tagcombinations.
>>>>> For example, U.S.A and Pen are two tags, and if they appear together in
>>>>> some definition then a tagcombination (U.S.A-Pen) is registered for them.
>>>>>
>>>>> *tags* (U.S.A, Pen, Pencil, India, Shampoo)
>>>>> *tagcombinations* (U.S.A-Pen, India-Pencil, U.S.A-Pencil, India-Pen,
>>>>> India-Pen-Shampoo)
>>>>>
>>>>> - millions of tags
>>>>> - billions of tagcombinations
>>>>> - one tagcombination generally has 2-8 tags
>>>>> - every day we get lakhs (hundreds of thousands) of new
>>>>>   tagcombinations to write
>>>>>
>>>>> The query needs to support:
>>>>> In how many tagcombinationids does one tag, or a set of tags, appear?
>>>>> If I query for Pen,India then it should return two tagcombinations
>>>>> (India-Pen, India-Pen-Shampoo). The query will be fired by the
>>>>> application in real time.
>>>>>
>>>>> I am new to Cassandra and need to deliver fast, so please give your
>>>>> inputs.
>>>>>
>>>>> Thanks
>>>>> Naresh
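
For reference, the trade-off discussed in this thread can be sketched in a few lines of Python. This is only an in-memory illustration, not Cassandra driver code: the `TagIndex` class and the `subset_rows` helper are hypothetical names introduced here. It assumes one partition per individual tag (Thunder's suggestion) and answers a "set of tags" query by intersecting the per-tag id sets on the client, and it counts how many rows the "every partial tag set as partition key" design would write per tagcombination.

```python
from collections import defaultdict


def subset_rows(tags):
    """Rows written for one tagcombination when every non-empty
    subset of its tags is used as a partition key (hypothetical helper)."""
    return 2 ** len(set(tags)) - 1


class TagIndex:
    """In-memory stand-in for a table partitioned by a single tag.
    Assumption: each 'partition' maps one tag to the ids of the
    tagcombinations that contain it."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def add(self, combination_id, tags):
        # One row per tag: an n-tag combination costs n writes.
        for tag in tags:
            self._by_tag[tag].add(combination_id)

    def find(self, tags):
        """Return ids of combinations containing every tag in `tags`."""
        id_sets = [self._by_tag.get(tag, set()) for tag in tags]
        if not id_sets:
            return set()
        # Start from the smallest set so the intersection stays cheap.
        id_sets.sort(key=len)
        result = set(id_sets[0])
        for ids in id_sets[1:]:
            result &= ids
        return result


index = TagIndex()
index.add("U.S.A-Pen", ["U.S.A", "Pen"])
index.add("India-Pencil", ["India", "Pencil"])
index.add("U.S.A-Pencil", ["U.S.A", "Pencil"])
index.add("India-Pen", ["India", "Pen"])
index.add("India-Pen-Shampoo", ["India", "Pen", "Shampoo"])

print(sorted(index.find(["Pen", "India"])))  # ['India-Pen', 'India-Pen-Shampoo']
print(subset_rows(range(8)))                 # 255 rows for one 8-tag combination
```

With per-tag partitions an 8-tag combination costs 8 rows instead of 255, at the price of reading one partition per queried tag and intersecting on the client. Whether that read cost is acceptable for very popular tags is something the thread does not settle, so treat this as a starting point for measurement rather than a recommendation.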