Re: Help me on Cassandra Data Modelling

Naresh Yadav Tue, 28 Jan 2014 09:09:24 -0800

Please share your inputs on my last email, if any.


On Tue, Jan 28, 2014 at 7:18 AM, Naresh Yadav <nyadav....@gmail.com> wrote:

> Yes Thunder, you are right. I simplified that by moving the *tags* search
> (partial/exact) into a separate column family, tagcombination, which acts
> as the index for all tag-based searches. My original metricresult table
> will store tagcombinationid and time in columns; otherwise it was getting
> complicated and I was not getting good results.
>
> Yes, I agree with you about duplicating storage in the tagcombination
> column family. If I have billions of real tagcombinations with 8 tags
> each, then I am storing about 2^8 partial combinations for each one to
> support partial matches, which will make this a very heavy table. With
> individual keys I will not be able to support searching with a set of
> tags. Please suggest an alternative solution.
>
> Also, one of my colleagues suggested a totally different approach, but I
> am not able to map it onto Cassandra.
> According to him, we store all possible tags as columns, and for each
> combination we mark 1 or 0 depending on whether the tag appears in that
> combination. So the data (TC1 as India, Pencil AND TC2 as India, Pen)
> will look like:
>
>        India   Pencil   Pen
> TC1    1       1        0
> TC2    1       0        1
>
> I am not able to design an optimal column family for this in Cassandra.
> If I design it as is, then a search for India, Pen will select the India
> and Pen columns, but that will touch each and every row because I cannot
> apply a criterion that matches only the 1s. I believe there can be a
> better design that makes use of this good idea.
>
> Please help me with this.
>
> Thanks
> Naresh
>
>
>
> On Mon, Jan 27, 2014 at 11:30 PM, Thunder Stumpges <
> thunder.stump...@gmail.com> wrote:
>
>> Hey Naresh,
>>
>> You asked a similar question a week or two ago. It looks like you have
>> simplified your needs quite a bit. Were you able to adjust your
>> requirements or separate the issue? You had a complicated time dimension
>> before, as well as a single "query" for multiple AND cases on tags.
>>
>> ....
>>> c)Give data for Metric=Sales AND Tag=U.S.A
>>>        O/P : 5 rows
>>> d)Give data for Metric=Sales AND Period=Jan-10 AND Tag=U.S.A AND Tag=Pen
>>>        O/P :1 row"
>>
>>
>>
>> I agree with Jonathan on the model for this simplified use case. However
>> looking at how you are storing each partial tag combination as well as
>> individual tags in the partitioning key, you will be severely duplicating
>> your storage. You might want to just store individual keys in the
>> partitioning key.
>>
>> Good luck,
>> Thunder
>>
>>
>>
>>
>> On Mon, Jan 27, 2014 at 8:48 AM, Naresh Yadav <nyadav....@gmail.com> wrote:
>>
>>> Thanks, Jonathan, for guiding me. I just want to confirm my understanding:
>>>
>>> CREATE TABLE tagcombinations (
>>>      partialtags text,
>>>      tagcombinationid text,
>>>      tagcombinationtags set<text>,
>>>      PRIMARY KEY ((partialtags), tagcombinationid)
>>> );
>>> If I need to store two tagcombinations, TC1 as India, Pencil and TC2 as
>>> India, Pen, then the data will be stored as:
>>>
>>> partialtags     TC1             TC2
>>> India           India,Pencil    India,Pen
>>> Pencil          India,Pencil
>>> Pen                             India,Pen
>>> India,Pencil    India,Pencil
>>> India,Pen                       India,Pen
>>>
>>>
>>> I hope I have understood the idea properly. Please confirm the design.
>>>
>>> Thanks
>>> Naresh
>>>
>>>
>>> On Mon, Jan 27, 2014 at 7:05 PM, Jonathan Lacefield <
>>> jlacefi...@datastax.com> wrote:
>>>
>>>> Hello,
>>>>
>>>>   The trick with this data model is to get to partition based, and/or
>>>> cluster based access pattern so C* returns results quickly.  In C* you want
>>>> to model your tables based on your query access patterns and remember that
>>>> writes are cheap and fast in C*.
>>>>
>>>>   So, try something like the following:
>>>>
>>>>   1 Table with a Partition Key = Tag String
>>>>          Tag String = "Tag" or "set of Tags"
>>>>          Cluster based on tag combination (probably desc order)
>>>>          This will allow you to select any combination that includes
>>>> Tag or "set of Tags"
>>>>          This will duplicate data as you will store 1 tag combination
>>>> in every Tag partition, i.e. if a tag combination has 2 parts, then you
>>>> will have 2 rows
>>>>
>>>>   Hope this helps.
>>>>
>>>> Jonathan Lacefield
>>>> Solutions Architect, DataStax
>>>> (404) 822 3487
>>>>  <http://www.linkedin.com/in/jlacefield>
>>>>
>>>>
>>>>
>>>> <http://www.datastax.com/what-we-offer/products-services/training/virtual-training>
>>>>
>>>>
>>>> On Mon, Jan 27, 2014 at 7:24 AM, Naresh Yadav <nyadav....@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Urgently need help on modelling this usecase on Cassandra.
>>>>>
>>>>> I have a concept of tags and tagcombinations.
>>>>> For example, U.S.A and Pen are two tags, and if they appear together in
>>>>> some definition, then a tagcombination (U.S.A-Pen) is registered for them.
>>>>>
>>>>> *tags *(U.S.A, Pen, Pencil, India, Shampoo)
>>>>> *tagcombinations*(U.S.A-Pen, India-pencil, U.S.A-Pencil, India-Pen,
>>>>> India-Pen-Shampoo)
>>>>>
>>>>> - millions of tags
>>>>> - billions of tagcombinations
>>>>> - one tagcombination generally has 2-8 tags
>>>>> - every day we get lakhs (hundreds of thousands) of new tagcombinations
>>>>>   to write
>>>>>
>>>>> The query needs to support:
>>>>> In how many tagcombinationids does one tag, or a set of tags, appear?
>>>>> If I query for Pen, India, then it should return two tagcombinations
>>>>> (India-Pen, India-Pen-Shampoo). The query will be fired by the
>>>>> application in real time.
>>>>>
>>>>> I am new to Cassandra and need to deliver fast, so please give your
>>>>> inputs.
>>>>>
>>>>> Thanks
>>>>> Naresh
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
>
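
A minimal sketch of the partial-tag index discussed in this thread, written in Python with an in-memory dict standing in for the tagcombinations table. The helper names (partial_keys, register, lookup) and the use of a sorted, comma-joined string as the partition key are assumptions for illustration only; nothing in the thread specifies them, and tags that themselves contain commas would need a different separator. It shows why a tagcombination with n tags costs 2^n - 1 rows on write, while a search for any set of tags stays a single-partition read.

```python
from collections import defaultdict
from itertools import combinations

def partial_keys(tags):
    """Yield one canonical key per non-empty subset of tags.

    Sorting before joining means "Pen,India" and "India,Pen"
    map to the same key (an assumed convention).
    """
    ordered = sorted(tags)
    for size in range(1, len(ordered) + 1):
        for subset in combinations(ordered, size):
            yield ",".join(subset)

# Stand-in for the table: partition key -> {tagcombinationid: full tag set}
index = defaultdict(dict)

def register(tc_id, tags):
    """Write one row per partial key; return how many rows were written."""
    rows = 0
    for key in partial_keys(tags):
        index[key][tc_id] = frozenset(tags)
        rows += 1
    return rows

def lookup(tags):
    """Single-partition read: ids of every combination containing all tags."""
    return sorted(index.get(",".join(sorted(tags)), {}))

register("TC1", {"India", "Pencil"})
register("TC2", {"India", "Pen"})
rows_tc3 = register("TC3", {"India", "Pen", "Shampoo"})

print(lookup({"Pen", "India"}))  # ids stored under the "India,Pen" key
print(rows_tc3)                  # rows written for a 3-tag combination
print(2 ** 8 - 1)                # rows needed for an 8-tag combination
```

With billions of tagcombinations at the upper end of 8 tags, this fan-out is the storage cost raised earlier in the thread.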

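The 0/1 matrix idea can also be read column by column: for each tag, keep only the ids of the combinations whose bit is 1. That gives one partition per individual tag, and a search on a set of tags becomes an intersection done by the application. The sketch below assumes that reading; client-side intersection is not proposed anywhere in the thread, the names (by_tag, register, search) are hypothetical, and very popular tags would produce large partitions to intersect.

```python
from collections import defaultdict

# Stand-in for a table keyed by a single tag: tag -> set of tagcombinationids
by_tag = defaultdict(set)

def register(tc_id, tags):
    """Write n rows for a combination of n tags (one per tag)."""
    for tag in tags:
        by_tag[tag].add(tc_id)

def search(tags):
    """Read one partition per queried tag and intersect the id sets."""
    partitions = [by_tag.get(tag, set()) for tag in tags]
    if not partitions:
        return []
    # Start from the smallest partition to keep the intersection cheap.
    partitions.sort(key=len)
    result = set(partitions[0])
    for ids in partitions[1:]:
        result &= ids
    return sorted(result)

register("TC1", ["India", "Pencil"])
register("TC2", ["India", "Pen"])

print(search(["India", "Pen"]))
print(search(["India"]))
```

The trade-off against the partial-key layout is n rows per combination instead of 2^n - 1, paid for with one read per queried tag plus the intersection work.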