Re: Reg:- Cassandra Data modelling for Search

@Nandan@ Mon, 12 Jun 2017 18:06:07 -0700

OK, then let's try to implement it and use cassandra-stress to check what
the performance will be.
I worked on another data model, for book storage at my company, in the same
situation: one single table with 80 columns and bookid uuid as the primary
key, with Solr implemented on top of it. That is why I am trying to find the
best possible solution for upcoming projects.



On Mon, Jun 12, 2017 at 7:51 PM, Eduardo Alonso <eduardoalo...@stratio.com>
wrote:

> - Virtual tokens are not recommended when using Solr or
> cassandra-lucene-index.
>
> If you use your table schema you will not have any problem with partition
> size, because your table is *not* a wide-row table (it does not have
> clustering keys).
> A single record with those 15 or 20 columns must not be larger than
> 100MB, so you will have plenty of room.
>
> Eduardo Alonso
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com // 
> *@stratiobd
> <https://twitter.com/StratioBD>*
>
> 2017-06-12 12:36 GMT+02:00 @Nandan@ <nandanpriyadarshi...@gmail.com>:
>
>> And since the single videos table may end up with around 15 to 20
>> columns, we also need to think very carefully about partition sizes.
>>
>> On Mon, Jun 12, 2017 at 6:33 PM, @Nandan@ <nandanpriyadarshi...@gmail.com
>> > wrote:
>>
>>> Yes, this is the only option; I was also considering it as my second
>>> option. Before this I was thinking of denormalizing tables based on the
>>> search columns, but because of partial search that would not be effective.
>>>
>>> Now suppose we go with this single videos table and implement it with
>>> Solr/Lucene: do we then also need to care about num_tokens?
>>>
>>>
>>> On Mon, Jun 12, 2017 at 6:27 PM, Eduardo Alonso <
>>> eduardoalo...@stratio.com> wrote:
>>>
>>>> Using cassandra collections
>>>>
>>>> CREATE TABLE videos (
>>>> videoid uuid primary key,
>>>> title text,
>>>> actor list<text>,
>>>> producer list<text>,
>>>> release_date timestamp,
>>>> description text,
>>>> music text,
>>>> etc...
>>>> );
>>>>
>>>> When using collections you need to take care of their length. Collections
>>>> are designed to store only a small amount of data
>>>> <http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_collections_c.html>.
>>>> 5 to 10 actors per movie is OK.
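
A minimal sketch of how the list columns in the videos table above might be
written and queried. The UUID, the sample values and the index name
videos_actor_idx are made up for illustration, and the index on the
collection only supports exact matches on whole elements, not partial search.

```cql
-- Hypothetical sample row; the UUID and the names are invented.
INSERT INTO videos (videoid, title, actor, producer)
VALUES (123e4567-e89b-12d3-a456-426655440000, 'Sample title',
        ['Actor One', 'Actor Two'], ['Producer One']);

-- Append one more actor to the existing list.
UPDATE videos SET actor = actor + ['Actor Three']
WHERE videoid = 123e4567-e89b-12d3-a456-426655440000;

-- A secondary index on the collection allows equality lookups
-- on individual elements through CONTAINS.
CREATE INDEX videos_actor_idx ON videos (actor);

SELECT videoid, title FROM videos WHERE actor CONTAINS 'Actor One';
```

Keeping each list short (a handful of names) matters because the whole
collection is read together with its row.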
>>>>
>>>>
>>>> Eduardo Alonso
>>>>
>>>> 2017-06-12 11:54 GMT+02:00 @Nandan@ <nandanpriyadarshi...@gmail.com>:
>>>>
>>>>> So, in short, we have to go with one single videos table with videoid
>>>>> uuid as the primary key.
>>>>> But then how can we handle multiple actor names and producer names?
>>>>>
>>>>> On Mon, Jun 12, 2017 at 5:51 PM, Eduardo Alonso <
>>>>> eduardoalo...@stratio.com> wrote:
>>>>>
>>>>>> Yes, you are right.
>>>>>>
>>>>>> Table denormalization is useful only when you have unique primary
>>>>>> keys, which is not your case.
>>>>>> Denormalized tables differ only in their primary key; every
>>>>>> denormalized table contains all the data (only the way it is
>>>>>> structured changes). So, if you need to index it, do it with just one
>>>>>> table (the one you showed us with videoid as the primary key is OK).
>>>>>>
>>>>>> Solr, Elastic and cassandra-lucene-index are all based on Lucene, and
>>>>>> all of them fulfill your needs.
>>>>>>
>>>>>> Solr (in DSE) and cassandra-lucene-index
>>>>>> <https://github.com/stratio/cassandra-lucene-index> are very well
>>>>>> integrated with Cassandra through its secondary index interface. If you
>>>>>> choose Elasticsearch you will need to code the integration yourself
>>>>>> (write mutex, synchronization of both clusters; imagine something
>>>>>> written to Cassandra that failed to be written to Elasticsearch).
>>>>>>
>>>>>> I know I am not the most suitable person to recommend our own product,
>>>>>> cassandra-lucene-index
>>>>>> <https://github.com/stratio/cassandra-lucene-index>, but it is open
>>>>>> source, so just take a look.
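
A rough sketch of what a cassandra-lucene-index setup on the single videos
table could look like, assuming a plugin release for Cassandra 3.x (which
uses the expr() query syntax). The index name videos_lucene_idx, the field
mapping and the search values are assumptions for illustration; check the
project README for the syntax that matches your version.

```cql
-- One Lucene index over the whole row (no column list is given).
CREATE CUSTOM INDEX videos_lucene_idx ON videos ()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
    'refresh_seconds': '1',
    'schema': '{
        fields: {
            title:    {type: "text", analyzer: "english"},
            actor:    {type: "string"},
            producer: {type: "string"}
        }
    }'
};

-- Partial match on actor combined with an exact match on producer.
SELECT videoid, title FROM videos
WHERE expr(videos_lucene_idx, '{
    filter: {type: "boolean", must: [
        {type: "wildcard", field: "actor", value: "*One*"},
        {type: "match", field: "producer", value: "Producer One"}
    ]}
}');
```

Because the index covers several columns at once, the combined
actor/producer searches do not need separate denormalized tables.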
>>>>>>
>>>>>> Eduardo Alonso
>>>>>>
>>>>>> 2017-06-12 11:18 GMT+02:00 @Nandan@ <nandanpriyadarshi...@gmail.com>:
>>>>>>
>>>>>>> Hi Eduardo,
>>>>>>>
>>>>>>> We are trying to build an advanced search functionality in
>>>>>>> which we can do partial search on the actor, producer, director,
>>>>>>> etc. columns.
>>>>>>> So if we denormalize, we have to create tables
>>>>>>> such as these:
>>>>>>> video_by_actor
>>>>>>> video_by_producer
>>>>>>> video_by_director
>>>>>>> video_by_date
>>>>>>> etc.
>>>>>>> With denormalized tables, Cassandra only allows equality
>>>>>>> search, so to implement partial search we would need to implement
>>>>>>> Solr on all of the above tables.
>>>>>>>
>>>>>>> This is my thinking, but I do not think implementing Apache Solr on
>>>>>>> all tables is the correct way.
>>>>>>>
>>>>>>> On Mon, Jun 12, 2017 at 5:11 PM, @Nandan@ <
>>>>>>> nandanpriyadarshi...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Eduardo,
>>>>>>>>
>>>>>>>> As you mentioned in queries 1-6,
>>>>>>>> in this case we have to proceed with a table like the one below:
>>>>>>>> create table videos (
>>>>>>>> videoid uuid primary key,
>>>>>>>> title text,
>>>>>>>> actor text,
>>>>>>>> producer text,
>>>>>>>> release_date timestamp,
>>>>>>>> description text,
>>>>>>>> music text,
>>>>>>>> etc...
>>>>>>>> );
>>>>>>>> This table will store video data keyed by the PK videoid, and the
>>>>>>>> uuid guarantees uniqueness.
>>>>>>>> But as we know, one movie has multiple actors, multiple
>>>>>>>> producers and multiple music contributors, so how can we store all
>>>>>>>> of these? The only option left is to use collection-type columns.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jun 12, 2017 at 4:59 PM, Eduardo Alonso <
>>>>>>>> eduardoalo...@stratio.com> wrote:
>>>>>>>>
>>>>>>>>> TLDR shouldBe *PD
>>>>>>>>>
>>>>>>>>> Eduardo Alonso
>>>>>>>>>
>>>>>>>>> 2017-06-12 10:58 GMT+02:00 Eduardo Alonso <
>>>>>>>>> eduardoalo...@stratio.com>:
>>>>>>>>>
>>>>>>>>>> Hi Nandan:
>>>>>>>>>>
>>>>>>>>>> So, your system must provide these queries:
>>>>>>>>>>
>>>>>>>>>> 1 - SELECT video FROM ... WHERE actor = '...';
>>>>>>>>>> 2 - SELECT video FROM ... WHERE producer = '...';
>>>>>>>>>> 3 - SELECT video FROM ... WHERE music = '...';
>>>>>>>>>> 4 - SELECT video FROM ... WHERE actor = '...' AND producer = '...';
>>>>>>>>>> 5 - SELECT video FROM ... WHERE actor = '...' AND music = '...';
>>>>>>>>>> 6 - SELECT video FROM ... WHERE title CONTAINS 'Harry';
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Queries 1-5 can be served with just Cassandra, by
>>>>>>>>>> denormalizing tables the way you mentioned but without Solr
>>>>>>>>>> (though only for equality clauses):
>>>>>>>>>>
>>>>>>>>>> video_by_actor;
>>>>>>>>>> video_by_producer;
>>>>>>>>>> video_by_music;
>>>>>>>>>> video_by_actor_and_producer;
>>>>>>>>>> video_by_actor_and_music;
>>>>>>>>>>
>>>>>>>>>> For query number 6 you need a search engine:
>>>>>>>>>>
>>>>>>>>>> Solr
>>>>>>>>>> ElasticSearch
>>>>>>>>>> cassandra-lucene-index
>>>>>>>>>> <https://github.com/stratio/cassandra-lucene-index>
>>>>>>>>>> SASI
>>>>>>>>>> <http://docs.datastax.com/en/dse/5.1/cql/cql/cql_reference/cql_commands/cqlCreateCustomIndex.html>
>>>>>>>>>>
>>>>>>>>>> I think that, for just your query, the easiest way is to
>>>>>>>>>> build a SASI index.
>>>>>>>>>> TLDR: I work for Stratio on cassandra-lucene-index, but for your
>>>>>>>>>> basic query (only one dimension), SASI indexes will work for you.
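
A minimal sketch of the SASI approach for the title search, assuming Apache
Cassandra 3.4 or later (where SASI is available) and the videos table with a
title text column. The index name videos_title_sasi is made up for
illustration.

```cql
-- CONTAINS mode lets LIKE match a substring anywhere in the value.
CREATE CUSTOM INDEX videos_title_sasi ON videos (title)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
    'mode': 'CONTAINS',
    'analyzer_class':
        'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
    'case_sensitive': 'false'
};

-- Returns every video whose title contains "Harry" at any position.
SELECT videoid, title FROM videos WHERE title LIKE '%Harry%';
```

This covers the single-column title search; combining it with partial
matches on several other columns is where a Lucene-based index becomes more
convenient.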
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Eduardo Alonso
>>>>>>>>>>
>>>>>>>>>> 2017-06-12 9:50 GMT+02:00 @Nandan@ <nandanpriyadarshi...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> But the constraint is that I am working with Apache Cassandra:
>>>>>>>>>>> I have to store my data in Cassandra and then implement
>>>>>>>>>>> partial search capability.
>>>>>>>>>>> If we only need to search by the full primary key, Cassandra is
>>>>>>>>>>> really the best and easiest option, but for flexible
>>>>>>>>>>> search I am getting confused.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 12, 2017 at 3:47 PM, Oskar Kjellin <
>>>>>>>>>>> oskar.kjel...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I haven't run Solr with Cassandra myself. I just meant running
>>>>>>>>>>>> Elasticsearch as a completely separate service and writing
>>>>>>>>>>>> there as well.
>>>>>>>>>>>>
>>>>>>>>>>>> On 12 Jun 2017, at 09:45, @Nandan@ <
>>>>>>>>>>>> nandanpriyadarshi...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Do you mean using Elasticsearch with Cassandra?
>>>>>>>>>>>> I am also thinking of using Apache Solr with Cassandra.
>>>>>>>>>>>> In that case I would have to:
>>>>>>>>>>>> 1) create distributed tables such as video_by_title,
>>>>>>>>>>>> video_by_actor, video_by_year, etc.;
>>>>>>>>>>>> 2) after creating the tables, configure a Solr core on
>>>>>>>>>>>> all of them.
>>>>>>>>>>>>
>>>>>>>>>>>> Is that right?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 12, 2017 at 3:19 PM, Oskar Kjellin <
>>>>>>>>>>>> oskar.kjel...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Why not elasticsearch for this use case? It will make your
>>>>>>>>>>>>> life much simpler
>>>>>>>>>>>>>
>>>>>>>>>>>>> > On 12 Jun 2017, at 04:40, @Nandan@ <
>>>>>>>>>>>>> nandanpriyadarshi...@gmail.com> wrote:
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Hi,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Currently, I am working on data modeling for a video company
>>>>>>>>>>>>> > in which we have different types of users as well as
>>>>>>>>>>>>> > different user functionality.
>>>>>>>>>>>>> > Right now my concern is the video search module, which
>>>>>>>>>>>>> > searches on different fields.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > The query patterns are as below:
>>>>>>>>>>>>> > 1) Select video by actor.
>>>>>>>>>>>>> > 2) Select video by producer.
>>>>>>>>>>>>> > 3) Select video by music.
>>>>>>>>>>>>> > 4) Select video by actor and producer.
>>>>>>>>>>>>> > 5) Select video by actor and music.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Note: in short, we want to build an advanced search
>>>>>>>>>>>>> > module with which we can search in any way and get the
>>>>>>>>>>>>> > desired results.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > During a search we also need partial matching, so that if
>>>>>>>>>>>>> > a user searches for the title "Harry", we can return
>>>>>>>>>>>>> > all videos whose title contains "Harry" at any position.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > My idea is to create separate tables such as
>>>>>>>>>>>>> > video_by_actor, video_by_producer, etc. and implement Solr
>>>>>>>>>>>>> > queries on all of them. Otherwise,
>>>>>>>>>>>>> > is there any other way to implement this
>>>>>>>>>>>>> > search module effectively?
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Please suggest.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Best regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
