Ok , Then let's try to implement and will check by using cassandra-stress to check what will be performance. I worked on another data model for book storage for my company, with same situations like having 1 single table with 80 columns and primary key as bookid uuid. Implemented Solr on top of that. That's why , I am try to implement all possible best solution for upcoming projects.
On Mon, Jun 12, 2017 at 7:51 PM, Eduardo Alonso <eduardoalo...@stratio.com> wrote: > -Virtual tokens are not recommended when using SOLR or > cassandra-lucene-index. > > If you use your table schema you will not have any problem with partition > size because your table is *not* a WIDE row table (it does not have > clustering keys) > The limit for 1 record with those 15 or 20 columns must not be larger that > 100MB. You will have enough. > > Eduardo Alonso > Vía de las dos Castillas, 33, Ática 4, 3ª Planta > 28224 Pozuelo de Alarcón, Madrid > Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com // > *@stratiobd > <https://twitter.com/StratioBD>* > > 2017-06-12 12:36 GMT+02:00 @Nandan@ <nandanpriyadarshi...@gmail.com>: > >> And due to single table videos, maybe it will go with around 15,20 >> columns, then we need to also think very carefully about partition sizes >> also. >> >> On Mon, Jun 12, 2017 at 6:33 PM, @Nandan@ <nandanpriyadarshi...@gmail.com >> > wrote: >> >>> Yes this is only Option I am also thinking like this as my second >>> options. Before this I was thinking to do denormalize table based on search >>> columns, but due to partial search this will be not that effective. >>> >>> Now suppose , if we are going with this single table as videos. and >>> implemented with Solr/Lucene, then need to also care about num_tokens ? >>> >>> >>> On Mon, Jun 12, 2017 at 6:27 PM, Eduardo Alonso < >>> eduardoalo...@stratio.com> wrote: >>> >>>> Using cassandra collections >>>> >>>> CREATE TABLE videos ( >>>> videoid uuid primary key, >>>> title text, >>>> actor list<text>, >>>> producer list<text>, >>>> release_date timestamp, >>>> description text, >>>> music text, >>>> etc... >>>> ); >>>> >>>> When using collection you need to take care of its length. Collections >>>> are designed to store >>>> <http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_collections_c.html>only >>>> a small amount of data >>>> <http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_collections_c.html> >>>> . >>>> 5/10 actors per movie is ok. >>>> >>>> >>>> Eduardo Alonso >>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >>>> 28224 Pozuelo de Alarcón, Madrid >>>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com // >>>> *@stratiobd >>>> <https://twitter.com/StratioBD>* >>>> >>>> 2017-06-12 11:54 GMT+02:00 @Nandan@ <nandanpriyadarshi...@gmail.com>: >>>> >>>>> So In short we have to go with one single table as videos and put >>>>> primary key as videoid uuid. >>>>> But then how can we able to handle multiple actor name and producer >>>>> name. ? >>>>> >>>>> On Mon, Jun 12, 2017 at 5:51 PM, Eduardo Alonso < >>>>> eduardoalo...@stratio.com> wrote: >>>>> >>>>>> Yes, you are right. >>>>>> >>>>>> Table denormalization is useful just when you have unique primary >>>>>> keys, not your case. >>>>>> Denormalized tables are only different in its primary key, every >>>>>> denormalized table contains all the data (it just change how it is >>>>>> structured). So, if you need to index it, do it with just one table (the >>>>>> one you showed us with videoid as the primary key is ok). >>>>>> >>>>>> Solr, Elastic and cassandra-lucene-index are both based on Lucene and >>>>>> all of them fulfill all your needs. >>>>>> >>>>>> Solr (in DSE) and cassandra-lucene-index >>>>>> <https://github.com/stratio/cassandra-lucene-index> are very well >>>>>> integrated with cassandra using its secondary index interface. If you >>>>>> choose elastic search you will need to code the integration (write mutex, >>>>>> both cluster synchronization (imagine something written in cassandra but >>>>>> failed to write in elastic)) >>>>>> >>>>>> I know i am not the most suitable to recommend you to use our product >>>>>> cassandra-lucene-index >>>>>> <https://github.com/stratio/cassandra-lucene-index> but it is open >>>>>> source, just take a look. >>>>>> >>>>>> Eduardo Alonso >>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >>>>>> 28224 Pozuelo de Alarcón, Madrid >>>>>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com >>>>>> // *@stratiobd <https://twitter.com/StratioBD>* >>>>>> >>>>>> 2017-06-12 11:18 GMT+02:00 @Nandan@ <nandanpriyadarshi...@gmail.com>: >>>>>> >>>>>>> Hi Eduardo, >>>>>>> >>>>>>> And As we are trying to build an advanced search functionality in >>>>>>> which we can able to do partial search based on actor, producer, >>>>>>> director, >>>>>>> etc. columns. >>>>>>> So if we do denormalization of tables then we have to create tables >>>>>>> such as below :- >>>>>>> video_by_actor >>>>>>> video_by_producer >>>>>>> video_by_director >>>>>>> video_by_date >>>>>>> etc.. >>>>>>> By using denormalized, Cassandra only allows us to do equality >>>>>>> search, but for implementing Partial search we need to implement solr on >>>>>>> all above tables. >>>>>>> >>>>>>> This is my thinking, but I think this will be not correct way to >>>>>>> implement Apache Solr on all tables. >>>>>>> >>>>>>> On Mon, Jun 12, 2017 at 5:11 PM, @Nandan@ < >>>>>>> nandanpriyadarshi...@gmail.com> wrote: >>>>>>> >>>>>>>> Hi Edurado, >>>>>>>> >>>>>>>> As you mentioned queries 1-6 , >>>>>>>> In this condition, we have to proceed with a table like as below :- >>>>>>>> create table videos ( >>>>>>>> videoid uuid primary key, >>>>>>>> title text, >>>>>>>> actor text, >>>>>>>> producer text, >>>>>>>> release_date timestamp, >>>>>>>> description text, >>>>>>>> music text, >>>>>>>> etc... >>>>>>>> ); >>>>>>>> This table will help to store video datas based on PK videoid and >>>>>>>> will give uniqeness due to uuid. >>>>>>>> But as we know , in one movie there are multiple actor, multiple >>>>>>>> producer, multiple music worked, So how can we store all these.. Only >>>>>>>> one >>>>>>>> option will left as to use collection type columns. >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Jun 12, 2017 at 4:59 PM, Eduardo Alonso < >>>>>>>> eduardoalo...@stratio.com> wrote: >>>>>>>> >>>>>>>>> TLDR shouldBe *PD >>>>>>>>> >>>>>>>>> Eduardo Alonso >>>>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >>>>>>>>> 28224 Pozuelo de Alarcón, Madrid >>>>>>>>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com >>>>>>>>> // *@stratiobd <https://twitter.com/StratioBD>* >>>>>>>>> >>>>>>>>> 2017-06-12 10:58 GMT+02:00 Eduardo Alonso < >>>>>>>>> eduardoalo...@stratio.com>: >>>>>>>>> >>>>>>>>>> Hi Nandan: >>>>>>>>>> >>>>>>>>>> So, your system must provide these queries: >>>>>>>>>> >>>>>>>>>> 1 - SELECT video FROM ... WHERE actor = '...'; >>>>>>>>>> 2 - SELECT video FROM ... WHERE producer = '...'; >>>>>>>>>> 3 - SELECT video FROM ... WHERE music = '...'; >>>>>>>>>> 4 - SELECT video FROM ... WHERE actor = '...' AND producer >>>>>>>>>> ='...'; >>>>>>>>>> 5 - SELECT video FROM ... WHERE actor = '...' AND music = '...'; >>>>>>>>>> 6 - SELECT video WHERE title CONTAINS 'Harry'; >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> For queries 1-5 you can get them with just cassandra, >>>>>>>>>> denormalizing tables just the way your mentioned but without solr, >>>>>>>>>> just >>>>>>>>>> cassandra (Indeed, just for equality clauses) >>>>>>>>>> >>>>>>>>>> video_by_actor; >>>>>>>>>> video_by_producer; >>>>>>>>>> video_by_music; >>>>>>>>>> video_by_actor_and_producer; >>>>>>>>>> video_by_actor_and_music; >>>>>>>>>> >>>>>>>>>> For queries number 6 you need a search engine. >>>>>>>>>> >>>>>>>>>> SOL >>>>>>>>>> ElasticSearch >>>>>>>>>> cassandra-lucene-index >>>>>>>>>> <https://github.com/stratio/cassandra-lucene-index> >>>>>>>>>> SASI >>>>>>>>>> <http://docs.datastax.com/en/dse/5.1/cql/cql/cql_reference/cql_commands/cqlCreateCustomIndex.html> >>>>>>>>>> >>>>>>>>>> I think, just for your query, the easiest way to get it is to >>>>>>>>>> build a SASI index. >>>>>>>>>> TLDR: I work for stratio in cassandra-lucene-index but for your >>>>>>>>>> basic query (only one dimension), SASI indexes will work for you. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Eduardo Alonso >>>>>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta >>>>>>>>>> 28224 Pozuelo de Alarcón, Madrid >>>>>>>>>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // >>>>>>>>>> www.stratio.com // *@stratiobd <https://twitter.com/StratioBD>* >>>>>>>>>> >>>>>>>>>> 2017-06-12 9:50 GMT+02:00 @Nandan@ <nandanpriyadarshi...@gmail.co >>>>>>>>>> m>: >>>>>>>>>> >>>>>>>>>>> But Condition is , I am working with Apache Cassandra Database >>>>>>>>>>> in which I have to store my data into Cassandra and then have to >>>>>>>>>>> implement >>>>>>>>>>> partial search capability. >>>>>>>>>>> If we need to search based on full search primary key, then it >>>>>>>>>>> really best and easy to work with Cassandra , but in case of >>>>>>>>>>> flexible >>>>>>>>>>> search , I am getting confused. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, Jun 12, 2017 at 3:47 PM, Oskar Kjellin < >>>>>>>>>>> oskar.kjel...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> I haven't run solr with Cassandra myself. I just meant to run >>>>>>>>>>>> elasticsearch as a completely separate service and write there as >>>>>>>>>>>> well. >>>>>>>>>>>> >>>>>>>>>>>> On 12 Jun 2017, at 09:45, @Nandan@ < >>>>>>>>>>>> nandanpriyadarshi...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Do you mean to use Elastic Search with Cassandra? >>>>>>>>>>>> Even I am thinking to use Apache Solr With Cassandra. >>>>>>>>>>>> In that case I have to create distributed tables such as:- >>>>>>>>>>>> 1) video_by_title, video_by_actor, video_by_year etc.. >>>>>>>>>>>> 2) After creating Tables , will have to configure solr core on >>>>>>>>>>>> all tables. >>>>>>>>>>>> >>>>>>>>>>>> Is it like this ? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Jun 12, 2017 at 3:19 PM, Oskar Kjellin < >>>>>>>>>>>> oskar.kjel...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Why not elasticsearch for this use case? It will make your >>>>>>>>>>>>> life much simpler >>>>>>>>>>>>> >>>>>>>>>>>>> > On 12 Jun 2017, at 04:40, @Nandan@ < >>>>>>>>>>>>> nandanpriyadarshi...@gmail.com> wrote: >>>>>>>>>>>>> > >>>>>>>>>>>>> > Hi, >>>>>>>>>>>>> > >>>>>>>>>>>>> > Currently, I am working on data modeling for Video Company >>>>>>>>>>>>> in which we have different types of users as well as different >>>>>>>>>>>>> user >>>>>>>>>>>>> functionality. >>>>>>>>>>>>> > But currently, my concern is about Search video module based >>>>>>>>>>>>> on different fields. >>>>>>>>>>>>> > >>>>>>>>>>>>> > Query patterns are as below:- >>>>>>>>>>>>> > 1) Select video by actor. >>>>>>>>>>>>> > 2) select video by producer. >>>>>>>>>>>>> > 3) select video by music. >>>>>>>>>>>>> > 4) select video by actor and producer. >>>>>>>>>>>>> > 5) select video by actor and music. >>>>>>>>>>>>> > >>>>>>>>>>>>> > Note: - In short, We want to establish an advanced search >>>>>>>>>>>>> module by which we can search by anyway and get the desired >>>>>>>>>>>>> results. >>>>>>>>>>>>> > >>>>>>>>>>>>> > During a search , we need partial search also such that if >>>>>>>>>>>>> any user can search "Harry" title, then we are able to give them >>>>>>>>>>>>> result as >>>>>>>>>>>>> all videos whose >>>>>>>>>>>>> > title contains "Harry" at any location. >>>>>>>>>>>>> > >>>>>>>>>>>>> > As per my ideas, I have to create separate tables such as >>>>>>>>>>>>> video_by_actor, video_by_producer etc.. and implement solr query >>>>>>>>>>>>> on all >>>>>>>>>>>>> tables. Otherwise, >>>>>>>>>>>>> > is there any others way by which we can implement this >>>>>>>>>>>>> search module effectively. >>>>>>>>>>>>> > >>>>>>>>>>>>> > Please suggest. >>>>>>>>>>>>> > >>>>>>>>>>>>> > Best regards, >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >