RE: Lucene index plugin for Apache Cassandra

Matthew Johnson Mon, 15 Jun 2015 09:09:55 -0700

Hi Andres,



This looks awesome, many thanks for your work on this. Just out of
curiosity, how does this compare to DSE Cassandra with embedded Solr?
Do they provide very similar functionality? Is there a list of obvious pros
and cons of one versus the other?



Thanks!

Matthew





*From:* Andres de la Peña [mailto:[email protected]]
*Sent:* 13 June 2015 13:20
*To:* [email protected]
*Subject:* Re: Lucene index plugin for Apache Cassandra



Thanks for showing interest.



Faceting is not yet supported, but it is on our roadmap. Our goal is to add
as many Lucene features to Cassandra as possible.



2015-06-12 18:21 GMT+02:00 Mohammed Guller <[email protected]>:

The plugin looks cool. Thank you for open sourcing it.



Does it support faceting and other Solr functionality?



Mohammed



*From:* Andres de la Peña [mailto:[email protected]]
*Sent:* Friday, June 12, 2015 3:43 AM
*To:* [email protected]
*Subject:* Re: Lucene index plugin for Apache Cassandra



I really appreciate your interest.



Well, the first recommendation is not to use it unless you need it, because
a properly denormalized Cassandra model is almost always preferable to
indexing. Lucene indexing is a good option when there is no viable
denormalization alternative. This is the case for range queries over
multiple dimensions, full-text search, or complex boolean predicates.
It's also appropriate for Spark/Hadoop jobs that map a small fraction of
the total number of rows in a table, if you can pay the cost of
indexing.



Lucene indexes run inside C*, so users should closely monitor memory
usage. It's also a good idea to put the Lucene directory files on a
separate disk from those used by C* itself. Additionally, you should expect
the write throughput of indexed tables to be appreciably reduced, maybe to
a few thousand rows per second.



It's really hard to estimate the resources needed by the index because of
the great variety of indexing and querying options that Lucene offers, so
the only thing we can suggest is to find the optimal setup for your use
case empirically.



2015-06-12 12:00 GMT+02:00 Carlos Rolo <[email protected]>:

Seems like an interesting tool!

What operational recommendations would you make to users of this tool
(Extra hardware capacity, extra metrics to monitor, etc)?


Regards,



Carlos Juzarte Rolo

Cassandra Consultant



Pythian - Love your data



rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
<http://linkedin.com/in/carlosjuzarterolo>*

Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649

www.pythian.com



On Fri, Jun 12, 2015 at 11:07 AM, Andres de la Peña <[email protected]>
wrote:

Unfortunately, we haven't published any benchmarks yet, but we plan to do
so as soon as possible. However, you can expect behavior similar to that
of Elasticsearch or Solr, with some overhead due to the need to index
both Cassandra's row key and the partition's token.
You can also take a look at this presentation
<http://planetcassandra.org/video-presentations/vp/cassandra-summit-europe-2014/vd/stratio-advanced-search-and-top-k-queries-in-cassandra/>
to see how cluster distribution is done.



2015-06-12 0:45 GMT+02:00 Ben Bromhead <[email protected]>:

Looks awesome. Do you have any examples/benchmarks of using these indexes
for various cluster sizes, e.g. 20 nodes, 60 nodes, 100s+?



On 10 June 2015 at 09:08, Andres de la Peña <[email protected]> wrote:

Hi all,



With the release of Cassandra 2.1.6, Stratio is glad to present its open
source Lucene-based implementation of C* secondary indexes
<https://github.com/Stratio/cassandra-lucene-index> as a plugin that can be
attached to Apache Cassandra. Previously, the Lucene index was
distributed inside a fork of Apache Cassandra, with all the difficulties
that implies. The fork is now discontinued, and new users should use the
recently created plugin, which maintains all the features of Stratio
Cassandra <https://github.com/Stratio/stratio-cassandra>.



Stratio's Lucene index extends Cassandra’s functionality to provide near
real-time distributed search engine capabilities similar to those of
Elasticsearch or Solr, including full-text search, free multivariable
search, relevance queries and field-based sorting. Each node indexes its
own data, so high availability and scalability are guaranteed.



We hope this will be useful to the Apache Cassandra community.



Regards,



-- 


Andrés de la Peña



<http://www.stratio.com/>
Avenida de Europa, 26. Ática 5. 3ª Planta

28224 Pozuelo de Alarcón, Madrid

Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*





-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr
<http://twitter.com/instaclustr> | (650) 284 9692





