On 4/29/15 6:02 PM, Jeetendra Gangele wrote:
Thanks for the detailed explanation. My only worry is that searching all combinations
of company names through ES looks hard.


I'm not sure what makes you think "ES looks hard". Have you tried browsing the Elasticsearch reference or the definitive guide?

[1] http://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
[2] http://www.elastic.co/guide/en/elasticsearch/guide/current/index.html

In Solr we define everything in XML files, such as all the attributes of
WordDelimiterFilterFactory and the shingle filter factory. How do we
do this in Elasticsearch?


See the links above, the IRC channel or the mailing list. I don't want to derail this thread any longer, so I'll wrap up by
pointing to two of the many resources that pop up on Google: a blog post on shingles and a
post from Found.no on text analysis and shingles.

https://www.elastic.co/blog/searching-with-shingles
https://www.found.no/foundation/text-analysis-part-1/#optimizing-phrase-searches-with-shingles
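
As a rough illustration of what those posts describe (a sketch, not taken from either of them): in Elasticsearch the analysis chain that Solr keeps in XML is declared as JSON index settings. The Python below builds such a settings body with a shingle token filter and mimics in plain Python what shingles look like. The names company_shingle and company_name, the shingle sizes and the sample company name are made up for the example, and the shingles() helper only approximates the real filter (whitespace tokens, no unigrams).

```python
import json


def shingles(text, min_size=2, max_size=3):
    """Return lowercased word n-grams (shingles) of min_size..max_size words.

    Plain-Python approximation for illustration only; the real Elasticsearch
    filter runs after the tokenizer and can also emit single words.
    """
    tokens = text.lower().split()
    out = []
    for i in range(len(tokens)):
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out


# Index settings declaring a custom analyzer with a shingle token filter.
# "company_shingle" and "company_name" are hypothetical names.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "company_shingle": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 3,
                    "output_unigrams": True,
                }
            },
            "analyzer": {
                "company_name": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "company_shingle"],
                }
            },
        }
    }
}

if __name__ == "__main__":
    # JSON body that would be sent when creating the index.
    print(json.dumps(INDEX_SETTINGS, indent=2))
    print(shingles("Acme Data Systems Ltd"))
```

The printed JSON is what would go in the request body when creating the index. A field mapped with the company_name analyzer then has word pairs and triples indexed alongside the single words.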

If you need more help, do reach out to the Elasticsearch mailing list:
https://www.elastic.co/community

Cheers,



On 29 April 2015 at 20:03, Costin Leau <costin.l...@gmail.com> wrote:

    # disclaimer: I'm an employee of Elastic (the company behind Elasticsearch) and lead of the
    Elasticsearch Hadoop integration

    Some things to clarify on the Elasticsearch side:

    1. Elasticsearch is a distributed, real-time search and analytics engine. Search is just one aspect of
    it and it can work with any type of data (whether it's text, image encoding, etc.): GitHub, Wikipedia
    and Stack Overflow are popular examples of well-known websites that are powered by Elasticsearch. In
    fact you can find plenty of use cases and information about this on the website [1].

    2. Elasticsearch is stand-alone and can be run on the same machines as other services or on separate
    ones. In fact, on the _same_ machine, one can run _multiple_ Elasticsearch nodes (and thus clusters).
    For best performance, use dedicated hardware (as Nick suggested).

    3. The Elasticsearch Spark integration has been available for over a year through Map/Reduce, and
    through the native (Scala and Java) API since Q3 last year. There are plenty of features available,
    all fully documented here [2]. Better yet, there's a talk by yours truly from Spark Summit East [3]
    that is fully focused on exactly this topic.

    4. elasticsearch-hadoop is certified by Databricks, Cloudera, Hortonworks and MapR and supports both
    Spark core and Spark SQL 1.0-1.3. There are binaries for Scala 2.10 and 2.11. And for what it's worth,
    it provided one of the first (if not the first) implementations of the DataSource API outside
    Databricks, which means not only using Elasticsearch in a declarative fashion but also having
    push-down support for operators.
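
To make points 3 and 4 a bit more concrete, here is a minimal PySpark-flavoured sketch of reading an index through the Spark SQL DataSource. Assumptions: the elasticsearch-hadoop jar is on the Spark classpath, the SQLContext passed in exposes the `read` DataFrameReader (Spark 1.4+, so newer than the versions listed above), and the resource name companies/company as well as both helper functions are hypothetical. es.nodes, es.port and pushdown are elasticsearch-hadoop configuration keys.

```python
def es_read_options(nodes, port=9200, pushdown=True):
    """Build the option map handed to the Elasticsearch Spark SQL source.

    Hypothetical helper: it only assembles string options and does not
    talk to Spark or Elasticsearch itself.
    """
    return {
        "es.nodes": nodes,
        "es.port": str(port),
        # Let the connector translate Spark SQL filters into ES queries.
        "pushdown": "true" if pushdown else "false",
    }


def load_companies(sql_context, resource, options):
    """Load an Elasticsearch resource (index/type) as a DataFrame.

    sql_context is assumed to be an existing pyspark.sql.SQLContext with
    the elasticsearch-hadoop connector available.
    """
    reader = sql_context.read.format("org.elasticsearch.spark.sql")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load(resource)


if __name__ == "__main__":
    # No cluster is contacted here; this only shows the options that
    # would be passed along to load_companies().
    print(es_read_options("localhost"))
```

With a live cluster one would call something like load_companies(sqlContext, "companies/company", es_read_options("localhost")) and then filter the resulting DataFrame; with push-down enabled, the connector can turn such filters into Elasticsearch queries instead of pulling every document into Spark.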

    Hopefully these materials will get you started with Spark and Elasticsearch and also clarify some of
    the misconceptions about Elasticsearch.

    Cheers,

    [1] https://www.elastic.co/products/elasticsearch
    [2] http://www.elastic.co/guide/en/elasticsearch/hadoop/master/reference.html
    [3] http://spark-summit.org/east/2015/talk/using-spark-and-elasticsearch-for-real-time-data-analysis


    On 4/28/15 8:16 PM, Nick Pentreath wrote:

        Depends on your use case and search volume. Typically you'd have a dedicated ES cluster if your
        app is doing a lot of real-time indexing and search.

        If it's only for Spark integration then you could colocate ES and Spark.



        On Tue, Apr 28, 2015 at 6:41 PM, Jeetendra Gangele <gangele...@gmail.com> wrote:

             Thanks for the reply.

             Will the Elasticsearch index live within my cluster, or do I need a separate host for
             Elasticsearch?


             On 28 April 2015 at 22:03, Nick Pentreath <nick.pentre...@gmail.com> wrote:

                 I haven't used Solr for a long time, and haven't used Solr in Spark.

                 However, why do you say "Elasticsearch is not a good option ..."? ES absolutely supports
                 full-text search and not just filtering and grouping (in fact its original purpose was
                 and still is text search, though filtering, grouping and aggregation are heavily used).

                 http://www.elastic.co/guide/en/elasticsearch/guide/master/full-text-search.html



                 On Tue, Apr 28, 2015 at 6:27 PM, Jeetendra Gangele <gangele...@gmail.com> wrote:

                     Has anyone tried using Solr inside Spark?
                     Below is the project describing it:
                     https://github.com/LucidWorks/spark-solr

                     I have a requirement in which I want to index 20 million company names and then
                     search them as and when new data comes in. The output should be a list of companies
                     matching the query.

                     Spark has built-in Elasticsearch support, but for this purpose Elasticsearch is not
                     a good option, since this is purely a text search problem?

                     Elasticsearch is good for filtering and grouping.

                     Has anybody used Solr inside Spark?

                     Regards
                     jeetendra





    --
    Costin


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
    For additional commands, e-mail: user-h...@spark.apache.org






--
Costin

