On 4/29/15 6:02 PM, Jeetendra Gangele wrote:
Thanks for the detailed explanation. My only worry is that searching all combinations of company names through ES looks hard.
I'm not sure what makes you think "ES looks hard". Have you tried browsing the Elasticsearch reference or the definitive guide?
[1] http://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
[2] http://www.elastic.co/guide/en/elasticsearch/guide/current/index.html
In Solr we define everything in XML files, such as all the attributes of WordDelimiterFilterFactory and the shingle filter factory. How do we do this in Elasticsearch?
See the links above, the IRC channel, or the mailing list. I don't want to derail this thread any further, so I'll wrap up by pointing to two of the many resources that pop up on Google: a blog post on shingles and a post from Found.no on text analysis and shingles.

https://www.elastic.co/blog/searching-with-shingles
https://www.found.no/foundation/text-analysis-part-1/#optimizing-phrase-searches-with-shingles

If you need more help, do reach out to the Elasticsearch mailing list: https://www.elastic.co/community

Cheers,
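To make the shingles pointers above a bit more concrete, here is a minimal sketch in Python. It assumes an analysis setup built on Elasticsearch's `shingle` token filter, declared as JSON index settings rather than Solr-style XML. The names `company_shingles` and `company_name_analyzer` and the sample company name are made up for illustration, and the `shingles` helper only imitates the idea of word n-grams; it is not Elasticsearch's own implementation.

```python
import json


def shingles(tokens, min_size=2, max_size=3):
    """Return word n-grams (shingles) of min_size..max_size tokens, by start position."""
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


# Index settings as a JSON-serialisable dict (the rough counterpart of the
# Solr XML analyzer definition). Filter and analyzer names are hypothetical.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "company_shingles": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 3,
                    "output_unigrams": True,
                }
            },
            "analyzer": {
                "company_name_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "company_shingles"],
                }
            },
        }
    }
}

if __name__ == "__main__":
    # Illustrate which multi-word terms a shingle filter adds to the index.
    print(shingles("acme widget company ltd".split()))
    # This JSON would be sent as the body of a create-index request.
    print(json.dumps(INDEX_SETTINGS, indent=2))
```

The settings dict would typically be sent as the body of a create-index request. Elasticsearch also ships a `word_delimiter` token filter that could be added to the same `filter` list if Solr-like word splitting is wanted.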
On 29 April 2015 at 20:03, Costin Leau <costin.l...@gmail.com> wrote:

# disclaimer: I'm an employee of Elastic (the company behind Elasticsearch) and lead of the Elasticsearch Hadoop integration.

Some things to clarify on the Elasticsearch side:

1. Elasticsearch is a distributed, real-time search and analytics engine. Search is just one aspect of it and it can work with any type of data (whether it's text, image encoding, etc.): GitHub, Wikipedia and Stack Overflow are popular examples of well-known websites that are powered by Elasticsearch. In fact you can find plenty of use cases and information about this on the website [1].

2. Elasticsearch is stand-alone and can be run on the same or separate machines as other services. In fact, on the _same_ machine, one can run _multiple_ Elasticsearch nodes (and thus clusters). For best performance, use dedicated hardware, as Nick suggested.

3. The Elasticsearch Spark integration has been available for over a year through Map/Reduce, and through the native (Scala and Java) API since Q3 last year. There are plenty of features available, which are fully documented here [2]. Better yet, there's a talk by yours truly from Spark Summit East [3] that is fully focused on exactly this topic.

4. elasticsearch-hadoop is certified by Databricks, Cloudera, Hortonworks and MapR and supports both Spark core and Spark SQL 1.0-1.3. There are binaries for Scala 2.10 and 2.11. And for what it's worth, it provided one of the first (if not the first) implementations of the DataSource API outside Databricks, which means not only using Elasticsearch in a declarative fashion but also having push-down support for operators.

Hopefully these materials will get you started with Spark and Elasticsearch and also clarify some of the misconceptions about Elasticsearch.
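As a rough sketch of what point 3 can look like from PySpark, the snippet below goes through the Map/Reduce layer by handing elasticsearch-hadoop's `EsInputFormat` to `SparkContext.newAPIHadoopRDD`. It assumes the elasticsearch-hadoop jar is on the Spark classpath and that a node is reachable at the given host and port. The helper names `es_read_conf` and `read_from_es`, the resource `companies/company` and the sample query are hypothetical, chosen only for illustration.

```python
def es_read_conf(resource, nodes="localhost", port=9200, query=None):
    """Build an elasticsearch-hadoop configuration dict for reading an index."""
    conf = {
        "es.nodes": nodes,        # Elasticsearch host(s) to connect to
        "es.port": str(port),     # Hadoop configuration values must be strings
        "es.resource": resource,  # "index/type" to read from
    }
    if query is not None:
        conf["es.query"] = query  # URI query or query DSL string
    return conf


def read_from_es(sc, conf):
    """Return an RDD of (document id, document fields) pairs read from Elasticsearch.

    `sc` is expected to be a live pyspark.SparkContext.
    """
    return sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )


if __name__ == "__main__":
    # Only the configuration is built here; reading needs a running cluster.
    print(es_read_conf("companies/company", query="?q=name:acme"))
```

With a running cluster, `read_from_es(sc, es_read_conf("companies/company"))` would return the RDD. The Scala and Java APIs mentioned in point 3 are the more direct route when Python is not a requirement.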
Cheers,

[1] https://www.elastic.co/products/elasticsearch
[2] http://www.elastic.co/guide/en/elasticsearch/hadoop/master/reference.html
[3] http://spark-summit.org/east/2015/talk/using-spark-and-elasticsearch-for-real-time-data-analysis

On 4/28/15 8:16 PM, Nick Pentreath wrote:

Depends on your use case and search volume. Typically you'd have a dedicated ES cluster if your app is doing a lot of real-time indexing and search. If it's only for Spark integration, then you could colocate ES and Spark.

On Tue, Apr 28, 2015 at 6:41 PM, Jeetendra Gangele <gangele...@gmail.com> wrote:

Thanks for the reply. Will the Elasticsearch index live within my cluster, or do I need a separate host for Elasticsearch?

On 28 April 2015 at 22:03, Nick Pentreath <nick.pentre...@gmail.com> wrote:

I haven't used Solr for a long time, and haven't used Solr in Spark. However, why do you say "Elasticsearch is not a good option ..."? ES absolutely supports full-text search and not just filtering and grouping (in fact its original purpose was and still is text search, though filtering, grouping and aggregation are heavily used).

http://www.elastic.co/guide/en/elasticsearch/guide/master/full-text-search.html

On Tue, Apr 28, 2015 at 6:27 PM, Jeetendra Gangele <gangele...@gmail.com> wrote:

Has anyone tried using Solr inside Spark? Below is the project describing it: https://github.com/LucidWorks/spark-solr

I have a requirement in which I want to index 20 million company names and then search as and when new data comes in. The output should be a list of companies matching the query.
Spark has built-in Elasticsearch support, but for this purpose Elasticsearch is not a good option, since this is purely a text search problem. Elasticsearch is good for filtering and grouping. Has anybody used Solr inside Spark?

Regards,
Jeetendra

-- Costin

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org