Re: ElasticSearch enrich

2014-06-27 Thread Holden Karau
...nt to mapPartitions, or what is the best practice? In this case you probably want to make the ElasticClient...

Re: ElasticSearch enrich

2014-06-27 Thread boci
...can I test output.saveAsHadoopFile[ESOutputFormat]("-") in a local environment?...

Re: ElasticSearch enrich

2014-06-27 Thread Holden Karau
...I think the simplest thing to do would be to use the same client in mode and just start a single-node Elasticsearch cluster.

Re: ElasticSearch enrich

2014-06-27 Thread boci

RE: ElasticSearch enrich

2014-06-27 Thread Adrian Mocanu
...Subject: Re: ElasticSearch enrich. Wow, thanks for your fast answer, it helps a lot... b0c1

Re: ElasticSearch enrich

2014-06-27 Thread Holden Karau
...On Wed, Jun 25, 2014 at 1:33 AM, Holden Karau <hol...@pigscanfly.ca> wrote: So I'm...

Re: ElasticSearch enrich

2014-06-27 Thread boci
...elasticsearch for geo input you can take a look at my quick & dirty implementation with TopTweetsInALocation (https://github.com/holdenk/elasticsearchspark/blob/master/src/m...

Re: ElasticSearch enrich

2014-06-27 Thread boci
...uses the ESInputFormat, which avoids the difficulty of having to manually create ElasticSearch clients. This approach might not work for your data, e.g. if you need to...

Re: ElasticSearch enrich

2014-06-26 Thread Holden Karau
...instead look at using mapPartitions and setting up your Elasticsearch connection inside of that, so you could then re-use the client for all of the queries on each par...
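A minimal sketch of the per-partition pattern described above, written in plain Scala so it runs without Spark or Elasticsearch. `FakeGeoClient`, `PartitionEnrich`, `Point` and `enrichPartition` are hypothetical names: the client stands in for a non-serializable connection such as elastic4s's `ElasticClient`, and its lookup is a placeholder, not a real geo query.

```scala
// Hypothetical stand-in for a non-serializable search client
// (e.g. elastic4s's ElasticClient); it only counts how many were built.
class FakeGeoClient(host: String, port: Int) {
  FakeGeoClient.created += 1
  // Placeholder lookup; a real client would query Elasticsearch here.
  def nearestCity(lat: Double, lon: Double): String =
    if (lat > 0) "north-city" else "south-city"
  def close(): Unit = ()
}

object FakeGeoClient {
  var created = 0
}

object PartitionEnrich {
  case class Point(lat: Double, lon: Double)

  // The function you would hand to rdd.mapPartitions: one client per
  // partition, reused for every record in that partition.
  def enrichPartition(host: String, port: Int)(
      points: Iterator[Point]): Iterator[(Point, String)] = {
    val client = new FakeGeoClient(host, port)
    // Materialize before closing, because Iterator.map is lazy.
    val out = points.map(p => (p, client.nearestCity(p.lat, p.lon))).toList
    client.close()
    out.iterator
  }
}
```

In a Spark job this would be called as `rdd.mapPartitions(it => PartitionEnrich.enrichPartition(host, port)(it))`, so the closure captures only the host and port, and each partition builds exactly one client. The `.toList` trades memory for the ability to close the client before returning; for very large partitions you would instead close it once the iterator is exhausted.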

Re: ElasticSearch enrich

2014-06-26 Thread boci
...ction. Hope this helps! Cheers, Holden :)

Re: ElasticSearch enrich

2014-06-26 Thread Holden Karau
...un 24, 2014 at 4:28 PM, Mayur Rustagi <mayur.rust...@gmail.com> wrote: It's not used as the default serializer because of some issues with compatibility & requi...

Re: ElasticSearch enrich

2014-06-26 Thread boci
...bility & the requirement to register the classes. Which part are you getting as non-serializable? You need to serialize that class if you are sending it to Spark workers inside a map,...
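To find out which part is non-serializable, you can reproduce the check yourself: Spark serializes task closures with Java serialization, so any object captured by a `map` closure, together with all of its fields, must be `java.io.Serializable`. A minimal sketch; `Client`, `BadWorker`, `GoodWorker` and `serializes` are hypothetical names used only for illustration.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // Hypothetical client that, like many network clients, is not Serializable.
  class Client(val host: String, val port: Int)

  // Holds the client directly, so serializing the worker drags the client along.
  class BadWorker(val client: Client) extends Serializable

  // Holds only plain configuration; the client is built where it is used.
  class GoodWorker(val host: String, val port: Int) extends Serializable

  // Returns true if Java serialization of obj succeeds, false if some
  // field reachable from obj is not Serializable.
  def serializes(obj: AnyRef): Boolean =
    try {
      val oos = new ObjectOutputStream(new ByteArrayOutputStream())
      oos.writeObject(obj)
      oos.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

Running `serializes` on an object before putting it in a closure gives the same verdict as the task serializer, without needing a cluster.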

Re: ElasticSearch enrich

2014-06-26 Thread Nick Pentreath
...Mayur Rustagi, Ph: +1 (760) 203 3257, http://www.sigmoidanalytics.com, @mayur_rustagi

Re: ElasticSearch enrich

2014-06-26 Thread boci
...On Wed, Jun 25, 2014 at 4:52 AM, Peng Cheng wrote: I'm afraid persisting a connection across two tasks is a dangerous act, as they can't be guaranteed to be executed on...

Re: ElasticSearch enrich

2014-06-25 Thread Holden Karau
...I'm afraid persisting a connection across two tasks is a dangerous act, as they can't be guaranteed to be executed on the same machine. Your ES server may think it's a man-in-the-middle attack! I thin...

Re: ElasticSearch enrich

2014-06-25 Thread boci
...', so nothing will sneak into your closure, but it's too complex and there should be a better option. I've never used Kryo before; if it's that good, perhaps we should use it as the default serializer. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/ElasticSearch-enrich-tp8209p8222.html Sent from the Apache Spark User List mailing list archive at Nabble.com. -- Cell : 425-233-8271

Re: ElasticSearch enrich

2014-06-24 Thread Holden Karau
...I've never used Kryo before; if it's that good, perhaps we should use it as the default serializer.

Re: ElasticSearch enrich

2014-06-24 Thread Mayur Rustagi
...and there should be a better option. I've never used Kryo before; if it's that good, perhaps we should use it as the default serializer.

Re: ElasticSearch enrich

2014-06-24 Thread Peng Cheng
...ol', so nothing will sneak into your closure, but it's too complex and there should be a better option. I've never used Kryo before; if it's that good, perhaps we should use it as the default serializer.

Re: ElasticSearch enrich

2014-06-24 Thread boci
I'm using elastic4s inside my ESWorker class. ESWorker now only contains two fields, host: String and port: Int. Inside the "findNearestCity" method I create an ElasticClient (elastic4s) connection. What's wrong with my class? Do I need to serialize ElasticClient? mapPartitions sounds good, but I still got N...
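One common way to keep a worker like ESWorker serializable, while not opening a new connection on every findNearestCity call, is to ship only host and port and declare the client as a `@transient lazy val`: it is skipped during serialization and rebuilt on first use after deserialization. A minimal sketch; `GeoClient` is a hypothetical stand-in for elastic4s's `ElasticClient`, and `roundTrip` only approximates how Spark ships an object to a worker.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object TransientClient {
  // Hypothetical non-serializable client (stand-in for elastic4s's ElasticClient).
  class GeoClient(host: String, port: Int) {
    // Placeholder result; a real client would run a geo query.
    def findNearestCity(lat: Double, lon: Double): String = s"city@$host:$port"
  }

  // Only host and port are serialized. The @transient client is skipped
  // during serialization and lazily re-created on first use afterwards.
  class ESWorker(val host: String, val port: Int) extends Serializable {
    @transient lazy val client: GeoClient = new GeoClient(host, port)

    def findNearestCity(lat: Double, lon: Double): String =
      client.findNearestCity(lat, lon)
  }

  // Serialize and deserialize obj, roughly what happens when a task is shipped.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    in.readObject().asInstanceOf[T]
  }
}
```

Each deserialized copy (in Spark, each task) builds its own client, so this behaves like the per-partition approach rather than sharing one connection across tasks.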

Re: ElasticSearch enrich

2014-06-24 Thread Mayur Rustagi
Most likely the ES client is not serializable for you. You can try three workarounds:
1. Switch to Kryo serialization and register the client with Kryo; this might solve your serialization issue.
2. Use mapPartitions for all your data and initialize your client inside the mapPartitions code; this will create a client for each parti...

Re: ElasticSearch enrich

2014-06-24 Thread boci
...pper out-of-the-box, but it's not recommended (it's a developer API and makes fat closures that run slowly). -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/ElasticSearch-enrich-tp8209p8214.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: ElasticSearch enrich

2014-06-24 Thread Peng Cheng
...e fat closures that run slowly)

ElasticSearch enrich

2014-06-24 Thread boci
Hi guys, I have a small question. I want to create a "Worker" class which uses ElasticClient to query Elasticsearch (I want to enrich my data with geo search results). How can I do that? I tried to create a worker instance with ES host/port parameters, but Spark throws an exception (my class...