nt to mapPartitions, or
>>>>>>>>>>>>>> what are the best practices?
>>>>>>>>>>>>>>
>>>>>>>>>>>>> In this case you probably want to make the ElasticClient
>>>>>>>>
can I
>>>>>>>>>>>>> test output.saveAsHadoopFile[ESOutputFormat]("-") in local
>>>>>>>>>>>>> environment?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>> I think the simplest thing to do would be to use the same client in
>>>>>>>>>>> mode and just start a single-node Elasticsearch cluster.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
Subject: Re: ElasticSearch enrich
Wow, thanks for your fast answer, it helps a lot...
b0c1
--
Skype: boci13, Hangout: boci.b...@gmail.com
On
>>>>>>>>>> On Wed, Jun 25, 2014 at 1:33 AM, Holden Karau <
>>>>>>>>>> hol...@pigscanfly.ca> wrote:
>>>>>>>>>>
>>>>>>>>>>> So I'm
>>>>>>>>>> elasticsearch for geo input you can take a look at my quick & dirty
>>>>>>>>>> implementation with TopTweetsInALocation (
>>>>>>>>>> https://github.com/holdenk/elasticsearchspark/blob/master/src/m
uses the ESInputFormat which avoids the difficulty of
>>>>>>>>> having to manually create ElasticSearch clients.
>>>>>>>>>
>>>>>>>>> This approach might not work for your data, e.g. if you need to
>>>>>>>> instead look at using mapPartitions and setting up your Elasticsearch
>>>>>>>> connection inside of that, so you could then re-use the client for all
>>>>>>>> of
>>>>>>>> the queries on each par
ction.
>>>>>>>
>>>>>>> Hope this helps!
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Holden :)
>>>>>>>
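The mapPartitions suggestion above can be sketched roughly as follows. This is a minimal illustration in plain Scala, not the code from the thread: `StubGeoClient`, `PerPartitionEnrich`, the host name and the port are all hypothetical stand-ins, and the stub only fakes a geo lookup. The point is the shape: the function takes an iterator over one partition, builds one client on the worker, and reuses it for every record, so only the host and port strings have to be shipped.

```scala
// Hypothetical stand-in for a non-serializable client such as elastic4s' ElasticClient.
// A real client would open a network connection in its constructor.
final class StubGeoClient(host: String, port: Int) {
  def nearestCity(lat: Double, lon: Double): String =
    s"city-near-$lat,$lon@$host:$port" // placeholder for a real geo query
  def close(): Unit = ()
}

object PerPartitionEnrich {
  type Point = (Double, Double)

  // Has the shape Spark's mapPartitions expects: Iterator[In] => Iterator[Out].
  // Only host and port (plain serializable values) are captured; the client is
  // created on the worker and reused for all records of the partition.
  def enrich(host: String, port: Int)(points: Iterator[Point]): Iterator[(Point, String)] = {
    val client = new StubGeoClient(host, port)
    try {
      // Materialize the results so the client can be closed before returning.
      points.map(p => (p, client.nearestCity(p._1, p._2))).toVector.iterator
    } finally client.close()
  }
}

// With Spark this would be used roughly as (pointsRdd is an assumed RDD[(Double, Double)]):
//   val enriched = pointsRdd.mapPartitions(PerPartitionEnrich.enrich("localhost", 9300))
```

Materializing the partition before closing the client keeps the sketch simple; for very large partitions a lazily closed iterator would use less memory.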
>>>>
Jun 24, 2014 at 4:28 PM, Mayur Rustagi <
>>>>>> mayur.rust...@gmail.com> wrote:
>>>>>>
>>>>>> It's not used as the default serializer because of some issues with
>>>>>> compatibility and the requirement to register the classes.
>>>>>>
>>>>>> Which part are you getting as nonserializable... you need to
>>>>>> serialize that class if you are sending it to spark workers inside a map,
>>>>
>>>>>
>>>>> Mayur Rustagi
>>>>> Ph: +1 (760) 203 3257
>>>>> http://www.sigmoidanalytics.com
>>>>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>>>>
>>>
>>
>>>>
>>>> On Wed, Jun 25, 2014 at 4:52 AM, Peng Cheng wrote:
>>>>
>>>> I'm afraid persisting a connection across two tasks is dangerous, as they
>>>> can't be guaranteed to be executed on the same machine. Your ES server may
>>>> think it's a man-in-the-middle attack!
>>>>
>>>> I thin
>>> ', so nothing will sneak into your closure, but it's too complex
>>> and there should be a better option.
>>>
>>> I've never used Kryo before; if it's that good, perhaps we should use it as the
>>> default serializer.
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/ElasticSearch-enrich-tp8209p8222.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>
>>
>
>
> --
> Cell : 425-233-8271
>
I'm using elastic4s inside my ESWorker class. ESWorker now contains only two
fields, host: String and port: Int. Inside the "findNearestCity" method I
create the ElasticClient (elastic4s) connection. What's wrong with my class? Do I
need to serialize ElasticClient? mapPartitions sounds good, but I still
got N
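One way a worker like the one described above can stay serializable is to keep only host and port as fields and rebuild the client lazily on whichever JVM uses it. The sketch below is an assumption about how such a class might look, not the poster's actual code: `StubClient` is a hypothetical stand-in for the elastic4s ElasticClient, and `SerializationCheck.roundTrip` is a helper added here only to exercise Java serialization, which Spark uses by default when shipping closures.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for elastic4s' ElasticClient; deliberately NOT Serializable.
final class StubClient(val host: String, val port: Int) {
  def nearestCity(lat: Double, lon: Double): String =
    s"nearest($lat,$lon) via $host:$port" // placeholder for a real geo query
}

// Only host and port are serialized. The client field is transient and lazy,
// so it is skipped during serialization and rebuilt on first use afterwards.
class ESWorker(host: String, port: Int) extends Serializable {
  @transient private lazy val client = new StubClient(host, port)

  def findNearestCity(lat: Double, lon: Double): String =
    client.nearestCity(lat, lon)
}

object SerializationCheck {
  // Round-trips a value through Java serialization to mimic shipping it to a worker.
  def roundTrip[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

If the client were a plain `val` field instead, `roundTrip` would be expected to fail with a NotSerializableException, because the stub client does not implement Serializable.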
Most likely the ES client is not serializable. There are 3 workarounds:
1. Switch to Kryo serialization and register the client with Kryo; this might solve
your serialization issue.
2. Use mapPartitions for all your data and initialize your client inside the
mapPartitions code; this will create a client for each parti
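For the first workaround, switching Spark to Kryo is a configuration change plus a registrator class. The fragment below is a sketch that needs spark-core on the classpath; `GeoPoint`, `GeoRegistrator`, `KryoConf` and the app name are hypothetical names chosen for illustration. Note that registering a class only helps if Kryo can actually serialize it, so a client that holds live connections may still fail, and closures themselves are serialized with Java serialization by default.

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical record type that gets shipped between Spark workers.
case class GeoPoint(lat: Double, lon: Double)

// Tells Kryo which classes to register up front.
class GeoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[GeoPoint])
  }
}

object KryoConf {
  // Builds a SparkConf that uses Kryo for data serialization.
  def build(): SparkConf =
    new SparkConf()
      .setAppName("es-enrich") // hypothetical application name
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", classOf[GeoRegistrator].getName)
}
```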
pper
> out-of-the-box, but it's not recommended (it's a developer API and makes fat
> closures that run slowly).
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/ElasticSearch-enrich-tp8209p8214.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
Hi guys,
I have a small question. I want to create a "Worker" class that uses
ElasticClient to query Elasticsearch. (I want to enrich my data
with geo search results.)
How can I do that? I tried to create a worker instance with the ES host/port
parameters, but Spark throws an exception (my class