Re: ElasticSearch enrich

2014-06-27 Thread Holden Karau
On Friday, June 27, 2014, boci wrote:
> Thanks, more local threads solved the problem, it works like a charm. How
> many threads are required?

Just more than one so that it can schedule the other task :)

> Adrian: it's not a public project, but ask and I will answer (if I can)...
> maybe later I wil

Re: ElasticSearch enrich

2014-06-27 Thread boci
Thanks, more local threads solved the problem, it works like a charm. How many threads are required? Adrian: it's not a public project, but ask and I will answer (if I can)... maybe later I will create a demo project based on my solution. b0c1 -
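
On the "how many threads" question: a streaming receiver holds one thread for as long as the stream runs, so with a single local thread nothing is left to run the processing tasks. The sketch below is only a plain-JVM analogy of that, not Spark code. The object and method names are made up for illustration, and a fixed thread pool stands in for the local[N] master.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

object LocalThreadsSketch {
  /** Returns true if the processing task ran while the receiver was still active. */
  def processingRuns(threads: Int): Boolean = {
    val pool = Executors.newFixedThreadPool(threads) // stand-in for local[threads]
    val stopReceiver = new CountDownLatch(1)
    val processed = new CountDownLatch(1)
    try {
      // Stand-in for the receiver: it keeps its thread until told to stop.
      pool.execute(new Runnable { def run(): Unit = stopReceiver.await() })
      // Stand-in for the processing task (the enrich steps).
      pool.execute(new Runnable { def run(): Unit = processed.countDown() })
      // Wait briefly to see whether the processing task was scheduled at all.
      processed.await(500, TimeUnit.MILLISECONDS)
    } finally {
      stopReceiver.countDown() // release the receiver so the pool can drain
      pool.shutdown()
    }
  }
}
```

With one thread the second task stays queued behind the receiver; with two or more it runs straight away, which matches the "just more than one" answer in this thread.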

Re: ElasticSearch enrich

2014-06-27 Thread Holden Karau
Try setting the master to local[4]

On Fri, Jun 27, 2014 at 2:17 PM, boci wrote:
> This is a simple ScalaTest test. I create a SparkConf, set the master to local
> (set the serializer etc.), bring up the Kafka and ES connections, send a message to
> Kafka and wait 30 sec for processing.
>
> It runs in IDEA, no

Re: ElasticSearch enrich

2014-06-27 Thread boci
This is a simple ScalaTest test. I create a SparkConf, set the master to local (set the serializer etc.), bring up the Kafka and ES connections, send a message to Kafka and wait 30 sec for processing. It runs in IDEA, no magic tricks. b0c1

RE: ElasticSearch enrich

2014-06-27 Thread Adrian Mocanu
Subject: Re: ElasticSearch enrich

Wow, thanks for your fast answer, it helps a lot... b0c1 -- Skype: boci13, Hangout: boci.b...@gmail.com On

Re: ElasticSearch enrich

2014-06-27 Thread Holden Karau
So a few quick questions:
1) What cluster are you running this against? Is it just local? Have you tried local[4]?
2) When you say breakpoint, how are you setting this breakpoint? There is a good chance your breakpoint mechanism doesn't work in a distributed environment, could you instead cause a

Re: ElasticSearch enrich

2014-06-27 Thread boci
OK, I found dynamic resources, but I have a frustrating problem. This is the flow:

kafka -> enrich X -> enrich Y -> enrich Z -> foreachRDD -> save

My problem is: if I do this it doesn't work, the enrich functions are not called, but if I add a print they are. For example, if I do this:

kafka -> enrich X

Re: ElasticSearch enrich

2014-06-27 Thread boci
Another question. In the foreachRDD I will initialize the JobConf, but at this point how can I get information from the items? I have an identifier in the data which identifies the required ES index (so how can I set a dynamic index in the foreachRDD)? b0c1 --

Re: ElasticSearch enrich

2014-06-26 Thread Holden Karau
Just your luck I happened to be working on that very talk today :) Let me know how your experiences with Elasticsearch & Spark go :)

On Thu, Jun 26, 2014 at 3:17 PM, boci wrote:
> Wow, thanks for your fast answer, it helps a lot...
>
> b0c1
>
> ---

Re: ElasticSearch enrich

2014-06-26 Thread boci
Wow, thanks for your fast answer, it helps a lot... b0c1 -- Skype: boci13, Hangout: boci.b...@gmail.com

On Thu, Jun 26, 2014 at 11:48 PM, Holden Karau wrote:
> Hi b0c1,

Re: ElasticSearch enrich

2014-06-26 Thread Holden Karau
Hi b0c1, I have an example of how to do this in the repo for my talk as well, the specific example is at https://github.com/holdenk/elasticsearchspark/blob/master/src/main/scala/com/holdenkarau/esspark/IndexTweetsLive.scala . Since DStream doesn't have a saveAsHadoopDataset we use foreachRDD and t

Re: ElasticSearch enrich

2014-06-26 Thread boci
Thanks. Without the local option I can connect to the remote ES, so now I only have one problem. How can I use elasticsearch-hadoop with Spark Streaming? I mean DStream doesn't have a "saveAsHadoopFiles" method. My second problem: the output index depends on the input data. Thanks ---

Re: ElasticSearch enrich

2014-06-26 Thread Nick Pentreath
You can just add elasticsearch-hadoop as a dependency to your project to use the ESInputFormat and ESOutputFormat (https://github.com/elasticsearch/elasticsearch-hadoop). Some other basics here: http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/spark.html

For testing, yes I thin

Re: ElasticSearch enrich

2014-06-26 Thread boci
That's okay, but Hadoop has ES integration. What happens if I run saveAsHadoopFile without Hadoop (or do I need to bring up Hadoop programmatically, if I can)? b0c1 --

Re: ElasticSearch enrich

2014-06-25 Thread Holden Karau
On Wed, Jun 25, 2014 at 4:16 PM, boci wrote:
> Hi guys, thanks for the direction, now I have some problems/questions:
> - in local (test) mode I want to use ElasticClient.local to create the ES
> connection, but in production I want to use ElasticClient.remote; for this I
> want to pass ElasticClient to mapPa

Re: ElasticSearch enrich

2014-06-25 Thread boci
Hi guys, thanks for the direction, now I have some problems/questions:
- in local (test) mode I want to use ElasticClient.local to create the ES connection, but in production I want to use ElasticClient.remote; for this I want to pass ElasticClient to mapPartitions, or what is the best practice?
- my stream ou

Re: ElasticSearch enrich

2014-06-24 Thread Holden Karau
So I'm giving a talk at the Spark summit on using Spark & ElasticSearch, but for now if you want to see a simple demo which uses elasticsearch for geo input you can take a look at my quick & dirty implementation with TopTweetsInALocation ( https://github.com/holdenk/elasticsearchspark/blob/master/s

Re: ElasticSearch enrich

2014-06-24 Thread Mayur Rustagi
It's not used as the default serializer because of some compatibility issues and the requirement to register the classes. Which part are you getting as non-serializable? You need to serialize that class if you are sending it to Spark workers inside a map, reduce, mapPartitions or any of the operations on RDD

Re: ElasticSearch enrich

2014-06-24 Thread Peng Cheng
I'm afraid persisting a connection across two tasks is a dangerous act, as they can't be guaranteed to be executed on the same machine. Your ES server may think it's a man-in-the-middle attack! I think it's possible to invoke a static method that gives you a connection in a local 'pool', so nothing will

Re: ElasticSearch enrich

2014-06-24 Thread boci
I'm using elastic4s inside my ESWorker class. ESWorker now only contains two fields, host:String and port:Int. Now inside the "findNearestCity" method I create the ElasticClient (elastic4s) connection. What's wrong with my class? Do I need to serialize ElasticClient? mapPartitions sounds good but I still got N

Re: ElasticSearch enrich

2014-06-24 Thread Mayur Rustagi
Most likely the ES client is not serializable. You can try 3 workarounds:
1. Switch to Kryo serialization and register the client in Kryo; this might solve your serialization issue.
2. Use mapPartitions for all your data & initialize your client in the mapPartitions code; this will create a client for each parti

Re: ElasticSearch enrich

2014-06-24 Thread boci
OK, but in this case where can I store the ES connection? Or does every document create a new ES connection inside the worker? -- Skype: boci13, Hangout: boci.b...@gmail.com On W
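
One of the workarounds suggested in this thread is to create the client inside the mapPartitions code, so that only plain values (host and port) cross the serialization boundary and one connection is opened per partition rather than per document. Below is a minimal plain-Scala sketch of that shape, under stated assumptions: FakeEsClient is a hypothetical stand-in for a non-serializable client such as elastic4s's ElasticClient, enrichPartition is an illustrative name, and plain Iterators stand in for RDD partitions.

```scala
import java.util.concurrent.atomic.AtomicInteger

object PerPartitionClientSketch {
  // Hypothetical stand-in for a non-serializable ES client (not a real elastic4s API).
  final class FakeEsClient(val host: String, val port: Int) {
    FakeEsClient.created.incrementAndGet() // count how many clients get built
    def findNearestCity(doc: String): String = s"$doc -> city@$host:$port"
    def close(): Unit = ()
  }
  object FakeEsClient {
    val created = new AtomicInteger(0)
  }

  // Same shape as a function handed to mapPartitions: Iterator in, Iterator out.
  // Only host and port are captured; the client is built where the data lives.
  def enrichPartition(host: String, port: Int)(docs: Iterator[String]): Iterator[String] = {
    val client = new FakeEsClient(host, port)
    // Materialise the results before closing, because Iterator.map is lazy.
    try docs.map(client.findNearestCity).toList.iterator
    finally client.close()
  }
}
```

Usage with two simulated partitions: `Seq(Iterator("a", "b"), Iterator("c")).flatMap(p => PerPartitionClientSketch.enrichPartition("localhost", 9200)(p))` builds two clients for three documents. With a real RDD the same function would presumably be passed to mapPartitions; that part is not shown or tested here.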

Re: ElasticSearch enrich

2014-06-24 Thread Peng Cheng
Make sure all queries are called through class methods and wrap your query info with a class having only simple properties (strings, collections etc.). If you can't find such a wrapper you can also use the SerializableWritable wrapper out-of-the-box, but it's not recommended. (developer-api and make fat cl