On Friday, June 27, 2014, boci wrote:
> Thanks, more local threads solved the problem, it works like a charm. How
> many threads are required?
>
Just more than one, so that it can schedule the other task :)

> Adrian: it's not a public project, but ask and I will answer (if I can)...
> maybe later I will create a demo project based on my solution.
>
> b0c1
---
Try setting the master to local[4].

On Fri, Jun 27, 2014 at 2:17 PM, boci wrote:
> This is a simple scalatest. I start a SparkConf, set the master to local
> (set the serializer etc.), pull up kafka and an es connection, send a
> message to kafka and wait 30 sec for it to process.
>
> It runs in IDEA, no magic trick.
>
> b0c1
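A minimal sketch of that suggestion, assuming Spark 1.0-era APIs (app name
and batch interval are illustrative): "local" runs everything in a single
thread, which a streaming receiver occupies entirely, so no thread is left
to process batches; "local[4]" avoids that.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// "local" = one thread, fully occupied by the receiver; "local[4]"
// leaves threads free to actually process the batches.
val conf = new SparkConf()
  .setMaster("local[4]")
  .setAppName("es-enrich-test") // illustrative name
val ssc = new StreamingContext(conf, Seconds(1)) // illustrative batch interval
---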
Subject: Re: ElasticSearch enrich

Wow, thanks for your fast answer, it helps a lot...

b0c1
--
Skype: boci13, Hangout: boci.b...@gmail.com
So a few quick questions:
1) What cluster are you running this against? Is it just local? Have you
tried local[4]?
2) When you say breakpoint, how are you setting this breakpoint? There is
a good chance your breakpoint mechanism doesn't work in a distributed
environment; could you instead cause a print or a counter update and check
for that?
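One hedged way to do that check without a breakpoint, using a Spark 1.x-era
accumulator (names here are illustrative, not from the original thread):

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// IDE breakpoints set on the driver won't trip inside executor task
// threads; an accumulator read back on the driver shows whether the
// stage actually ran.
def countEvents(sc: SparkContext, rdd: RDD[String]): Long = {
  val seen = sc.accumulator(0L) // Spark 1.x accumulator API
  rdd.foreach(_ => seen += 1L)  // foreach is an action, so this executes
  seen.value
}
---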
Ok, I found dynamic resources, but I have a frustrating problem. This is the
flow:
kafka -> enrich X -> enrich Y -> enrich Z -> foreachRDD -> save
My problem is: if I do this, it doesn't work; the enrich functions are not
called. But if I put a print in, it does. For example, if I do this:
kafka -> enrich X
Another question: in the foreachRDD I will initialize the JobConf, but in
this place how can I get information from the items?
I have an identifier in the data which identifies the required ES index, so
how can I set a dynamic index in the foreachRDD?
b0c1
--
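A sketch of why the enrich steps above appear to do nothing until a print is
added: DStream transformations are lazy, and only a registered output
operation forces each batch to run. The record type, enrich functions, and
queue-backed stream below are placeholders, not the poster's actual code;
the per-record esIndex field also shows where a dynamic index becomes
visible inside foreachRDD.

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

case class Event(esIndex: String, payload: String) // hypothetical record shape

// stand-ins for the real enrich X / Y / Z functions
def enrichX(e: Event): Event = e
def enrichY(e: Event): Event = e
def enrichZ(e: Event): Event = e

val ssc = new StreamingContext(
  new SparkConf().setMaster("local[4]").setAppName("enrich-flow"), Seconds(5))

// In the real job this would come from KafkaUtils.createStream(...); a
// queue stream keeps the sketch self-contained.
val queue = new scala.collection.mutable.Queue[RDD[Event]]()
val events = ssc.queueStream(queue)

// map is lazy: this line alone never invokes the enrichers...
val enriched = events.map(enrichX).map(enrichY).map(enrichZ)

// ...only an output operation such as foreachRDD makes each batch execute.
enriched.foreachRDD { rdd =>
  rdd.foreachPartition { part =>
    part.foreach(e => println(s"would save ${e.payload} to index ${e.esIndex}"))
  }
}

ssc.start()
---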
Just your luck I happened to be working on that very talk today :) Let me
know how your experiences with Elasticsearch & Spark go :)

On Thu, Jun 26, 2014 at 3:17 PM, boci wrote:
> Wow, thanks for your fast answer, it helps a lot...
>
> b0c1
> --
> Skype: boci13, Hangout: boci.b...@gmail.com
---
On Thu, Jun 26, 2014 at 11:48 PM, Holden Karau wrote:
Hi b0c1,
I have an example of how to do this in the repo for my talk as well; the
specific example is at
https://github.com/holdenk/elasticsearchspark/blob/master/src/main/scala/com/holdenkarau/esspark/IndexTweetsLive.scala
. Since DStream doesn't have a saveAsHadoopDataset, we use foreachRDD and
then call saveAsHadoopDataset on each RDD.
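A hedged sketch of that foreachRDD pattern with the es-hadoop output format;
the config keys follow elasticsearch-hadoop 2.0-era naming, the index/type
is illustrative, and the documents are assumed to be pre-converted to the
(key, MapWritable) shape the output format expects.

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{MapWritable, NullWritable}
import org.apache.hadoop.mapred.{FileOutputFormat, JobConf}
import org.apache.spark.SparkContext._ // pair-RDD functions in Spark 1.0-era builds
import org.apache.spark.streaming.dstream.DStream

def saveStreamToEs(docs: DStream[(NullWritable, MapWritable)]): Unit =
  docs.foreachRDD { rdd =>
    val jobConf = new JobConf(rdd.context.hadoopConfiguration)
    jobConf.set("mapred.output.format.class",
      "org.elasticsearch.hadoop.mr.EsOutputFormat")
    jobConf.set("es.nodes", "localhost:9200")  // illustrative
    jobConf.set("es.resource", "tweets/tweet") // index/type, illustrative
    jobConf.setOutputKeyClass(classOf[NullWritable])
    jobConf.setOutputValueClass(classOf[MapWritable])
    // dummy path; the ES output format ignores it but Hadoop wants one
    FileOutputFormat.setOutputPath(jobConf, new Path("-"))
    rdd.saveAsHadoopDataset(jobConf)
  }
---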
Thanks. Without the local option I can connect to es remotely; now I only
have one problem. How can I use elasticsearch-hadoop with spark streaming?
I mean, DStream doesn't have a "saveAsHadoopFiles" method, and my second
problem is that the output index depends on the input data.
Thanks
---
You can just add elasticsearch-hadoop as a dependency to your project to
use the ESInputFormat and ESOutputFormat (
https://github.com/elasticsearch/elasticsearch-hadoop). Some other basics
here:
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/spark.html
For testing, yes I think you can run it locally without a full hadoop
install.
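For reference, the dependency would look something like this in sbt (the
version is illustrative; pick the es-hadoop release that matches your
cluster):

// build.sbt
libraryDependencies += "org.elasticsearch" % "elasticsearch-hadoop" % "2.0.0"
---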
That's okay, but hadoop has ES integration. What happens if I run
saveAsHadoopFile without hadoop? (Or do I need to pull up hadoop
programmatically, if I can?)

b0c1
--
On Wed, Jun 25, 2014 at 4:16 PM, boci wrote:
Hi guys, thanks for the direction. Now I have some problems/questions:
- in local (test) mode I want to use ElasticClient.local to create the es
connection, but in production I want to use ElasticClient.remote; to do this
I want to pass the ElasticClient to mapPartitions, or what is the best
practice?
- my stream ou
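On the first question, a hedged sketch of the usual pattern (elastic4s
1.x-era API assumed): don't pass the client to mapPartitions, pass only the
simple settings and build the client inside, once per partition.

import com.sksamuel.elastic4s.ElasticClient
import org.apache.spark.rdd.RDD

// Only useLocal, host and port are captured by the closure; the
// non-serializable client is built on the worker, once per partition.
def enrich(data: RDD[String], useLocal: Boolean, host: String, port: Int): RDD[String] =
  data.mapPartitions { records =>
    val client =
      if (useLocal) ElasticClient.local     // embedded node for tests
      else ElasticClient.remote(host, port) // real cluster in production
    // materialize before closing, since mapPartitions returns a lazy iterator
    val out = records.map { r => /* enrich r via client here */ r }.toList
    client.close() // adjust to your elastic4s version's shutdown call
    out.iterator
  }
---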
So I'm giving a talk at the Spark summit on using Spark & ElasticSearch,
but for now, if you want to see a simple demo which uses elasticsearch for
geo input, you can take a look at my quick & dirty implementation with
TopTweetsInALocation (
https://github.com/holdenk/elasticsearchspark/blob/master/s
It's not used as the default serializer due to some issues with compatibility
& the requirement to register the classes.
Which part are you getting as non-serializable? You need to serialize that
class if you are sending it to spark workers inside a map, reduce,
mapPartitions or any of the other operations on an RDD.
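For the registration part, a sketch of the Spark 1.0-era Kryo wiring (the
registered class is illustrative):

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

case class EnrichedDoc(id: String, body: String) // illustrative class to register

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[EnrichedDoc])
  }
}

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyRegistrator") // fully-qualified name in real code
---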
I'm afraid persisting a connection across two tasks is a dangerous act, as
they can't be guaranteed to be executed on the same machine. Your ES server
may think it's a man-in-the-middle attack!
I think it's possible to invoke a static method that gives you a connection
from a local 'pool', so nothing will need to be serialized.
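That "static method + local pool" idea maps naturally onto a Scala object,
which is initialized once per worker JVM; a hedged sketch (host and port
are illustrative):

import com.sksamuel.elastic4s.ElasticClient

// A Scala object is created lazily, once per JVM, so every task on the
// same executor reuses this single connection instead of having one
// shipped from the driver.
object EsClientPool {
  lazy val client: ElasticClient = ElasticClient.remote("es-host", 9300)
}
---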
I'm using elastic4s inside my ESWorker class. ESWorker now only contains two
fields, host:String, port:Int. Now, inside the "findNearestCity" method, I
create the ElasticClient (elastic4s) connection. What's wrong with my class?
Do I need to serialize ElasticClient? mapPartitions sounds good but I still
got N
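One common fix for exactly this shape of class, assuming elastic4s 1.x: mark
the client @transient lazy, so only host and port are serialized and the
connection is rebuilt on whichever worker first uses it. The method body is
hypothetical.

import com.sksamuel.elastic4s.ElasticClient

class ESWorker(host: String, port: Int) extends Serializable {
  // excluded from serialization; re-created lazily on each worker JVM
  @transient lazy val client: ElasticClient = ElasticClient.remote(host, port)

  def findNearestCity(lat: Double, lon: Double): Option[String] = {
    // hypothetical body: issue a geo query through `client` here
    None
  }
}
---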
Mostly, the ES client is not serializable for you. You can do 3 workarounds:
1. Switch to kryo serialization and register the client in kryo; this might
solve your serialization issue.
2. Use mapPartitions for all your data & initialize your client in the
mapPartitions code; this will create a client for each partition.
Ok, but in this case where can I store the ES connection? Or does every
document create a new ES connection inside the worker?
--
Skype: boci13, Hangout: boci.b...@gmail.com
Make sure all queries are called through class methods and wrap your query
info with a class having only simple properties (strings, collections etc.).
If you can't find such a wrapper you can also use the SerializableWritable
wrapper out-of-the-box, but it's not recommended (developer-api and makes fat
cl
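A sketch of the kind of wrapper being suggested here: a case class carrying
only simple, serializable fields, from which the worker-side code builds the
real query (all names hypothetical):

// Only strings and collections travel with the closure...
case class QueryInfo(index: String, field: String, terms: Seq[String])

// ...and the heavyweight query is assembled from it on the worker.
def buildQuery(q: QueryInfo): String = {
  val terms = q.terms.map(t => "\"" + t + "\"").mkString(",")
  s"""{"query":{"terms":{"${q.field}":[$terms]}}}"""
}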