First off, I'd recommend using the latest es-hadoop beta (2.1.0.Beta3) or even
better, the dev build [1].
Second, consider using the native Java/Scala API [2], since it is both easier
to configure and faster.
Third, when your input is JSON, tell es-hadoop/spark so. The connector can work
with either objects (the default) or raw JSON.
As it happens, the es-hadoop documentation describes the above here [3] :).
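For the PySpark route used further down in this thread, the JSON approach could look roughly like the sketch below. It serializes each document to a JSON string and sets es.input.json so the connector passes the string through rather than converting Python objects. The helper names (to_json_pair, es_write_conf, save_as_json) are made up for illustration, and using org.apache.hadoop.io.Text as the value class is an assumption that should be checked against the docs in [3] for the connector version in use.

```python
import json


def to_json_pair(record):
    """Turn an (id, dict) pair into an (id, json_string) pair.

    The nested lists stay inside the JSON string, so they never reach
    the connector as Java collections.
    """
    doc_id, doc = record
    return (str(doc_id), json.dumps(doc))


def es_write_conf(nodes, port, resource):
    """Build the Hadoop configuration dict for writing raw JSON."""
    return {
        "es.nodes": nodes,
        "es.port": port,
        "es.resource": resource,
        # Tell es-hadoop the values are already JSON documents.
        "es.input.json": "true",
    }


def save_as_json(rdd, conf):
    """Write an RDD of (id, dict) pairs to Elasticsearch as raw JSON.

    Assumption: Text is used as the value class because the values are
    now plain strings; needs a running Spark context and the es-hadoop jar.
    """
    rdd.map(to_json_pair).saveAsNewAPIHadoopFile(
        path="-",
        outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.apache.hadoop.io.Text",
        conf=conf,
    )
```

With an RDD shaped like the one described below, the call would be save_as_json(rdd, es_write_conf("localhost", "9200", "shahid/hcp_id")).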
Hope this helps,
[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/install.html#download-dev
[2] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/spark.html#spark-native
[3] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/spark.html#spark-write-json
On 2/10/15 6:58 PM, shahid ashraf wrote:
Thanks Costin,
I am grouping the data together based on the id in the JSON, and the RDD contains:
rdd = (1, {'SOURCES': [{n no. of key/value pairs}]}),
      (2, {'SOURCES': [{n no. of key/value pairs}]}),
      (3, {'SOURCES': [{n no. of key/value pairs}]}),
      (4, {'SOURCES': [{n no. of key/value pairs}]})
rdd.saveAsNewAPIHadoopFile(
    path='-',
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={
        "es.nodes" : "localhost",
        "es.port" : "9200",
        "es.resource" : "shahid/hcp_id"
    })
spark-1.1.0-bin-hadoop1
java version "1.7.0_71"
elasticsearch-1.4.2
elasticsearch-hadoop-2.1.0.Beta2.jar
On Tue, Feb 10, 2015 at 10:05 PM, Costin Leau wrote:
Sorry, but there's too little information in this email to make any type of
assessment.
Can you please describe what you are trying to do and which versions of
Elasticsearch and es-spark you are using, and potentially post a snippet of code?
What does your RDD contain?
On 2/10/15 6:05 PM, shahid wrote:
INFO scheduler.TaskSetManager: Starting task 2.1 in stage 2.0 (TID 9, ip-10-80-98-118.ec2.internal, PROCESS_LOCAL, 1025 bytes)
15/02/10 15:54:08 INFO scheduler.TaskSetManager: Lost task 1.0 in stage 2.0 (TID 6) on executor ip-10-80-15-145.ec2.internal: org.apache.spark.SparkException (Data of type java.util.ArrayList cannot be used) [duplicate 1]
15/02/10 15:54:08 INFO scheduler.TaskSetManager: Starting task 1.1 in stage 2.0 (TID 10, ip-10-80-15-145.ec2.internal, PROCESS_LOCAL, 1025 bytes)
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Exception-when-trying-to-use-EShadoop-connector-and-writing-rdd-to-ES-tp21579.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
--
Costin
--
with Regards
Shahid Ashraf
--
Costin