Hi Cheng,
Thank you for your response. I tried your solution:
.mapValues { positions =>
  for {
    a <- positions.iterator
    b <- positions.iterator
    if lessThan(a, b) && distance(a, b) < 100
  } yield {
    (a, b)
  }
}
and got the following result:

res29: org.apache.spark.rdd.RDD[(String, Iterator[((Double, Double), (Double, Double))])] = MappedValuesRDD[30] at mapValues at <console>:33
But when I try to print the first element of the result, say with res29.first,
I get the following exception:
java.io.NotSerializableException: scala.collection.Iterator$$anon$13
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1541)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1506)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1175)
        at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1375)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1171)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:71)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
14/06/05 07:09:53 WARN TaskSetManager: Lost TID 15 (task 26.0:0)
14/06/05 07:09:53 ERROR TaskSetManager: Task 26.0:0 had a not serializable result: java.io.NotSerializableException: scala.collection.Iterator$$anon$13; not retrying
Could you please let me know how I can get around this problem?
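
For what it's worth, I wondered whether materializing the iterator before returning would avoid the serialization issue, since Iterator itself is not Serializable. Something like the following (just a guess on my part, adding .toList to your snippet):

.mapValues { positions =>
  // Materialize the pairs into a List so the task result is serializable;
  // the Iterator produced by the for-comprehension is not Serializable.
  (for {
    a <- positions.iterator
    b <- positions.iterator
    if lessThan(a, b) && distance(a, b) < 100
  } yield (a, b)).toList
}

Would that be the right approach, or is there a better way?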