cartesian is an expensive operation. If you have M records in locations, then
locations.cartesian(locations) will generate M x M results. If locations is a
big RDD, it is hard to do locations.cartesian(locations) efficiently.
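
For example, if the distinct locations fit in driver memory, you can broadcast
them and generate the pairs with flatMap instead of a full cartesian join. This
is a minimal sketch, not the original poster's code: the input path, the
Euclidean placeholder for distanceAmongPoints, and the id1 < id2 pruning
(which halves the output) are assumptions of mine.

import org.apache.spark.SparkContext
import org.apache.spark.graphx.Edge

// Sketch only: assumes an existing SparkContext `sc`; the input path and this
// Euclidean distance stand in for the original poster's parsing and helper.
def distanceAmongPoints(x1: Double, y1: Double, x2: Double, y2: Double): Double =
  math.sqrt((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2))

val filelines = sc.textFile("locations.tsv")

val locations = filelines
  .map(_.split("\t"))
  .map(t => (t(5).toLong, (t(2).toDouble, t(3).toDouble)))
  .distinct()

// Broadcast one copy of the (small) location list to every executor instead
// of shuffling an M x M cartesian product across the cluster.
val locBcast = sc.broadcast(locations.collect())

// Pair each record against the broadcast copy; the id1 < id2 guard keeps one
// edge per unordered pair, so the output is M*(M-1)/2 edges instead of M*M.
val edges = locations.flatMap { case (id1, (x1, y1)) =>
  locBcast.value.iterator.collect {
    case (id2, (x2, y2)) if id1 < id2 =>
      Edge(id1, id2, distanceAmongPoints(x1, y1, x2, y2))
  }
}

val count = edges.count()

This still enumerates every pair, so it helps only when the shuffle is the
bottleneck; if M itself is too large, the all-pairs computation needs to be
pruned, not just re-expressed.

Yong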
> Date: Tue, 7 Apr 2015 10:04:12 -0700
> From: mas.ha...@gmail.com
> To: user@spark.apache.org
> Subject: Incrementally load big RDD file into Memory
> 
> 
> val locations = filelines.map(line => line.split("\t")).map(t =>
> (t(5).toLong, (t(2).toDouble, t(3).toDouble))).distinct()
> 
> val cartesienProduct = locations.cartesian(locations).map(t =>
> Edge(t._1._1, t._2._1, distanceAmongPoints(t._1._2._1, t._1._2._2, t._2._2._1, t._2._2._2)))
> 
> The code executes perfectly fine up to this point, but when I try to use
> "cartesienProduct" it gets stuck, i.e.
> 
> val count = cartesienProduct.count()
> 
> Any help to efficiently do this will be highly appreciated.
> 
> 
> 