RE: RDDs

Liu, Raymond Wed, 03 Sep 2014 23:11:27 -0700

Actually, a replicated RDD and parallel jobs on the same RDD are two 
unrelated concepts. 
A replicated RDD just stores its data on multiple nodes, which helps with HA 
and gives a better chance of data locality. It is still one RDD, not two 
separate RDDs.
As for running two jobs on the same RDD, it doesn't matter whether the RDD is 
replicated or not. You can always do that if you wish.
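
To make the distinction concrete, here is a rough sketch (not code from this 
thread). It assumes a local master, and the application name, object name, and 
sample data are made up for illustration. The RDD is persisted with a 
replicated storage level, and two actions are submitted from separate threads 
so that the two jobs can run concurrently on the same RDD.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical driver program, for illustration only.
object ParallelJobsOnOneRdd {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 4 threads. On a real cluster, drop setMaster
    // and pass the master through spark-submit instead.
    val conf = new SparkConf()
      .setAppName("parallel-jobs-sketch")
      .setMaster("local[4]")
    val sc = new SparkContext(conf)

    try {
      // MEMORY_ONLY_2 asks Spark to keep each cached partition on two nodes.
      // This is still a single RDD; only its cached blocks are replicated.
      val rdd = sc.parallelize(1 to 1000000).persist(StorageLevel.MEMORY_ONLY_2)

      // Each action triggers its own job. Submitting them from different
      // threads lets the scheduler run both jobs at the same time.
      val countJob = Future { rdd.count() }
      val sumJob   = Future { rdd.map(_.toLong).reduce(_ + _) }

      val count = Await.result(countJob, 10.minutes)
      val sum   = Await.result(sumJob, 10.minutes)
      println(s"count=$count sum=$sum")
    } finally {
      sc.stop()
    }
  }
}
```

The two jobs would run in parallel just the same with a non-replicated level 
such as MEMORY_ONLY, or with no persist call at all. In local mode there is 
only one node, so the replication request cannot really be honoured; the 
replicated level only makes a difference on a multi-node cluster.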



Best Regards,
Raymond Liu

-----Original Message-----
From: Kartheek.R [mailto:kartheek.m...@gmail.com] 
Sent: Thursday, September 04, 2014 1:24 PM
To: u...@spark.incubator.apache.org
Subject: RE: RDDs

Thank you Raymond and Tobias. 
Yeah, I am very clear about what I was asking. I was talking about a 
"replicated" RDD only. Now that my understanding of jobs and applications has 
been validated, I wanted to know whether we can replicate an RDD and run two 
jobs (that need the same RDD) of an application in parallel.

-Karthk




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/RDDs-tp13343p13416.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional 
commands, e-mail: user-h...@spark.apache.org


