I'm not sure what you're referring to by "replicated RDD". If you actually mean RDDs in general, then yes, read the API docs and the paper, as Tobias mentioned. If you're focusing on the word "replicated", that refers to fault tolerance, and is probably mostly used in the streaming case for receiver-created RDDs.
For Spark, an application is your user program, while a job is an internal scheduling concept: a group of RDD operations executed together. Your application might invoke several jobs.

Best Regards,
Raymond Liu

From: rapelly kartheek [mailto:kartheek.m...@gmail.com]
Sent: Wednesday, September 03, 2014 5:03 PM
To: user@spark.apache.org
Subject: RDDs

Hi,

Can someone tell me what kind of operations can be performed on a replicated RDD? What are the use cases of a replicated RDD?

One basic doubt that has been bothering me for a long time: what is the difference between an application and a job in Spark parlance? I am confused because of the Hadoop jargon.

Thank you
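To make the application/job distinction concrete, here is a minimal sketch (assuming Spark's Scala API and a local master; the object and app names are illustrative). The whole program is one application, but each action on an RDD triggers a separate job that the scheduler runs:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object JobsDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("JobsDemo").setMaster("local[*]")
    val sc = new SparkContext(conf)   // one SparkContext = one application

    // A transformation is lazy: no job is scheduled yet.
    val rdd = sc.parallelize(1 to 100).map(_ * 2)

    // Each action triggers a job within this single application.
    val total = rdd.sum()    // job 1
    val count = rdd.count()  // job 2

    println(s"sum=$total count=$count")
    sc.stop()
  }
}
```

You can see this in the Spark web UI: the UI lists one running application, and under it one job per action invoked.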