Yes, Spark needs to create the RDD first (which loads all the data) before it can take the sample. If you want to load only the sample set, you can split the files into two sets outside of Spark.

Thank you,
Dhiraj
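A minimal sketch of the pre-splitting idea, in plain Python (no Spark required): pick a random subset of the input file paths up front, then hand only those paths to Spark. The file names and the `pick_sample_files` helper here are hypothetical, just to illustrate the approach.

```python
import random

def pick_sample_files(all_files, fraction, seed=42):
    """Select a random subset of input files, so that only the
    sampled files are later passed to Spark for loading."""
    random.seed(seed)
    k = max(1, int(len(all_files) * fraction))
    return random.sample(all_files, k)

# Hypothetical file list; in practice this would come from
# listing the input directory (HDFS, S3, local FS, ...).
files = [f"part-{i:05d}" for i in range(100)]
sample = pick_sample_files(files, fraction=0.1)

# Only the sampled paths would then be given to Spark, e.g.
# sc.textFile(",".join(sample))
# so the full data set is never loaded.
```

This way Spark only ever sees the sampled files, instead of building an RDD over everything and calling `sample()` on it afterwards.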



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Behaviour-of-RDD-sampling-tp27052p27057.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
