Yes, Spark needs to create the RDD first (loading all the data) before it can take the sample. If you want to avoid that, you can split the files into two sets outside of Spark and then load only the sample set.

Thank you,
Dhiraj
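For what it's worth, the "split outside of Spark" step could be as simple as the sketch below (plain Python, no Spark required; the function name, paths, and fraction are just placeholders). Each line is randomly routed to a sample file or a rest file, so Spark only ever has to load the smaller sample file:

```python
import random

def split_file(path, sample_path, rest_path, fraction, seed=42):
    """Randomly route each line of `path` into a sample file
    (with probability `fraction`) or a rest file, so only the
    sample file needs to be loaded into Spark afterwards."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    with open(path) as src, \
         open(sample_path, "w") as sample, \
         open(rest_path, "w") as rest:
        for line in src:
            (sample if rng.random() < fraction else rest).write(line)
```

After the split you would point `sc.textFile(...)` at the sample file alone, instead of sampling the full RDD.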
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Behaviour-of-RDD-sampling-tp27052p27057.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.