Hi, I gather that the typical approach for splitting an RDD is to apply several filters to it:
    rdd1 = rdd.filter(func1);
    rdd2 = rdd.filter(func2);
    ...

Is there (or should there be) a way to create 'buckets' like these in one go?

    List<RDD> rddList = rdd.filter(func1, func2, ..., funcN);

Another angle: when applying filter(func), is there a way to get two RDDs back, one containing the elements of the original RDD for which func returned true, and one containing the elements for which it returned false?

    Pair<RDD> pair = rdd.filterTrueFalse(func);

Right now I'm doing:

    RDD x = rdd.filter(func);
    RDD y = rdd.filter(reverseOfFunc);

This seems a bit redundant to me, though presumably Spark optimizes it away?

Thanks.
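P.S. For concreteness, here's roughly what I mean by filterTrueFalse(), written against the Java API (JavaRDD). As far as I know Spark has no built-in equivalent, so this is just my current two-filter workaround wrapped in a helper; the name filterTrueFalse and the class are made up. The cache() call is there so the parent RDD isn't recomputed from scratch when both halves are materialized:

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;
    import scala.Tuple2;

    public final class RddSplit {
        // Hypothetical helper: split one RDD into (matching, non-matching)
        // halves. Internally still two filter() passes, not a single pass.
        public static <T> Tuple2<JavaRDD<T>, JavaRDD<T>> filterTrueFalse(
                JavaRDD<T> rdd, Function<T, Boolean> func) {
            JavaRDD<T> cached = rdd.cache();    // avoid recomputing the parent twice
            JavaRDD<T> matched = cached.filter(func);                 // func == true
            JavaRDD<T> unmatched = cached.filter(t -> !func.call(t)); // func == false
            return new Tuple2<>(matched, unmatched);
        }
    }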
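And here's the multi-bucket flavour from my first question, sketched the same way. Again the helper is made up, and it's still N separate filter() passes over a cached parent rather than a true single-pass split, which is exactly what I'm hoping can be avoided:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;

    public final class RddBuckets {
        // Hypothetical 'buckets' helper: one output RDD per predicate.
        @SafeVarargs
        public static <T> List<JavaRDD<T>> buckets(JavaRDD<T> rdd,
                                                   Function<T, Boolean>... funcs) {
            JavaRDD<T> cached = rdd.cache();     // one cached parent for all passes
            List<JavaRDD<T>> result = new ArrayList<>();
            for (Function<T, Boolean> f : funcs) {
                result.add(cached.filter(f));    // N-th pass yields the N-th bucket
            }
            return result;
        }
    }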