Hi,

I gather that the typical approach for splitting an RDD is to apply
several filters to it:

JavaRDD<T> rdd1 = rdd.filter(func1);
JavaRDD<T> rdd2 = rdd.filter(func2);
...

Is there, or should there be, a way to create 'buckets' like these in one
go? Something like:

List<JavaRDD<T>> rddList = rdd.filter(func1, func2, ..., funcN);
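
In the meantime I can approximate that with a loop over predicates. A rough
sketch against the Java API (bucketize and the predicate list are names I
made up; each bucket is lazy, and each predicate still costs a separate
pass over the parent RDD when the buckets are materialized):

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

// Hypothetical helper: one filtered RDD per predicate.
static <T> List<JavaRDD<T>> bucketize(JavaRDD<T> rdd,
                                      List<Function<T, Boolean>> preds) {
    List<JavaRDD<T>> buckets = new ArrayList<>();
    for (Function<T, Boolean> p : preds) {
        // Each call only builds a lazy, filtered view of the parent.
        buckets.add(rdd.filter(p));
    }
    return buckets;
}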

Another angle: when applying filter(func), is there a way to get two RDDs
back in a single call, one containing the elements of the original RDD for
which func returned true, and the other containing the elements for which
it returned false?

Pair<JavaRDD<T>> pair = rdd.filterTrueFalse(func);

Right now I'm doing

JavaRDD<T> x = rdd.filter(func);
JavaRDD<T> y = rdd.filter(reverseOfFunc); // same predicate, negated

Writing the predicate twice, once negated, feels redundant. Does Spark
recognize the two filters as complementary and avoid the second scan, or
does each filter rescan the source RDD?
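
If it isn't optimized, I'm guessing the best I can do today is cache the
parent and negate the predicate inline, along these lines (a sketch; assume
rdd is an existing JavaRDD<String>, and the predicate is just an example):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

// Split a cached RDD into its true/false halves. cache() keeps the
// parent in memory so the second filter doesn't recompute it from
// the original source.
Function<String, Boolean> func = s -> s.startsWith("a"); // example predicate
JavaRDD<String> cached = rdd.cache();
JavaRDD<String> matches = cached.filter(func);
JavaRDD<String> rest = cached.filter(s -> !func.call(s));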

Thanks.