As far as I understand, operations on RDDs usually come in the form rdd => map1 => map2 => map3 => (maybe collect).
If I would also like to count my RDD, is there any way I could include this in map1, so that as Spark runs through map1 it also does a count? Or would count need to be a separate action, such that I would have to run through my dataset again? My dataset is really memory intensive, so I'd rather not cache() it if possible.
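One possible approach is an accumulator: increment it inside map1 so the count happens as the data flows through, rather than as a separate pass over the RDD. Below is a minimal sketch, assuming the Spark 1.x sc.accumulator API; the input path, countAcc name, and map1 body are placeholders, not anything from the original post. One caveat: accumulator updates inside transformations are only best-effort, since retried or speculatively re-executed tasks can count the same record twice; exactly-once semantics are only guaranteed for accumulators updated inside actions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CountWhileMapping {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("count-while-mapping")
    val sc   = new SparkContext(conf)

    // Driver-visible counter, incremented on the executors.
    val countAcc = sc.accumulator(0L)

    val rdd = sc.textFile("hdfs:///path/to/input") // stand-in for the real dataset

    val mapped = rdd.map { record =>
      countAcc += 1        // counted as map1 runs; no second pass over the data
      record.toUpperCase   // the real map1 logic goes here
    }

    // The accumulator only holds the count after an action has run the lineage.
    mapped.saveAsTextFile("hdfs:///path/to/output")
    println(s"Records seen by map1: ${countAcc.value}")

    sc.stop()
  }
}
```

If the double-counting caveat matters for your use case, the count would still have to come from a separate count() action, at which point avoiding a second full recomputation does require some form of caching (e.g. persist with MEMORY_AND_DISK or DISK_ONLY to keep memory pressure down).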