As far as I understand, operations on RDDs usually come in the form rdd => map1 => map2 => map3 => (maybe collect).
If I would also like to count my RDD, is there any way I could include this in map1, so that as Spark runs through map1 it also does a count? Or would count need to be a separate action, such that I would have to run through my dataset again? My dataset is really memory intensive, so I'd rather not cache() it if possible.
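One possible approach is an accumulator: increment it inside map1 so the count happens as the data flows through, rather than as a separate pass over the RDD. Below is a minimal sketch, assuming the Spark 1.x sc.accumulator API; the input path, countAcc name, and map1 body are placeholders, not anything from the original post. One caveat: accumulator updates inside transformations are only best-effort, since retried or speculatively re-executed tasks can count the same record twice; exactly-once semantics are only guaranteed for accumulators updated inside actions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CountWhileMapping {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("count-while-mapping")
    val sc   = new SparkContext(conf)

    // Driver-visible counter, incremented on the executors.
    val countAcc = sc.accumulator(0L)

    val rdd = sc.textFile("hdfs:///path/to/input") // stand-in for the real dataset

    val mapped = rdd.map { record =>
      countAcc += 1        // counted as map1 runs; no second pass over the data
      record.toUpperCase   // the real map1 logic goes here
    }

    // The accumulator only holds the count after an action has run the lineage.
    mapped.saveAsTextFile("hdfs:///path/to/output")
    println(s"Records seen by map1: ${countAcc.value}")

    sc.stop()
  }
}
```

If the double-counting caveat matters for your use case, the count would still have to come from a separate count() action, at which point avoiding a second full recomputation does require some form of caching (e.g. persist with MEMORY_AND_DISK or DISK_ONLY to keep memory pressure down).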