You can try using an Accumulator (http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.Accumulator) to keep a count in map1. Note that the final count may be higher than the actual number of records if some tasks were retried along the way.
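For example, something like this (a minimal sketch against the Spark 1.x Scala API, assuming a SparkContext sc as in spark-shell; rdd, transform, and the output path are placeholders for your own pipeline):

    // driver-readable counter, named so it also shows up in the web UI
    val countAcc = sc.accumulator(0L, "map1 count")

    val mapped = rdd.map { record =>
      countAcc += 1L        // side effect: counts every record map1 processes
      transform(record)     // placeholder for your real map1 logic
    }

    // accumulator updates only happen once an action actually runs the stage
    mapped.saveAsTextFile("hdfs:///tmp/out")

    println("records seen by map1: " + countAcc.value)

Because the increment happens inside a transformation, speculative execution or task retries can bump the counter more than once per record, which is why the count can come out high. Spark only guarantees exactly-once accumulator updates inside actions (e.g. foreach).

-- Ali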
On Nov 20, 2015, at 3:38 PM, jluan <jaylu...@gmail.com> wrote:

> As far as I understand, operations on RDDs usually come in the form
>
> rdd => map1 => map2 => map3 => (maybe collect)
>
> If I would like to also count my RDD, is there any way I could include
> this at map1? So that as Spark runs through map1, it also does a count?
> Or would count need to be a separate operation, such that I would have
> to run through my dataset again? My dataset is really memory intensive,
> so I'd rather not cache() it if possible.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-run-two-operations-on-the-same-RDD-simultaneously-tp25441.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.