I assume it's because map() could have side effects? That's not generally a good idea, but the expectation (or contract) is that the function passed to map() is still invoked for every element. In this program the caller could instead call count() on the parent RDD directly.

On Mar 28, 2015 1:00 AM, "jimfcarroll" <jimfcarr...@gmail.com> wrote:
> Hi all,
>
> I was wondering why the RDD.count call recomputes the RDD in all cases? In
> most cases it can simply ask the next dependent RDD. I have several RDD
> implementations and was surprised to see a call like the following never
> call my RDD's count method but instead recompute/traverse the entire
> dataset:
>
> val myRDD: MyRDD = ...
> myRDD.map({ ... }).count()
>
> Unless I'm mistaken, a MappedRDD never needs to do more than call 'count' on
> the underlying RDD. In all of my cases the underlying RDD's count method
> knows its count without a recompute (e.g. one of them selects the count
> from a DB). This is MUCH less expensive than recomputing the RDD.
>
> Thanks.
> Jim
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-count-tp11298.html
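
To make the side-effect point concrete, here is a rough sketch rather than anything taken from the Spark source. It assumes Spark 2.x or later on the classpath (for `longAccumulator`) and a local master; the object name `CountSideEffects` and the accumulator name `mapCalls` are made up for illustration. It counts how often the function passed to map() actually runs during count(), then shows the cheaper alternative of calling count() on the parent.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; not part of Spark.
object CountSideEffects {

  // Returns (count of the mapped RDD, number of map-function invocations, count of the parent).
  def run(sc: SparkContext): (Long, Long, Long) = {
    val parent = sc.parallelize(1 to 100)

    // Accumulator used only to observe the side effect of the map function.
    val calls = sc.longAccumulator("mapCalls")

    // A map function with a side effect: it bumps the accumulator per element.
    val mapped = parent.map { x => calls.add(1); x * 2 }

    // count() is an action, so it runs the map function over every element.
    val mappedCount = mapped.count()
    val mapCalls = calls.value.longValue

    // map() is one-to-one, so the caller can ask the parent directly
    // when the side effect is not needed.
    val parentCount = parent.count()

    (mappedCount, mapCalls, parentCount)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("count-side-effects").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (mappedCount, mapCalls, parentCount) = run(sc)
      println(s"mapped.count() = $mappedCount, map function calls = $mapCalls, parent.count() = $parentCount")
    } finally {
      sc.stop()
    }
  }
}
```

If count() on a mapped RDD simply delegated to the parent, the accumulator would stay at zero, which is the behavior change a caller relying on the side effect would notice. Accumulator updates made inside transformations can be applied more than once if tasks are retried, so treat the invocation count as illustrative only.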