I assume it's because map() could have side effects, even if that's not
generally a good idea. The expectation, or contract, is that the function is
still invoked for every element. In this program the caller could simply call
count() on the parent RDD instead.
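
To make the side-effect concern concrete, here is a minimal sketch in plain
Scala. It is a toy model, not Spark's implementation: ToyDataset, KnownSize,
Mapped and ShortcutMapped are hypothetical names invented for illustration,
and the only assumption is that the mapped function mutates some outside state.

```scala
// Toy model only; none of these types exist in Spark.
trait ToyDataset[A] {
  def iterator: Iterator[A]
  // Default count traverses every element, so any mapped function runs.
  def count(): Long = iterator.foldLeft(0L)((n, _) => n + 1)
  def map[B](f: A => B): ToyDataset[B] = new Mapped(this, f)
}

// A source that knows its size without traversal (e.g. from a DB query).
class KnownSize(n: Int) extends ToyDataset[Int] {
  def iterator: Iterator[Int] = Iterator.range(0, n)
  override def count(): Long = n.toLong
}

// Mapped dataset that keeps the traversing count.
class Mapped[A, B](parent: ToyDataset[A], f: A => B) extends ToyDataset[B] {
  def iterator: Iterator[B] = parent.iterator.map(f)
}

// Hypothetical variant that delegates count to its parent and never runs g.
class ShortcutMapped[A, B](p: ToyDataset[A], g: A => B) extends Mapped[A, B](p, g) {
  override def count(): Long = p.count()
}

object CountDemo {
  def main(args: Array[String]): Unit = {
    var calls = 0
    val f = (x: Int) => { calls += 1; x * 2 } // side effect: bumps a counter
    val src = new KnownSize(5)

    val traversed = src.map(f).count()
    println(s"traversing count = $traversed, f invoked $calls times") // 5 and 5

    calls = 0
    val shortcut = new ShortcutMapped(src, f).count()
    println(s"shortcut count = $shortcut, f invoked $calls times") // 5 and 0
  }
}
```

Both paths return the same count, but only the traversing one runs the mapped
function. A caller who relied on that side effect would see different behaviour
under the shortcut, which is why calling count() on the parent explicitly is
the safer way to get the cheap answer.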
On Mar 28, 2015 1:00 AM, "jimfcarroll" <jimfcarr...@gmail.com> wrote:

> Hi all,
>
> I was wondering why the RDD.count call recomputes the RDD in all cases? In
> most cases it can simply ask the next dependent RDD. I have several RDD
> implementations and was surprised to see a call like the following never
> call my RDD's count method but instead recompute/traverse the entire
> dataset:
>
>    val myRDD: MyRDD = ...
>    myRDD.map({ ... }).count()
>
> Unless I'm mistaken, a MappedRDD never needs to do more than call 'count' on
> the underlying RDD. The underlying RDDs (in all of my cases) know their count
> without a recompute (e.g. one of them selects the count from a DB). This is
> MUCH less expensive than recomputing the RDD.
>
> Thanks.
> Jim
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-count-tp11298.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
