Sean already answered your question. foreachRDD and foreachPartition are
completely different; there's nothing fuzzy or insufficient about that
answer. The fact that you can call foreachPartition on an RDD within the
scope of foreachRDD should tell you that they aren't in any way comparable.
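
To make the nesting concrete, here is a toy, pure-Python model. It is not Spark code: `ToyDStream`, `ToyRDD`, `foreach_rdd` and `foreach_partition` are hypothetical stand-ins that only mimic the shape of the API. A DStream is a sequence of per-batch RDDs, so `foreach_rdd` hands you one whole RDD per batch. An RDD is split into partitions, so `foreach_partition` hands you one iterator per partition. You can only reach the second through what the first gives you.

```python
from typing import Callable, Iterator, List


class ToyRDD:
    """Hypothetical stand-in for an RDD: a list of partitions, each a list of records."""

    def __init__(self, partitions: List[List[str]]) -> None:
        self.partitions = partitions

    def foreach_partition(self, func: Callable[[Iterator[str]], None]) -> None:
        # Called once per partition; in Spark this function runs on the executors.
        for part in self.partitions:
            func(iter(part))


class ToyDStream:
    """Hypothetical stand-in for a DStream: one RDD per micro-batch."""

    def __init__(self, batches: List[ToyRDD]) -> None:
        self.batches = batches

    def foreach_rdd(self, func: Callable[[ToyRDD], None]) -> None:
        # Called once per batch; in Spark Streaming this function runs on the driver.
        for rdd in self.batches:
            func(rdd)


def demo() -> List[str]:
    events: List[str] = []
    stream = ToyDStream([
        ToyRDD([["a", "b"], ["c"]]),         # batch 0: 2 partitions
        ToyRDD([["d"], ["e", "f"], ["g"]]),  # batch 1: 3 partitions
    ])

    def handle_rdd(rdd: ToyRDD) -> None:
        events.append("rdd")
        # foreach_partition is called on the RDD that foreach_rdd supplied.
        rdd.foreach_partition(lambda it: events.append("partition:" + ",".join(it)))

    stream.foreach_rdd(handle_rdd)
    return events


if __name__ == "__main__":
    print(demo())
```

The two batches produce two `rdd` events, each followed by one `partition` event per partition of that batch. Only the driver/executor comments reflect Spark's documented execution model; the rest is illustrative.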

I'm not sure if your rudeness ("be a good boy"...really?) is intentional or
not.  If you're asking for help from people that are in most cases donating
their time, I'd suggest that you'll have more success with a little more
politeness.

On Wed, Jul 8, 2015 at 9:05 AM, Evo Eftimov <evo.efti...@isecc.com> wrote:

> That was a) fuzzy and b) insufficient: one can certainly use foreach (only) on
> DStream RDDs; it works, as an empirical observation.
>
> As another empirical observation:
>
> foreachPartition results in one instance of the lambda/closure per
> partition when, e.g., publishing to output systems such as message brokers,
> databases and file systems; that increases the level of parallelism of
> your output processing.
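
A toy, pure-Python sketch of the output pattern described above. It is not Spark code: `FakeConnection`, `publish_per_record` and `publish_per_partition` are hypothetical names, and each inner list stands in for one partition. The per-record version mirrors calling `rdd.foreach` with a closure that opens its own connection. The per-partition version mirrors the `rdd.foreachPartition` pattern from the Spark Streaming programming guide, where one connection is created per partition and reused for every record in it. The model counts connection setups, which is the overhead the per-partition pattern amortizes.

```python
from typing import Iterator, List


class FakeConnection:
    """Hypothetical stand-in for a broker/database client that counts how many are opened."""

    opened = 0  # class-level counter, reset by each publish_* function

    def __init__(self) -> None:
        FakeConnection.opened += 1
        self.sent: List[str] = []

    def send(self, record: str) -> None:
        self.sent.append(record)

    def close(self) -> None:
        pass  # a real client would release its socket here


def publish_per_record(partitions: List[List[str]]) -> int:
    """Models rdd.foreach with a closure that opens a connection for every record."""
    FakeConnection.opened = 0
    for part in partitions:
        for record in part:
            conn = FakeConnection()
            conn.send(record)
            conn.close()
    return FakeConnection.opened


def publish_per_partition(partitions: List[List[str]]) -> int:
    """Models rdd.foreachPartition: one connection per partition, reused for its records."""
    FakeConnection.opened = 0

    def send_partition(records: Iterator[str]) -> None:
        conn = FakeConnection()
        try:
            for record in records:
                conn.send(record)
        finally:
            conn.close()

    for part in partitions:
        send_partition(iter(part))
    return FakeConnection.opened


if __name__ == "__main__":
    data = [["a", "b", "c"], ["d", "e"]]
    print(publish_per_record(data), publish_per_partition(data))
```

For two partitions holding five records, `publish_per_record` opens 5 connections and `publish_per_partition` opens 2.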
>
> As an architect, I deal with gazillions of products and don’t have time to
> read the source code of all of them to make up for documentation
> deficiencies. On the other hand, I believe you have been involved in writing
> some of the code, so be a good boy and either answer this question properly
> or enhance the product documentation for that area of the system.
>
> *From:* Sean Owen [mailto:so...@cloudera.com]
> *Sent:* Wednesday, July 8, 2015 2:52 PM
> *To:* dgoldenberg; user@spark.apache.org
> *Subject:* Re: foreachRDD vs. foreachPartition ?
>
>
>
> These are quite different operations. One operates on RDDs in a DStream and
> one operates on partitions of an RDD. They are not alternatives.
>
> On Wed, Jul 8, 2015, 2:43 PM dgoldenberg <dgoldenberg...@gmail.com> wrote:
>
> Is there a set of best practices for when to use foreachPartition vs.
> foreachRDD?
>
> Is it generally true that using foreachPartition avoids some of the
> over-network data shuffling overhead?
>
> When would I definitely want to use one method vs. the other?
>
> Thanks.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/foreachRDD-vs-forearchPartition-tp23714.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.