Thanks, Cody. The "good boy" comment wasn't from me :)  I was the one
asking for help.



On Wed, Jul 8, 2015 at 10:52 AM, Cody Koeninger <c...@koeninger.org> wrote:

> Sean already answered your question.  foreachRDD and foreachPartition are
> completely different; there's nothing fuzzy or insufficient about that
> answer.  The fact that you can call foreachPartition on an RDD within the
> scope of foreachRDD should tell you that they aren't in any way comparable.
>
> I'm not sure if your rudeness ("be a good boy"...really?) is intentional
> or not.  If you're asking for help from people that are in most cases
> donating their time, I'd suggest that you'll have more success with a
> little more politeness.
>
> On Wed, Jul 8, 2015 at 9:05 AM, Evo Eftimov <evo.efti...@isecc.com> wrote:
>
>> That was a) fuzzy and b) insufficient – one can certainly use foreach (only)
>> on DStream RDDs – it works, as an empirical observation.
>>
>>
>>
>> As another empirical observation:
>>
>>
>>
>> foreachPartition results in one instance of the lambda/closure per
>> partition when e.g. publishing to output systems like message brokers,
>> databases and file systems - that increases the level of parallelism of
>> your output processing.
>>
>>
>>
>> As an architect I deal with gazillions of products and don’t have time to
>> read the source code of all of them to make up for documentation
>> deficiencies. On the other hand I believe you have been involved in writing
>> some of the code so be a good boy and either answer this question properly
>> or enhance the product documentation of that area of the system
>>
>>
>>
>> *From:* Sean Owen [mailto:so...@cloudera.com]
>> *Sent:* Wednesday, July 8, 2015 2:52 PM
>> *To:* dgoldenberg; user@spark.apache.org
>> *Subject:* Re: foreachRDD vs. forearchPartition ?
>>
>>
>>
>> These are quite different operations. One operates on RDDs in a DStream
>> and one operates on partitions of an RDD. They are not alternatives.
>>
>>
>>
>> On Wed, Jul 8, 2015, 2:43 PM dgoldenberg <dgoldenberg...@gmail.com>
>> wrote:
>>
>> Is there a set of best practices for when to use foreachPartition vs.
>> foreachRDD?
>>
>> Is it generally true that using foreachPartition avoids some of the
>> over-network data shuffling overhead?
>>
>> When would I definitely want to use one method vs. the other?
>>
>> Thanks.
>>
>>
>>
>>
>>
>>
>
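
For anyone reading this thread later, the nesting discussed above (foreachPartition called on an RDD inside foreachRDD, with one closure instance per partition) can be sketched roughly as follows. This is a plain-Python toy model, not Spark code: FakeConnection, foreach_rdd, foreach_partition, write_partition and delivered are hypothetical names made up for illustration, and the toy runs partitions one after another, whereas Spark runs them in parallel on executors. In Spark itself the corresponding calls are foreachRDD on the DStream and foreachPartition on each RDD.

```python
# Toy model only: a "DStream" is a list of batches, each batch (an "RDD")
# is a list of partitions, and each partition is a list of records.
# None of these helpers are Spark APIs; they only mirror the nesting.

delivered = []  # records that reached the (fake) output system


class FakeConnection:
    """Hypothetical stand-in for a connection to an output system."""
    opened = 0  # counts how many connections were created

    def __init__(self):
        FakeConnection.opened += 1

    def send(self, record):
        delivered.append(record)


def foreach_partition(rdd, func):
    """Apply func once per partition of a single RDD."""
    for partition in rdd:
        func(iter(partition))


def foreach_rdd(dstream, func):
    """Apply func once per RDD (one RDD per batch interval)."""
    for rdd in dstream:
        func(rdd)


def write_partition(records):
    # One connection per partition, reused for every record in it.
    conn = FakeConnection()
    for record in records:
        conn.send(record)


# Two batches; the first has two partitions, the second has one.
dstream = [
    [[1, 2], [3]],
    [[4, 5, 6]],
]

# foreach_partition is called inside foreach_rdd: the two are nested,
# not alternatives to each other.
foreach_rdd(dstream, lambda rdd: foreach_partition(rdd, write_partition))
```

In this sketch a connection is opened once per partition (three in total) rather than once per record (six), which is the usual reason for reaching for foreachPartition inside foreachRDD when writing to an external system.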
