Thanks, Cody. The "good boy" comment wasn't from me :) I was the one asking for help.
On Wed, Jul 8, 2015 at 10:52 AM, Cody Koeninger <c...@koeninger.org> wrote:

> Sean already answered your question. foreachRDD and foreachPartition are
> completely different, there's nothing fuzzy or insufficient about that
> answer. The fact that you can call foreachPartition on an rdd within the
> scope of foreachRDD should tell you that they aren't in any way comparable.
>
> I'm not sure if your rudeness ("be a good boy"...really?) is intentional
> or not. If you're asking for help from people that are in most cases
> donating their time, I'd suggest that you'll have more success with a
> little more politeness.
>
> On Wed, Jul 8, 2015 at 9:05 AM, Evo Eftimov <evo.efti...@isecc.com> wrote:
>
>> That was a) fuzzy b) insufficient – one can certainly use foreach (only)
>> on DStream RDDs – it works as empirical observation
>>
>> As another empirical observation:
>>
>> For each partition results in having one instance of the lambda/closure
>> per partition when e.g. publishing to output systems like message brokers,
>> databases and file systems - that increases the level of parallelism of
>> your output processing
>>
>> As an architect I deal with gazillions of products and don’t have time to
>> read the source code of all of them to make up for documentation
>> deficiencies. On the other hand I believe you have been involved in writing
>> some of the code so be a good boy and either answer this question properly
>> or enhance the product documentation of that area of the system
>>
>> *From:* Sean Owen [mailto:so...@cloudera.com]
>> *Sent:* Wednesday, July 8, 2015 2:52 PM
>> *To:* dgoldenberg; user@spark.apache.org
>> *Subject:* Re: foreachRDD vs. forearchPartition ?
>>
>> These are quite different operations. One operates on RDDs in DStream
>> and one operates on partitions of an RDD. They are not alternatives.
>>
>> On Wed, Jul 8, 2015, 2:43 PM dgoldenberg <dgoldenberg...@gmail.com>
>> wrote:
>>
>> Is there a set of best practices for when to use foreachPartition vs.
>> foreachRDD?
>>
>> Is it generally true that using foreachPartition avoids some of the
>> over-network data shuffling overhead?
>>
>> When would I definitely want to use one method vs. the other?
>>
>> Thanks.
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/foreachRDD-vs-forearchPartition-tp23714.html
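
For anyone reading this thread later, here is a rough sketch of the nesting Cody and Sean describe. It is a toy model in plain Python, not Spark code: `ToyDStream`, `ToyRDD`, `foreach_rdd`, `foreach_partition`, `handle_batch` and `send_partition` are hypothetical names made up for illustration, standing in for a DStream, an RDD, and the `foreachRDD` / `foreachPartition` calls. The only point is that the outer callback runs once per batch (RDD) and the inner one once per partition of that RDD, which is why the two are not alternatives.

```python
from typing import Callable, Dict, Iterator, List


class ToyRDD:
    """Hypothetical stand-in for one micro-batch: a list of partitions."""

    def __init__(self, partitions: List[List[str]]) -> None:
        self.partitions = partitions

    def foreach_partition(self, fn: Callable[[Iterator[str]], None]) -> None:
        # fn is called once per partition with an iterator over its records.
        # (This toy loop is sequential; it only models the call pattern.)
        for partition in self.partitions:
            fn(iter(partition))


class ToyDStream:
    """Hypothetical stand-in for a stream: a sequence of batches."""

    def __init__(self, batches: List[ToyRDD]) -> None:
        self.batches = batches

    def foreach_rdd(self, fn: Callable[[ToyRDD], None]) -> None:
        # fn is called once per batch and receives the whole RDD.
        for rdd in self.batches:
            fn(rdd)


stats: Dict[str, int] = {"rdd_calls": 0, "connections": 0, "records": 0}


def send_partition(records: Iterator[str]) -> None:
    # Pretend to open one connection per partition, then reuse it
    # for every record in that partition.
    stats["connections"] += 1
    for _ in records:
        stats["records"] += 1


def handle_batch(rdd: ToyRDD) -> None:
    stats["rdd_calls"] += 1
    rdd.foreach_partition(send_partition)


stream = ToyDStream([
    ToyRDD([["a", "b"], ["c"]]),         # batch 1: 2 partitions, 3 records
    ToyRDD([["d"], ["e", "f"], ["g"]]),  # batch 2: 3 partitions, 4 records
])
stream.foreach_rdd(handle_batch)
print(stats)  # {'rdd_calls': 2, 'connections': 5, 'records': 7}
```

In this model the per-partition callback is what lets you set up one connection per partition rather than one per record, which matches Evo's observation about output to brokers and databases. It says nothing about shuffling; the toy has no network at all.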