Both have the same efficiency. The primary difference is that mapPartitions is a transformation (hence lazy, requiring an action to actually execute it), whereas foreachPartition is an action. That said, it is generally a slightly better design to keep transformations purely functional (that is, free of external side effects) and to put all side-effecting work in actions (e.g., saveAsHadoopFile is an action).
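
To make the lazy-versus-eager distinction concrete, here is a minimal sketch in plain Python. It is only a model of the behaviour, not the Spark API: FakeProducer, map_partitions, foreach_partition, write_partition, and write_and_pass_through are hypothetical names made up for illustration, and the "partitions" are ordinary lists. It assumes the costly resource (standing in for a Kafka producer) is allocated once per partition, as suggested earlier in the thread.

```python
from typing import Callable, Iterator, List


class FakeProducer:
    """Hypothetical stand-in for a costly resource such as a Kafka producer."""

    created = 0  # how many producers have been constructed so far

    def __init__(self) -> None:
        FakeProducer.created += 1
        self.sent: List[str] = []

    def send(self, record: str) -> None:
        self.sent.append(record)


def map_partitions(partitions: List[List[str]], func: Callable):
    """Model of a transformation: func is not called until the result is consumed."""
    return (func(iter(p)) for p in partitions)


def foreach_partition(partitions: List[List[str]], func: Callable) -> None:
    """Model of an action: func runs on every partition immediately."""
    for p in partitions:
        func(iter(p))


def write_partition(records: Iterator[str]) -> None:
    producer = FakeProducer()  # allocated once per partition, not once per record
    for record in records:
        producer.send(record)


def write_and_pass_through(records: Iterator[str]) -> Iterator[str]:
    producer = FakeProducer()  # side effect hidden inside a "transformation"
    for record in records:
        producer.send(record)
        yield record


partitions = [["a", "b"], ["c", "d", "e"]]

# Action: the writes happen right away, with one producer per partition.
foreach_partition(partitions, write_partition)
created_by_action = FakeProducer.created

# Transformation: nothing happens until something consumes the result.
FakeProducer.created = 0
lazy = map_partitions(partitions, write_and_pass_through)
created_before_action = FakeProducer.created
collected = [list(part) for part in lazy]  # plays the role of the extra action
created_after_action = FakeProducer.created
```

In this model both routes end up creating the same number of producers, which mirrors the "same efficiency" point. The transformation route only performs its writes once something consumes the result, which is why the side effect reads more naturally in the action.
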

On Mon, Jul 6, 2015 at 12:09 PM, Shushant Arora <[email protected]> wrote:

> What's the difference between foreachPartition and mapPartitions for a
> DStream? Both work at partition granularity.
>
> One is an operation and the other is an action, but if I also call an
> operation after mapPartitions, which one is more efficient and
> recommended?
>
> On Tue, Jul 7, 2015 at 12:21 AM, Tathagata Das <[email protected]>
> wrote:
>
>> Yeah, creating a new producer at the granularity of partitions may not be
>> that costly.
>>
>> On Mon, Jul 6, 2015 at 6:40 AM, Cody Koeninger <[email protected]>
>> wrote:
>>
>>> Use foreachPartition, and allocate whatever the costly resource is once
>>> per partition.
>>>
>>> On Mon, Jul 6, 2015 at 6:11 AM, Shushant Arora <[email protected]>
>>> wrote:
>>>
>>>> I have a requirement to write to a Kafka queue from a Spark Streaming
>>>> application.
>>>>
>>>> I am using Spark 1.2 streaming. Since different executors in Spark are
>>>> allocated at each run, instantiating a new Kafka producer at each run
>>>> seems a costly operation. Is there a way to reuse objects in the
>>>> processing executors (not in receivers)?
