Both have the same efficiency. The primary difference is that mapPartitions
is a transformation (hence lazy: it needs a subsequent action to actually
execute), while foreachPartition is an action.
That said, it is generally a slightly better design to keep "transformations"
purely functional (that is, no external side effects) and to do all
non-functional work in "actions" (e.g., saveAsHadoopFile is an action).
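
To make the lazy-versus-eager distinction concrete, here is a rough sketch in
plain Python. It is only a stand-in and uses no Spark APIs: map_partitions,
foreach_partition, FakeProducer and write_partition are hypothetical names
made up for illustration, and the "partitions" are ordinary lists. It also
shows the costly resource being allocated once per partition rather than once
per record.

```python
# Plain-Python stand-in for the behaviour discussed above; no Spark APIs are used.
# All names below are hypothetical and exist only for this illustration.


class FakeProducer:
    """Stand-in for a costly resource such as a Kafka producer."""

    created = 0  # counts how many producers have been constructed so far

    def __init__(self):
        FakeProducer.created += 1
        self.sent = []

    def send(self, record):
        self.sent.append(record)


def map_partitions(partitions, func):
    """Lazy, like a transformation: func runs only when the result is consumed."""
    return (list(func(iter(part))) for part in partitions)


def foreach_partition(partitions, func):
    """Eager, like an action: func runs on every partition immediately."""
    for part in partitions:
        func(iter(part))


def write_partition(records):
    # One producer per partition, not one per record.
    producer = FakeProducer()
    for record in records:
        producer.send(record)
    # Return an iterator so the same function also works with map_partitions.
    return iter([len(producer.sent)])


partitions = [[1, 2, 3], [4, 5]]

lazy = map_partitions(partitions, write_partition)
created_before_action = FakeProducer.created  # nothing has executed yet

# Consuming the lazy result is what triggers the work.
counts = [c for part in lazy for c in part]
created_after_map = FakeProducer.created

# The eager version does the work as soon as it is called.
foreach_partition(partitions, write_partition)
created_after_foreach = FakeProducer.created
```

In this sketch the side effect (sending records) happens inside the lazy path
only once something consumes the result, which is the reason for preferring
the action-style call when the work is purely a side effect.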


On Mon, Jul 6, 2015 at 12:09 PM, Shushant Arora <[email protected]>
wrote:

> What's the difference between foreachPartition and mapPartitions for a
> DStream? Both work at partition granularity.
>
> One is an operation and the other is an action, but if I also call an
> operation after mapPartitions, which one is more efficient and
> recommended?
>
> On Tue, Jul 7, 2015 at 12:21 AM, Tathagata Das <[email protected]>
> wrote:
>
>> Yeah, creating a new producer at the granularity of partitions may not be
>> that costly.
>>
>> On Mon, Jul 6, 2015 at 6:40 AM, Cody Koeninger <[email protected]>
>> wrote:
>>
>>> Use foreachPartition, and allocate whatever the costly resource is once
>>> per partition.
>>>
>>> On Mon, Jul 6, 2015 at 6:11 AM, Shushant Arora <
>>> [email protected]> wrote:
>>>
>>>> I have a requirement to write to a Kafka queue from a Spark Streaming
>>>> application.
>>>>
>>>> I am using Spark 1.2 Streaming. Since different executors in Spark are
>>>> allocated at each run, instantiating a new Kafka producer at each run
>>>> seems a costly operation. Is there a way to reuse objects in processing
>>>> executors (not in receivers)?
>>>>
>>>>
>>>>
>>>
>>
>
