How are you supplying the text file?

On Wed, Jul 9, 2014 at 11:51 AM, M Singh <mans6si...@yahoo.com> wrote:

> Hi Folks:
>
> I am working on an application which uses Spark Streaming (version 1.1.0
> snapshot on a standalone cluster) to process a text file and save counters in
> Cassandra based on fields in each row.  I am testing the application in two
> modes:
>
>    - Process each row and save the counter in Cassandra.  In this
>    scenario, after the text file has been consumed, no tasks/stages are
>    seen in the Spark UI.
>    - If instead I use reduceByKey before saving to Cassandra, the Spark
>    UI shows continuous generation of tasks/stages even after processing of
>    the file has completed.
>
> I believe this is because reduceByKey requires merging data from
> different partitions.  But I was wondering if anyone has any
> insights/pointers for understanding this difference in behavior, and how to
> avoid generating tasks/stages when no data (new file) is available.
>
> Thanks
>
> Mans
>
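The difference the question describes can be illustrated outside Spark. A minimal Python sketch (plain Python, not Spark code; the two hypothetical partitions and keys are made up for illustration) of why per-row counting needs no coordination while reduce-by-key must merge partial results across partitions:

```python
from collections import Counter

# Hypothetical data split across two partitions; each string stands in
# for the key field extracted from one row of the text file.
partitions = [["a", "b", "a"], ["b", "c"]]

# Mode 1: per-row processing. Each partition updates its own counters
# independently, so no cross-partition step is needed once the rows
# are consumed.
per_partition_counts = [Counter(p) for p in partitions]

# Mode 2: reduce-by-key. The partial counts from every partition must
# be brought together and merged, which in Spark corresponds to a
# shuffle and an extra stage beyond the map stage.
merged = Counter()
for partial in per_partition_counts:
    merged.update(partial)

print(merged)
```

This is only a sketch of the data movement involved; in Spark Streaming the shuffle-based transformation is scheduled per batch interval, which is consistent with stages appearing even when a batch carries no new data.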
