How are you supplying the text file?
On Wed, Jul 9, 2014 at 11:51 AM, M Singh <mans6si...@yahoo.com> wrote:
> Hi Folks:
>
> I am working on an application which uses Spark Streaming (version 1.1.0
> snapshot on a standalone cluster) to process a text file and save counters in
> Cassandra based on fields in each row. I am testing the application in two
> modes:
>
>    - Process each row and save the counter in Cassandra. In this
>    scenario, after the text file has been consumed, no tasks/stages are
>    seen in the Spark UI.
>    - If I instead use reduceByKey before saving to Cassandra, the Spark
>    UI shows continuous generation of tasks/stages even after processing
>    the file has completed.
>
> I believe this is because reduceByKey requires merging data from
> different partitions. But I was wondering if anyone has any
> insights/pointers for understanding this difference in behavior, and how to
> avoid generating tasks/stages when there is no data (new file) available.
>
> Thanks
>
> Mans
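For reference, the two modes described above could be sketched roughly as below. This is a minimal, hypothetical sketch assuming the DataStax spark-cassandra-connector (for `saveToCassandra` on DStreams); the input directory, key extraction, keyspace, and table names are all placeholders, not taken from the original application.

```scala
// Hypothetical sketch of the two modes; requires a Spark cluster and the
// spark-cassandra-connector, so it is illustrative rather than runnable here.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.datastax.spark.connector.streaming._ // adds saveToCassandra on DStreams

object CounterModes {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("counter-modes")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Watch a directory for new text files (placeholder path).
    val lines  = ssc.textFileStream("/input/dir")
    // Placeholder key extraction: first comma-separated field of each row.
    val counts = lines.map(row => (row.split(",")(0), 1L))

    // Mode 1: save per-row counters directly. No shuffle is involved, so an
    // empty batch produces no visible tasks/stages in the Spark UI.
    counts.saveToCassandra("ks", "counters_raw")

    // Mode 2: reduceByKey introduces a shuffle stage, which is scheduled for
    // every batch interval even when the batch RDD is empty -- consistent
    // with the continuous tasks/stages observed in the UI.
    counts.reduceByKey(_ + _).saveToCassandra("ks", "counters_agg")

    ssc.start()
    ssc.awaitTermination()
  }
}
```

In both modes the streaming scheduler fires every batch interval; the visible difference is that mode 2's shuffle stage shows up in the UI even for empty batches.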