Re: how to maintain the offset for spark streaming if HDFS is the source

Akhil Das Tue, 16 Jun 2015 06:13:48 -0700

With Spark Streaming, fileStream and textFileStream only pick up files in
the monitored directory whose modification timestamp is newer than the
time the stream started. If you have checkpointing enabled, a restarted
job resumes from the last-read timestamp instead. So you may not need to
maintain the line number yourself.
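
To make the idea concrete, here is a rough sketch in plain Python of
timestamp-based file pickup with a checkpoint. It is not Spark code and
not how Spark implements it internally; the function names
(load_checkpoint, save_checkpoint, pick_new_files, run_once) and the JSON
checkpoint file are hypothetical, chosen only to illustrate the behaviour
described above.

```python
import json
import os


def load_checkpoint(path):
    """Return the last processed modification time, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)["last_mtime"]


def save_checkpoint(path, last_mtime):
    """Persist the newest modification time seen so far (hypothetical format)."""
    with open(path, "w") as f:
        json.dump({"last_mtime": last_mtime}, f)


def pick_new_files(directory, since):
    """Return (mtime, path) pairs for files modified strictly after `since`, oldest first."""
    found = []
    for name in os.listdir(directory):
        full = os.path.join(directory, name)
        if os.path.isfile(full):
            mtime = os.path.getmtime(full)
            if mtime > since:
                found.append((mtime, full))
    return sorted(found)


def run_once(directory, checkpoint_path, start_time):
    """Process one batch: resume from the checkpoint if present, else from start_time."""
    since = load_checkpoint(checkpoint_path)
    if since is None:
        since = start_time  # first run: only files newer than the start time
    processed = []
    for mtime, path in pick_new_files(directory, since):
        processed.append(os.path.basename(path))  # stand-in for real processing
        since = mtime
    save_checkpoint(checkpoint_path, since)
    return processed
```

Note that the unit of progress here is a whole file, not a line: a file
whose timestamp is at or before the checkpoint is skipped entirely, which
is why no per-line offset is tracked.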


Thanks
Best Regards

On Tue, Jun 16, 2015 at 5:55 PM, Manohar753 <[email protected]> wrote:

> Hi All,
> In my use case an HDFS file is the source for Spark Streaming.
> The job processes the data line by line, but how can I make sure the
> offset (the line number of the data already processed) is maintained
> across a restart or a new code push?
>
> Can the team please reply: is there any configuration for this in Spark?
>
>
> Thanks.
