With Spark Streaming, when you use fileStream or textFileStream, it will only pick up files in the directory whose modification timestamp is newer than the current timestamp, and if you have checkpointing enabled, it will resume from the last read timestamp after a restart. So you may not need to maintain the line number yourself.
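A minimal sketch of the idea above, written as a plain-Python toy model rather than Spark code: files are selected by modification time, and the newest timestamp seen is persisted so that a restart resumes after it instead of tracking line numbers. The checkpoint file format and the helper names (`new_files`, `run_batch`, `demo`) are hypothetical, and the model ignores the details of Spark's real file-stream implementation. In actual PySpark the corresponding pieces would be `ssc.textFileStream(dir)`, `ssc.checkpoint(checkpoint_dir)` and `StreamingContext.getOrCreate(checkpoint_dir, setup_fn)`.

```python
import json
import os
import tempfile


def new_files(directory, last_seen_mtime):
    """Return (paths newer than last_seen_mtime sorted by mtime, new high-water mark)."""
    found = []
    newest = last_seen_mtime
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        mtime = os.path.getmtime(path)
        if mtime > last_seen_mtime:
            found.append((mtime, path))
            newest = max(newest, mtime)
    found.sort()
    return [path for _, path in found], newest


def load_checkpoint(path):
    # Hypothetical checkpoint format: a JSON file holding the last processed mtime.
    if not os.path.exists(path):
        return 0.0
    with open(path) as f:
        return json.load(f)["last_mtime"]


def save_checkpoint(path, last_mtime):
    with open(path, "w") as f:
        json.dump({"last_mtime": last_mtime}, f)


def run_batch(input_dir, checkpoint_path):
    """Read every line of every file newer than the checkpoint, then advance the checkpoint."""
    last = load_checkpoint(checkpoint_path)
    files, newest = new_files(input_dir, last)
    lines = []
    for path in files:
        with open(path) as f:
            lines.extend(line.rstrip("\n") for line in f)
    save_checkpoint(checkpoint_path, newest)
    return lines


def demo():
    """Simulate: first batch, a restart with no new files, then a new file arriving."""
    with tempfile.TemporaryDirectory() as inp, tempfile.TemporaryDirectory() as ck:
        ckpt = os.path.join(ck, "checkpoint.json")  # kept outside the input directory

        a = os.path.join(inp, "a.txt")
        with open(a, "w") as f:
            f.write("line1\nline2\n")
        os.utime(a, (1000, 1000))  # fixed mtime so the demo is deterministic
        first = run_batch(inp, ckpt)

        after_restart = run_batch(inp, ckpt)  # same checkpoint, nothing newer

        b = os.path.join(inp, "b.txt")
        with open(b, "w") as f:
            f.write("line3\n")
        os.utime(b, (2000, 2000))
        later = run_batch(inp, ckpt)

        return first, after_restart, later


if __name__ == "__main__":
    print(demo())
```

Because whole files are the unit of work, the position to remember is a timestamp rather than a line number; this assumes each file is complete when it appears in the directory (for example, written elsewhere and then moved in).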
Thanks
Best Regards

On Tue, Jun 16, 2015 at 5:55 PM, Manohar753 wrote:
> Hi All,
>
> In my use case, an HDFS file is the source for a Spark Streaming job. The job processes the data line by line, but how can I make sure to maintain the offset (the line number of data already processed) across restarts or new code pushes?
>
> Can the team please reply on this? Is there any configuration in Spark for this?
>
> Thanks.
