Currently, there is no out-of-the-box solution for this. However, you can use other HDFS utilities to remove older files (say, more than 24 hours old) from the directory. Another approach is discussed here: <http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-tracking-deleting-processed-files-td21444.html>.
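As a rough illustration of the "remove files older than 24 hours" idea, here is a minimal Python sketch that shells out to `hdfs dfs -ls`, parses the modification timestamps, and deletes anything older than a cutoff. The directory path is hypothetical, and the `ls`-output parsing assumes the usual 8-column format (`permissions replication user group size date time path`); verify it against your Hadoop version before relying on it.

```python
import subprocess
from datetime import datetime, timedelta

# Hypothetical streaming input directory; adjust to your own path.
DATA_DIR = "/user/spark/streaming/input"

def parse_ls_line(line):
    """Parse one line of `hdfs dfs -ls` output into (mtime, path).

    A typical line looks like:
    -rw-r--r--   3 user group 12345 2016-06-18 09:15 /path/file.csv
    """
    parts = line.split()
    if line.startswith("Found") or len(parts) < 8:
        return None  # header line or unexpected format
    mtime = datetime.strptime(parts[5] + " " + parts[6], "%Y-%m-%d %H:%M")
    return mtime, parts[7]

def old_files(ls_output, max_age_hours=24, now=None):
    """Return paths whose modification time is older than max_age_hours."""
    now = now or datetime.now()
    cutoff = now - timedelta(hours=max_age_hours)
    paths = []
    for line in ls_output.splitlines():
        parsed = parse_ls_line(line)
        if parsed and parsed[0] < cutoff:
            paths.append(parsed[1])
    return paths

def purge(data_dir=DATA_DIR, max_age_hours=24):
    """List the directory and delete files older than the cutoff."""
    out = subprocess.check_output(["hdfs", "dfs", "-ls", data_dir]).decode()
    for path in old_files(out, max_age_hours):
        subprocess.check_call(["hdfs", "dfs", "-rm", path])
```

You could run `purge()` from cron (or an Oozie coordinator) on the streaming input directory. Note the race condition inherent in this approach: make the age threshold comfortably larger than your batch interval so you never delete a file the streaming job has not yet processed.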
On Sun, Jun 19, 2016 at 7:28 AM, Vamsi Krishna <vamsi.attl...@gmail.com> wrote:

> Hi,
>
> I'm on an HDP 2.3.2 cluster (Spark 1.4.1).
> I have a Spark Streaming app which uses 'textFileStream' to stream simple
> CSV files and process them.
> I see the old data files that are processed are left in the data directory.
> What is the right way to purge the old data files in the data directory on
> HDFS?
>
> Thanks,
> Vamsi Attluri
> --
> Vamsi Attluri

--
Cheers!