Re: spark streaming - how to purge old data files in data directory

2016-06-18 Thread Akhil Das
Currently, there is no out-of-the-box solution for this. However, you can use other HDFS utilities to remove older files from the directory (say, files more than 24 hours old). Another approach is discussed here
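
As a rough illustration of the "remove files older than 24 hours" idea, here is a minimal sketch that could run from cron alongside the streaming job. It assumes the `hdfs` command is on the PATH, that `hdfs dfs -ls` prints the usual eight-column listing with timestamps in the client's local time zone, and that no file path contains unusual whitespace. The function names and the `/data/in` path are made up for the example; nothing here is part of Spark itself.

```python
import subprocess
from datetime import datetime, timedelta


def parse_ls_line(line):
    """Return (path, modified) for a file line of `hdfs dfs -ls`, else None."""
    # Expected columns: perms, replication, owner, group, size, date, time, path
    parts = line.split(None, 7)
    if len(parts) < 8 or parts[0].startswith("d"):
        return None  # header line ("Found N items") or a directory
    modified = datetime.strptime(parts[5] + " " + parts[6], "%Y-%m-%d %H:%M")
    return parts[7], modified


def select_old_files(ls_output, now, max_age=timedelta(hours=24)):
    """Pick the paths whose modification time is more than max_age before now."""
    old = []
    for line in ls_output.splitlines():
        entry = parse_ls_line(line)
        if entry is not None and now - entry[1] > max_age:
            old.append(entry[0])
    return old


def purge_old_files(data_dir, max_age=timedelta(hours=24), dry_run=True):
    """List data_dir on HDFS and remove files older than max_age."""
    listing = subprocess.run(
        ["hdfs", "dfs", "-ls", data_dir],
        check=True, capture_output=True, text=True,
    ).stdout
    old = select_old_files(listing, datetime.now(), max_age)
    if old and not dry_run:
        # Plain -rm moves files to the HDFS trash when trash is enabled.
        subprocess.run(["hdfs", "dfs", "-rm", *old], check=True)
    return old


if __name__ == "__main__":
    # Hypothetical directory; dry_run=True only reports what would be removed.
    for path in purge_old_files("/data/in", dry_run=True):
        print("would remove", path)
```

The selection logic is kept separate from the `hdfs` calls so it can be checked against a captured listing before anything is deleted. Keep the age threshold comfortably longer than the batch interval so files the stream has not yet picked up are not removed.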

spark streaming - how to purge old data files in data directory

2016-06-18 Thread Vamsi Krishna
Hi, I'm on an HDP 2.3.2 cluster (Spark 1.4.1). I have a Spark Streaming app that uses 'textFileStream' to stream simple CSV files and process them. I see that the old data files that have already been processed are left in the data directory. What is the right way to purge the old data files from the data directory on HDFS? Tha