If you're up for a fancy but excellent solution:

- Store your data in Cassandra.
- Use the expiring-data feature (TTL) <https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html> so data is automatically removed a month later.
- In your Spark process, just read from the database; you no longer have to worry about the timestamp. A rough sketch follows below.
- You'll still have all your old files if you need to refer back to them.
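For illustration, here's a minimal sketch of the read side, assuming the DataStax spark-cassandra-connector is on the classpath and using a hypothetical keyspace/table (events.batches) and contact point:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("read-recent-events")
      .config("spark.cassandra.connection.host", "127.0.0.1")  // assumed contact point
      .getOrCreate()

    // Rows would be written with something like "INSERT ... USING TTL 2592000"
    // (30 days in seconds), so Cassandra drops them automatically and no
    // timestamp filter is needed on the read path.
    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "events", "table" -> "batches"))  // hypothetical names
      .load()

The nice part of pushing expiry into the database is that the Spark job stays completely oblivious to retention policy.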
Pete

On Tue, Sep 27, 2016 at 2:52 AM, Divya Gehlot <divya.htco...@gmail.com> wrote:
> Hi,
> The input data files for my Spark job are generated every five minutes, and
> the file names follow the epoch-time convention shown below:
>
> InputFolder/batch-1474959600000
> InputFolder/batch-1474959900000
> InputFolder/batch-1474960200000
> InputFolder/batch-1474960500000
> InputFolder/batch-1474960800000
> InputFolder/batch-1474961100000
> InputFolder/batch-1474961400000
> InputFolder/batch-1474961700000
> InputFolder/batch-1474962000000
> InputFolder/batch-1474962300000
>
> As per the requirement, I need to read one month of data back from the
> current timestamp.
>
> I would really appreciate it if anybody could help me.
>
> Thanks,
> Divya
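If you'd rather stay with the existing folder layout, a minimal sketch of one way to do the filtering the question describes, assuming the batch-<epochMillis> folders live on a Hadoop-compatible filesystem (InputFolder is from the question; the app name and text format are illustrative):

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession
    import scala.util.Try

    val spark = SparkSession.builder().appName("read-last-month").getOrCreate()

    // Cutoff: roughly one month (30 days) before now, in epoch milliseconds,
    // matching the batch-<epochMillis> folder-name convention.
    val cutoff = System.currentTimeMillis() - 30L * 24 * 60 * 60 * 1000

    // List the input directory and keep only folders newer than the cutoff.
    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    val recentPaths = fs.listStatus(new Path("InputFolder"))
      .map(_.getPath)
      .filter(p => Try(p.getName.stripPrefix("batch-").toLong).toOption.exists(_ >= cutoff))
      .map(_.toString)

    // Assuming plain-text input; swap in whatever format the batch files use.
    val df = spark.read.text(recentPaths: _*)

Listing and filtering by name happens on the driver before Spark touches any data, so nothing outside the one-month window is ever read.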