If you're up for a fancy but excellent solution:

   - Store your data in Cassandra.
   - Use the expiring-data feature (TTL)
   <https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html> so
   data is automatically removed a month later (see the first sketch below).
   - Now in your Spark process, just read from the database; you don't have
   to worry about timestamps at all (see the second sketch below).
   - You'll still have all your old files if you need to refer back to them.
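
For the TTL write, here's a minimal sketch of what loading the batches into
Cassandra with a 30-day expiry could look like using the
spark-cassandra-connector (the keyspace, table, schema, and connection host
are all made up for illustration):

    import com.datastax.spark.connector._
    import com.datastax.spark.connector.writer.{TTLOption, WriteConf}
    import org.apache.spark.{SparkConf, SparkContext}

    object WriteWithTtl {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("batch-writer")
          .set("spark.cassandra.connection.host", "127.0.0.1") // assumed host
        val sc = new SparkContext(conf)

        // Hypothetical table: batches(batch_ts bigint PRIMARY KEY, payload text)
        val rows = sc.parallelize(Seq((1474959600000L, "...")))

        // 30 days = 2,592,000 seconds; Cassandra drops each row once it expires
        rows.saveToCassandra("sensor", "batches",
          SomeColumns("batch_ts", "payload"),
          writeConf = WriteConf(ttl = TTLOption.constant(2592000)))
      }
    }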
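
The read side then needs no timestamp bookkeeping at all: everything still in
the table is, by construction, at most a month old, so a plain full-table read
is exactly "the last month of data". A sketch, with the same made-up names:

    import org.apache.spark.sql.SparkSession

    object ReadLastMonth {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("batch-reader")
          .config("spark.cassandra.connection.host", "127.0.0.1") // assumed host
          .getOrCreate()

        // No filtering needed: expired rows are already gone from the table
        val df = spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "sensor", "table" -> "batches"))
          .load()

        df.show()
      }
    }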

Pete

On Tue, Sep 27, 2016 at 2:52 AM, Divya Gehlot <divya.htco...@gmail.com>
wrote:

> Hi,
> The input data files for my Spark job are generated every five minutes; the
> file names follow the epoch-time convention, as below:
>
> InputFolder/batch-1474959600000
> InputFolder/batch-1474959900000
> InputFolder/batch-1474960200000
> InputFolder/batch-1474960500000
> InputFolder/batch-1474960800000
> InputFolder/batch-1474961100000
> InputFolder/batch-1474961400000
> InputFolder/batch-1474961700000
> InputFolder/batch-1474962000000
> InputFolder/batch-1474962300000
>
> As per the requirement, I need to read one month of data back from the current timestamp.
>
> I would really appreciate it if anybody could help me.
>
> Thanks,
> Divya
>
