Hi, my Pig script is going to produce a set of files that will be the input for a different process. The script will run periodically, so the number of files will keep growing. I would like to implement an expiry mechanism where I could remove files that are older than x, or where the number of files has reached some threshold.
I know a crazy way where a bash script calls "hadoop fs -ls ...", parses the output, and then executes "rmr" on the matching entries. Is there a "human" way to do this from within a Python script? Pig.fs() doesn't come in handy because it doesn't return anything to the script, but maybe I'm missing something? How could I approach this differently, other than writing a Java program or using the shell? Python looks like a great idea but seems a bit limited, at least in version 0.10.1. I appreciate any help! -- regards, Jakub Glapa
