that looks promising, thanks Clement!
--
regards, pozdrawiam,
Jakub Glapa


On Fri, Jan 18, 2013 at 9:12 AM, Clément MATHIEU <[email protected]> wrote:

> On 2013-01-17 23:11, Jakub Glapa wrote:
>
> Hi Jakub,
>
>> my pig script is going to produce a set of files that will be an input for
>> a different process. The script would be running periodically, so the
>> number of files would keep growing.
>> I would like to implement an expiry mechanism where I could remove files
>> that are older than x, or prune once the number of files reaches some
>> threshold.
>>
>> I know a crazy way where in a bash script you can call "hadoop fs -ls ...",
>> parse the output and then execute "rmr" on matching entries.
>>
>> Is there a "human" way to do this from a Python script? Pig.fs()?
>
> I had the same issue as you a few months ago. The public Pig scripting API
> only exposes a FsShell object, which is way too limited to do any real work.
> However, it is possible to get access to the Hadoop FileSystem API from a
> Python (Jython) script:
>
>     from org.apache.hadoop.fs import FileSystem
>     from org.apache.pig.backend.hadoop.datastorage import ConfigurationUtil
>     from org.apache.pig.scripting import ScriptPigContext
>
>     def get_fs():
>         """Return an org.apache.hadoop.fs.FileSystem instance."""
>         # Pig scripting API exports a FsShell but not a FileSystem object.
>         ctx = ScriptPigContext.get()
>         props = ctx.getPigContext().getProperties()
>         conf = ConfigurationUtil.toConfiguration(props)
>         fs = FileSystem.get(conf)
>         return fs
>
> Once you have a FileSystem object you can do whatever you want using the
> standard Hadoop API.
>
> Hope this helps.
>
> -- Clément
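For anyone finding this thread later: once you have the FileSystem object, you can read each file's modification time from the FileStatus entries returned by fs.listStatus(...) and remove matches with fs.delete(path, False). The expiry decision itself is plain logic, sketched below in ordinary Python so it can be read on its own; the function name, thresholds, and sample paths are illustrative, not from the thread.

```python
from datetime import datetime, timedelta

def select_expired(files, now, max_age, max_count):
    """Pick files to delete: anything older than max_age, plus the
    oldest of the remaining files once more than max_count are left.

    files: list of (path, mtime) tuples, mtime as a datetime.
    """
    cutoff = now - max_age
    expired = [(p, t) for (p, t) in files if t < cutoff]
    # Sort the survivors oldest-first so the count threshold trims
    # the oldest files first.
    kept = sorted((f for f in files if f[1] >= cutoff), key=lambda f: f[1])
    overflow = kept[:max(0, len(kept) - max_count)]
    return [p for (p, _) in expired + overflow]

# Example: expire after 7 days, keep at most 1 recent file.
now = datetime(2013, 1, 18)
files = [
    ("out/part-1", datetime(2013, 1, 1)),   # older than 7 days -> expired
    ("out/part-2", datetime(2013, 1, 15)),  # recent, but over the count limit
    ("out/part-3", datetime(2013, 1, 17)),  # recent, kept
]
doomed = select_expired(files, now, timedelta(days=7), max_count=1)
# doomed == ["out/part-1", "out/part-2"]
```

In the Pig/Jython script you would feed this with (status.getPath(), status.getModificationTime()) pairs from get_fs().listStatus(...) and pass each selected path to fs.delete().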
