Hi,
We want to monitor an HDFS (or local) directory, read the CSV files that
appear there, and delete them after successful processing (mainly so that
we do not run out of disk space).
I'm not quite sure how to achieve this with the current implementation.
Previously, when we read binary data (unsplittable files), we used a small
hack and deleted the files in our FileInputFormat. Now we want to use
splits, and detecting which split is 'the last one' is no longer so
obvious. Of course, deleting files is also problematic when it comes to
checkpointing.
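
To make the 'last split' problem concrete, here is a minimal sketch of the
bookkeeping we have in mind, in plain Java. It does not use any framework
API: SplitCompletionTracker, register and splitDone are hypothetical names,
and it assumes that the number of splits per file is known up front and
that all completions are reported to a single place. It ignores
checkpointing entirely.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: deletes a file once all of its splits are reported done. */
public class SplitCompletionTracker {

    // Number of splits still unprocessed, per file.
    private final Map<Path, Integer> remaining = new HashMap<>();

    /** Registers a file together with the number of splits created for it. */
    public synchronized void register(Path file, int splitCount) {
        if (splitCount <= 0) {
            throw new IllegalArgumentException("splitCount must be positive");
        }
        remaining.put(file, splitCount);
    }

    /**
     * Marks one split of the file as processed.
     * Deletes the file and returns true if that was its last split.
     */
    public synchronized boolean splitDone(Path file) throws IOException {
        Integer left = remaining.get(file);
        if (left == null) {
            throw new IllegalStateException("Unknown file: " + file);
        }
        if (left > 1) {
            remaining.put(file, left - 1);
            return false;
        }
        // Last split finished: forget the file and remove it from disk.
        remaining.remove(file);
        Files.deleteIfExists(file);
        return true;
    }
}
```

Even if something like this worked, the delete would probably have to wait
until a checkpoint covering all of the file's splits has completed, since
otherwise a restore could try to re-read a file that is already gone.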
So my question is: is there an idiomatic way of deleting processed files?
thanks,
maciek