Flume's deletion strategy is quite conservative: we wait for 2 checkpoints after all the data in a file has been removed before the file is deleted. In this case, it does look like the data was actually still referenced. We had a bug some time ago that caused files not to be deleted, but that was fixed quite a while back.
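The following is a rough illustration of the rule described above. A data file becomes deletable only after it no longer holds any queued event and 2 further checkpoints have completed. This is a minimal Java model. `RetentionTracker` and its methods are hypothetical names for this sketch. They are not Flume's API, and the logic is not taken from Flume's `FileChannel` or `Log` source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of a conservative deletion rule: a data file may be deleted
 * only once it holds no queued events AND a given number of checkpoints have
 * completed since it became empty. Illustrative only; not Flume's implementation.
 */
public class RetentionTracker {
    private final int checkpointsToWait;
    private final Map<Integer, Integer> liveEvents = new HashMap<>();          // file id -> events still queued
    private final Map<Integer, Integer> emptiedAtCheckpoint = new HashMap<>(); // file id -> checkpoint count when it emptied
    private int checkpointCount = 0;

    public RetentionTracker(int checkpointsToWait) {
        this.checkpointsToWait = checkpointsToWait;
    }

    /** An event was written to the given file and is now queued. */
    public void put(int fileId) {
        liveEvents.merge(fileId, 1, Integer::sum);
        emptiedAtCheckpoint.remove(fileId); // referenced again, so any waiting period restarts
    }

    /** A queued event stored in the given file was taken and committed. */
    public void take(int fileId) {
        int current = liveEvents.getOrDefault(fileId, 0);
        if (current == 0) {
            throw new IllegalStateException("no queued event in file " + fileId);
        }
        liveEvents.put(fileId, current - 1);
        if (current - 1 == 0) {
            emptiedAtCheckpoint.put(fileId, checkpointCount); // start the waiting period
        }
    }

    /** Completes a checkpoint and returns the ids of files that are now safe to delete. */
    public List<Integer> checkpoint() {
        checkpointCount++;
        List<Integer> deletable = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : emptiedAtCheckpoint.entrySet()) {
            if (checkpointCount - entry.getValue() >= checkpointsToWait) {
                deletable.add(entry.getKey());
            }
        }
        for (int fileId : deletable) {
            emptiedAtCheckpoint.remove(fileId);
            liveEvents.remove(fileId);
        }
        Collections.sort(deletable);
        return deletable;
    }

    public static void main(String[] args) {
        RetentionTracker tracker = new RetentionTracker(2); // wait for 2 checkpoints
        tracker.put(1);
        tracker.put(1);
        tracker.put(2);
        tracker.take(1);
        tracker.take(1); // file 1 is now empty; file 2 still holds one queued event
        System.out.println(tracker.checkpoint()); // [] : only 1 checkpoint since file 1 emptied
        System.out.println(tracker.checkpoint()); // [1]: 2 checkpoints have passed
        System.out.println(tracker.checkpoint()); // [] : file 2 is still referenced
    }
}
```

Under this model, deleting a file by hand while it still holds a queued event removes data the channel expects to find on replay. That is consistent with the restart failure reported further down the thread.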
Thanks,
Hari

On Thursday, July 18, 2013 at 10:56 AM, Camp, Roy wrote:
> We have noticed a few times that cleanup did not happen properly, but a
> restart generally forced a cleanup.
>
> I would recommend putting the data files back unless you did a hard delete.
> Alternatively, make sure you remove (back up first) the checkpoint files if
> you delete the data files. That should put Flume back to a fresh state.
>
> Roy
>
> From: Jeremy Karlson [mailto:[email protected]]
> Sent: Thursday, July 18, 2013 10:42 AM
> To: [email protected]
> Subject: Re: Flume Data Directory Cleanup
>
> Thank you for your suggestion. I took a careful look at that, and I'm not
> sure it describes my situation. That refers to the sink, while my problem is
> with the channel. I'm looking at a dramatic accumulation of log / meta files
> within the channel data directory.
>
> Additionally, I did try doing a manual cleanup of the channel directory,
> deleting the oldest log / meta files. (This was my experiment.) Flume
> really did not like that. If manual cleanup is required in the channel as
> well, the cutoff point at which the files go from being used to unused is
> not clear to me.
>
> -- Jeremy
>
> On Thu, Jul 18, 2013 at 10:13 AM, Lenin Raj <[email protected]> wrote:
> Hi Jeremy,
>
> Regarding cleanup, this was discussed once already.
>
> http://mail-archives.apache.org/mod_mbox/flume-user/201306.mbox/%[email protected]%3E
>
> You have to do it manually.
>
> Thanks,
> Lenin
>
> On Thu, Jul 18, 2013 at 10:36 PM, Jeremy Karlson <[email protected]> wrote:
> To follow up:
>
> My Flume agent ran out of disk space last night and appeared to stop
> processing. I shut it down and, as an experiment (it's a test machine, why
> not?), I deleted the oldest 10 data files to see if Flume actually needed
> them when it restarted.
>
> Flume was not happy with my choices.
>
> It spit out a lot of this:
>
> 2013-07-18 00:00:00,013 ERROR [pool-40-thread-1] o.a.f.s.AvroSource Avro source mySource: Unable to process event batch. Exception follows.
> java.lang.IllegalStateException: Channel closed [channel=myFileChannel]. Due to java.lang.NullPointerException: null
>     at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:353)
>     at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:122)
>     ...
> Caused by: java.lang.NullPointerException
>     at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:895)
>     at org.apache.flume.channel.file.Log.replay(Log.java:406)
>     at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:303)
>     ...
>
> So it seems like these files were actually in use, and not just leftover
> cruft. A worthwhile thing to know, but I'd like to understand why. My
> events are probably at most 1 KB of text, so it seems kind of odd to me that
> they'd consume more than 50 GB of disk space in the channel.
>
> -- Jeremy
>
> On Wed, Jul 17, 2013 at 3:24 PM, Jeremy Karlson <[email protected]> wrote:
> Hi All,
>
> I have a very busy channel that has about 100,000 events queued up. My data
> directory has about 50 data files, each about 1.6 GB. I don't believe my
> 100k events could be consuming that much space, so I'm jumping to conclusions
> and assuming that most of these files are old and due for cleanup (but I
> suppose it's possible). I'm not finding much guidance in the user guide on
> how often these files are cleaned up / removed / compacted / etc.
>
> Any thoughts on what's going on here, or what settings I should look for?
> Thanks.
>
> -- Jeremy
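Jeremy's figures can be checked directly. 100,000 events of about 1 KB each come to roughly 0.1 GB, while 50 files of 1.6 GB each come to 80 GB. Hari's reply suggests the files were still referenced. The sketch below shows one way that can happen, under a simplifying assumption: a data file is reclaimable only once no event written to it is still queued. Retention then depends on *which* files the queued events sit in, not on *how many* events are queued. The `ChannelFootprint` class and its helpers are hypothetical. The placement of events across files is also made up for illustration, because the thread does not say how Jeremy's events were distributed.

```java
import java.util.BitSet;

public class ChannelFootprint {
    /** Naive estimate of the space the queued events need, in GB (10^9 bytes). */
    static double liveGigabytes(long queuedEvents, long bytesPerEvent) {
        return queuedEvents * bytesPerEvent / 1e9;
    }

    /**
     * Counts data files that cannot be reclaimed, ASSUMING (simplification) that a
     * file is reclaimable only when none of the events written to it is still queued.
     * liveEventFile[i] is the (hypothetical) index of the file holding queued event i.
     */
    static int pinnedFiles(int[] liveEventFile) {
        BitSet pinned = new BitSet();
        for (int file : liveEventFile) {
            pinned.set(file); // several events in the same file pin it only once
        }
        return pinned.cardinality();
    }

    public static void main(String[] args) {
        // Figures from the thread: ~100k events of at most ~1 KB, ~50 files of ~1.6 GB.
        double live = liveGigabytes(100_000, 1_000);
        double onDisk = 50 * 1.6;
        System.out.printf("queued payload ~%.1f GB vs. ~%.0f GB on disk%n", live, onDisk);

        // Hypothetical worst case: a single lingering event in each of the 50 files.
        int[] oneStragglerPerFile = new int[50];
        for (int i = 0; i < oneStragglerPerFile.length; i++) {
            oneStragglerPerFile[i] = i;
        }
        int pinned = pinnedFiles(oneStragglerPerFile);
        System.out.printf("%d queued events pin %d files (~%.0f GB)%n",
                oneStragglerPerFile.length, pinned, pinned * 1.6);
    }
}
```

The point of the sketch is only that payload size and on-disk footprint can diverge by orders of magnitude under this kind of rule. It does not diagnose what actually kept Jeremy's files alive.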
