Flume's deletion strategy is quite conservative. We wait for 2 checkpoints 
after all data has been removed from a file before that file is deleted. In 
this case, it does look like the data was actually still referenced. We had a 
bug some time back that caused files not to be deleted, but that was fixed 
quite a while ago.
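
For anyone looking for the relevant settings, here is a rough sketch of the File Channel properties that control checkpointing and data file rollover, as documented in the Flume User Guide. The agent name "agent", the directory paths, and the values shown are assumptions for illustration only; "myFileChannel" is the channel name from the log output quoted below.

```properties
# Hypothetical agent name "agent"; paths are placeholders, not from this thread.
agent.channels = myFileChannel
agent.channels.myFileChannel.type = file
agent.channels.myFileChannel.checkpointDir = /var/flume/checkpoint
agent.channels.myFileChannel.dataDirs = /var/flume/data
# Milliseconds between checkpoints (30000 is the documented default).
agent.channels.myFileChannel.checkpointInterval = 30000
# Maximum size in bytes of one data file before a new one is started
# (2146435071 is the documented default).
agent.channels.myFileChannel.maxFileSize = 2146435071
# Free space in bytes below which the channel stops accepting events
# (524288000 is the documented default).
agent.channels.myFileChannel.minimumRequiredSpace = 524288000
```

Because a data file is only removed once none of its events are still referenced, a few old unconsumed events can keep a whole file on disk. A smaller maxFileSize would presumably make each retained file smaller, but it does not change that rule.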


Thanks,
Hari


On Thursday, July 18, 2013 at 10:56 AM, Camp, Roy wrote:

> We have noticed a few times that cleanup did not happen properly but a 
> restart generally forced a cleanup.  
>  
> I would recommend putting the data files back unless you did a hard delete.  
> Alternatively, make sure you remove (backup first) the checkpoint files if 
> you delete the data files.  That should put flume back to a fresh state. 
>  
> Roy
>  
>  
>  
> From: Jeremy Karlson [mailto:[email protected]] 
> Sent: Thursday, July 18, 2013 10:42 AM
> To: [email protected]
> Subject: Re: Flume Data Directory Cleanup 
>  
> Thank you for your suggestion.  I took a careful look at that, and I'm not 
> sure it describes my situation.  That refers to the sink, while my problem is 
> with the channel.  I'm looking at a dramatic accumulation of log / meta files 
> within the channel data directory.
> 
> Additionally, I did try doing a manual cleanup of the channel directory, 
> deleting the oldest log / meta files.  (This was my experiment.)  Flume 
> really did not like that.  If it is required in the channel as well, the 
> cutoff point at which the files go from being used to unused is not clear to 
> me. 
>  
> 
> -- Jeremy
> 
> 
>  
> On Thu, Jul 18, 2013 at 10:13 AM, Lenin Raj <[email protected]> wrote:
> Hi Jeremy,
> 
> Regarding cleanup, this has already been discussed once:
> 
> http://mail-archives.apache.org/mod_mbox/flume-user/201306.mbox/%[email protected]%3E
>  
> You have to do it manually.
> 
> 
> 
> Thanks,
> Lenin 
>  
> On Thu, Jul 18, 2013 at 10:36 PM, Jeremy Karlson <[email protected]> wrote:
> To follow up:
>  
> 
> My Flume agent ran out of disk space last night and appeared to stop 
> processing.  I shut it down and as an experiment (it's a test machine, why 
> not?) I deleted the oldest 10 data files, to see if Flume actually needed 
> these when it restarted.
>
> Flume was not happy with my choices.
>
> It spit out a lot of this:
>
> 2013-07-18 00:00:00,013 ERROR [pool-40-thread-1] o.a.f.s.AvroSource Avro source mySource: Unable to process event batch. Exception follows.
> java.lang.IllegalStateException: Channel closed [channel=myFileChannel]. Due to java.lang.NullPointerException: null
>         at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:353)
>         at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:122)
>         ...
> Caused by: java.lang.NullPointerException
>         at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:895)
>         at org.apache.flume.channel.file.Log.replay(Log.java:406)
>         at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:303)
>         ...
>
> So it seems like these files were actually in use, and not just leftover 
> cruft.  A worthwhile thing to know, but I'd like to understand why.  My 
> events are probably at most 1k of text, so it seems kind of odd to me that 
> they'd consume more than 50GB of disk space in the channel.
>
> -- Jeremy
> 
>  
> 
> 
>  
> On Wed, Jul 17, 2013 at 3:24 PM, Jeremy Karlson <[email protected]> wrote:
> Hi All,
>  
> 
> I have a very busy channel that has about 100,000 events queued up.  My data 
> directory has about 50 data files, each about 1.6 GB.  I don't believe my 
> 100k events could be consuming that much space, so I'm jumping to conclusions 
> and assuming that most of these files are old and due for cleanup (but I 
> suppose it's possible).  I'm not finding much guidance in the user guide on 
> how often these files are cleaned up / removed / compacted / etc.
>
> Any thoughts on what's going on here, or what settings I should look for?  
> Thanks.
>
> -- Jeremy

