Hi Abraham,  

Compressing the backup checkpoint is certainly possible. The backup is rarely 
read; it is used only if the original checkpoint turns out to be corrupt on 
restart. So I think compressing it with something like Snappy would make sense 
(GZIP might hurt performance). Can you try snappy-java and see whether it gives 
good performance and reasonable compression?

Patches are always welcome. I'd be glad to review and commit one. I would 
suggest making the compression optional via configuration so that anyone with 
smaller channels doesn't end up spending CPU for little gain.  
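
To make the "optional via configuration" idea concrete, here is a rough sketch 
rather than a proposed patch. It assumes the checkpoint is available as a byte 
array, and it uses the JDK's Deflater at BEST_SPEED purely as a stand-in so the 
example has no external dependency; the actual experiment would swap in 
snappy-java. The class name, method names, and the boolean flag are all 
hypothetical and do not correspond to existing Flume code or configuration keys.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of Flume: optionally compresses the bytes
// of a backup checkpoint before they are written to disk.
public class BackupCheckpointCodec {

    // 'compress' stands in for a (hypothetical) channel configuration flag.
    public static byte[] encode(byte[] checkpoint, boolean compress) {
        if (!compress) {
            return checkpoint.clone(); // small channels: skip the CPU cost
        }
        // BEST_SPEED trades compression ratio for lower CPU usage.
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try {
            deflater.setInput(checkpoint);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    // Restores the original bytes; only needed when the primary
    // checkpoint is corrupt and the backup has to be read.
    public static byte[] decode(byte[] stored, int originalLength, boolean compressed) {
        if (!compressed) {
            return stored.clone();
        }
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(stored);
            byte[] result = new byte[originalLength];
            int off = 0;
            while (off < originalLength && !inflater.finished()) {
                int n = inflater.inflate(result, off, originalLength - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("Backup checkpoint is truncated");
                }
                off += n;
            }
            if (off != originalLength) {
                throw new IllegalStateException("Unexpected backup checkpoint length");
            }
            return result;
        } catch (DataFormatException e) {
            throw new IllegalStateException("Backup checkpoint is corrupt", e);
        } finally {
            inflater.end();
        }
    }
}
```

With the flag off, the bytes pass through unchanged, so smaller channels pay 
nothing. With it on, a mostly empty checkpoint should shrink dramatically, 
since it is dominated by long runs of zeros.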


Thanks,
Hari


On Monday, June 9, 2014 at 3:56 PM, Abraham Fine wrote:

> Hello-
>  
> We are using Flume 1.4 with File Channel configured to use a very
> large capacity. We keep the checkpoint and backup checkpoint on
> separate disks.
>  
> Normally the file channel is mostly empty (<<1% of capacity). For the
> checkpoint the disk I/O seems to be very reasonable due to the usage
> of a MappedByteBuffer.
>  
> On the other hand, the backup checkpoint seems to be written to disk
> in its entirety over and over again, resulting in very high disk
> utilization.
>  
> I noticed that, because the checkpoint file is mostly empty, it is
> very compressible. I was able to GZIP our checkpoint from 381M to
> 386K. I was wondering if it would be possible to always compress the
> backup checkpoint before writing it to disk.
>  
> I would be happy to work on a patch to implement this functionality if
> there is interest.
>  
> Thanks in Advance,
>  
> --  
> Abraham Fine | Software Engineer
> (516) 567-2535
> BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com 
> (http://www.brightroll.com)
>  
>  

