I went ahead and created a JIRA and patch:
https://issues.apache.org/jira/browse/FLUME-2401

The option can be enabled with:
agentX.channels.ch1.compressBackupCheckpoint = true

As per your recommendation, I used snappy-java. I also considered the
snappy and lz4 implementations in Hadoop IO but noticed that the
Hadoop IO dependency was removed in
https://issues.apache.org/jira/browse/FLUME-1285
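
For anyone wondering why this pays off, here is a minimal sketch (not the code in the patch) of how well a mostly-empty checkpoint image compresses. It uses GZIP from the JDK as a stand-in so that it runs without extra dependencies; the patch itself uses snappy-java. The class name, the helper names, and the 8-byte slot layout are all hypothetical and are not Flume's actual checkpoint format.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only: not Flume code and not Flume's checkpoint layout.
class CheckpointCompressionSketch {

    /** Compresses the given bytes with GZIP and returns the compressed bytes. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the GZIP stream flushes the trailer into the byte array.
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Builds an assumed checkpoint image: all zeros except a few occupied 8-byte slots. */
    static byte[] mostlyEmptyCheckpoint(int sizeBytes, int usedSlots) {
        byte[] image = new byte[sizeBytes];
        for (int i = 0; i < usedSlots; i++) {
            // Mark the first byte of each used slot; the rest stays zero.
            image[i * 8] = (byte) (i + 1);
        }
        return image;
    }

    public static void main(String[] args) {
        // 8 MiB image with 1000 occupied slots, i.e. far below 1% of capacity.
        byte[] image = mostlyEmptyCheckpoint(8 * 1024 * 1024, 1000);
        byte[] compressed = gzip(image);
        System.out.println("raw bytes:        " + image.length);
        System.out.println("compressed bytes: " + compressed.length);
    }
}
```

The exact ratio depends on the codec and on how full the channel is, so treat the numbers this prints as illustrative rather than as a benchmark of the patch.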

Thanks,
Abe
-- 
Abraham Fine | Software Engineer
(516) 567-2535
BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com


On Mon, Jun 9, 2014 at 4:01 PM, Hari Shreedharan
<hshreedha...@cloudera.com> wrote:
> Hi Abraham,
>
> Compressing the backup checkpoint is very possible. The backup is rarely
> read: it is used only if the original checkpoint is corrupt on restart.
> So I think compressing it with something like Snappy would make sense (GZIP
> might hurt performance). Can you try snappy-java and see if it gives
> good performance and reasonable compression?
>
> Patches are always welcome. I’d be glad to review and commit it. I would
> suggest making the compression optional via configuration so that anyone
> with smaller channels doesn’t end up spending CPU for little gain.
>
>
> Thanks,
> Hari
>
> On Monday, June 9, 2014 at 3:56 PM, Abraham Fine wrote:
>
> Hello-
>
> We are using Flume 1.4 with File Channel configured to use a very
> large capacity. We keep the checkpoint and backup checkpoint on
> separate disks.
>
> Normally the file channel is mostly empty (<<1% of capacity). For the
> checkpoint the disk I/O seems to be very reasonable due to the usage
> of a MappedByteBuffer.
>
> On the other hand, the backup checkpoint seems to be written to disk
> in its entirety over and over again, resulting in very high disk
> utilization.
>
> I noticed that, because the checkpoint file is mostly empty, it is
> very compressible. I was able to GZIP our checkpoint from 381M to
> 386K. I was wondering if it would be possible to always compress the
> backup checkpoint before writing it to disk.
>
> I would be happy to work on a patch to implement this functionality if
> there is interest.
>
> Thanks in Advance,
>
> --
> Abraham Fine | Software Engineer
> (516) 567-2535
> BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com
>
>
