Hi Brock and Hari-

I was just wondering if either of you had a chance to take a look at the
patch and if there is anything I can do to improve it.

Thanks,
Abe
--
Abraham Fine | Software Engineer
(516) 567-2535
BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com

On Wed, Jun 11, 2014 at 6:48 PM, Brock Noland <br...@cloudera.com> wrote:

> This is a great suggestion Abraham!
>
>
> On Wed, Jun 11, 2014 at 5:39 PM, Hari Shreedharan <hshreedha...@cloudera.com> wrote:
>
>> Thanks. I will review it :)
>>
>>
>> Thanks,
>> Hari
>>
>> On Wednesday, June 11, 2014 at 5:00 PM, Abraham Fine wrote:
>>
>> I went ahead and created a JIRA and patch:
>> https://issues.apache.org/jira/browse/FLUME-2401
>>
>> The option is configurable with:
>> agentX.channels.ch1.compressBackupCheckpoint = true
>>
>> As per your recommendation, I used snappy-java. I also considered the
>> snappy and lz4 implementations in Hadoop IO but noticed that the
>> Hadoop IO dependency was removed in
>> https://issues.apache.org/jira/browse/FLUME-1285
>>
>> Thanks,
>> Abe
>> --
>> Abraham Fine | Software Engineer
>> (516) 567-2535
>> BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com
>>
>>
>> On Mon, Jun 9, 2014 at 4:01 PM, Hari Shreedharan
>> <hshreedha...@cloudera.com> wrote:
>>
>> Hi Abraham,
>>
>> Compressing the backup checkpoint is very possible. The backup is rarely
>> read (it is used only if the original one is corrupt on restart), so I
>> think compressing it using something like Snappy would make sense (GZIP
>> might hurt performance). Can you try using snappy-java and see if that
>> gives good perf and reasonable compression?
>>
>> Patches are always welcome. I’d be glad to review and commit it. I would
>> suggest making the compression optional via configuration so that anyone
>> with smaller channels doesn’t end up using CPU for not much gain.
>>
>>
>> Thanks,
>> Hari
>>
>> On Monday, June 9, 2014 at 3:56 PM, Abraham Fine wrote:
>>
>> Hello-
>>
>> We are using Flume 1.4 with File Channel configured to use a very
>> large capacity. We keep the checkpoint and backup checkpoint on
>> separate disks.
>>
>> Normally the file channel is mostly empty (<<1% of capacity). For the
>> checkpoint the disk I/O seems to be very reasonable due to the usage
>> of a MappedByteBuffer.
>>
>> On the other hand, the backup checkpoint seems to be written to disk
>> in its entirety over and over again, resulting in very high disk
>> utilization.
>>
>> I noticed that, because the checkpoint file is mostly empty, it is
>> very compressible. I was able to GZIP our checkpoint from 381M to
>> 386K. I was wondering if it would be possible to always compress the
>> backup checkpoint before writing it to disk.
>>
>> I would be happy to work on a patch to implement this functionality if
>> there is interest.
>>
>> Thanks in advance,
>>
>> --
>> Abraham Fine | Software Engineer
>> (516) 567-2535
>> BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com
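
To illustrate the observation in the original message that a mostly empty checkpoint compresses very well, here is a minimal Python sketch. It is not Flume code and does not read a real checkpoint file: the buffer layout (a small run of random bytes followed by zeros) and the helper names `mostly_empty_buffer` and `gzip_ratio` are assumptions made only for this illustration, and it uses GZIP from the standard library rather than snappy-java.

```python
import gzip
import os


def mostly_empty_buffer(size_bytes: int, used_fraction: float) -> bytes:
    """Build a stand-in for a sparse checkpoint (hypothetical layout):
    a small run of random bytes followed by zero padding."""
    used = int(size_bytes * used_fraction)
    return os.urandom(used) + bytes(size_bytes - used)


def gzip_ratio(data: bytes) -> float:
    """Return compressed size divided by original size."""
    return len(gzip.compress(data)) / len(data)


if __name__ == "__main__":
    # 4 MiB buffer with roughly 0.1% of it in use.
    buf = mostly_empty_buffer(4 * 1024 * 1024, 0.001)
    print(f"original: {len(buf)} bytes, gzip ratio: {gzip_ratio(buf):.4f}")
```

The exact ratio depends on how much of the buffer is in use and on the compressor, so the numbers from this sketch should not be read as a prediction for a real File Channel checkpoint.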