Hi Abraham,

In general, the patch looks good. Can you add a couple of tests:

* Original checkpoint is uncompressed, config changes to compress checkpoint - does the file channel restart from the original checkpoint? Are new checkpoints compressed?
* Compressed checkpoint, config changes to not compress checkpoint - does the channel start up? Are new checkpoints uncompressed?

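To make the two scenarios concrete, here is a rough sketch of the behaviour the tests would check. It is not the FLUME-2401 patch: the class and method names are hypothetical, GZIP from the JDK stands in for snappy-java so the example has no external dependencies, and detecting the format from the first two bytes of the image is only an assumption about how a reader could stay independent of the current config value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for the backup checkpoint read/write path.
final class BackupToggleSketch {

    /** Writes a backup image, compressed or not depending on the config flag. */
    static byte[] writeBackup(byte[] checkpoint, boolean compress) {
        if (!compress) {
            return checkpoint.clone();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(checkpoint);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /**
     * True if the image starts with the GZIP magic bytes 0x1f 0x8b.
     * Assumption: an uncompressed checkpoint never starts with these bytes.
     */
    static boolean isCompressed(byte[] image) {
        return image.length >= 2
                && (image[0] & 0xff) == 0x1f
                && (image[1] & 0xff) == 0x8b;
    }

    /** Restores a backup by inspecting the image itself, not the current flag. */
    static byte[] readBackup(byte[] image) {
        if (!isCompressed(image)) {
            return image.clone();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(image))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] checkpoint = new byte[4096]; // mostly empty, like a quiet channel
        checkpoint[100] = 7;

        // Scenario 1: backup written uncompressed, then the flag is turned on.
        byte[] oldBackup = writeBackup(checkpoint, false);
        byte[] restored = readBackup(oldBackup);
        byte[] newBackup = writeBackup(restored, true);
        System.out.println("restart from uncompressed backup ok: "
                + Arrays.equals(checkpoint, restored));
        System.out.println("new backup compressed: " + isCompressed(newBackup));

        // Scenario 2: backup written compressed, then the flag is turned off.
        oldBackup = writeBackup(checkpoint, true);
        restored = readBackup(oldBackup);
        newBackup = writeBackup(restored, false);
        System.out.println("restart from compressed backup ok: "
                + Arrays.equals(checkpoint, restored));
        System.out.println("new backup compressed: " + isCompressed(newBackup));
    }
}
```

The real tests would presumably exercise the file channel itself, restarting it with compressBackupCheckpoint flipped between runs, rather than in-memory byte arrays like this.
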
Hari

On Wed, Jul 2, 2014 at 3:06 PM, Abraham Fine <a...@brightroll.com> wrote:

> Hi Brock and Hari-
>
> I was just wondering if either of you had a chance to take a look at the
> patch and if there is anything I can do to improve it.
>
> Thanks,
> Abe
>
> --
> Abraham Fine | Software Engineer
> (516) 567-2535
> BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com
>
>
> On Wed, Jun 11, 2014 at 6:48 PM, Brock Noland <br...@cloudera.com> wrote:
>
>> This is a great suggestion Abraham!
>>
>>
>> On Wed, Jun 11, 2014 at 5:39 PM, Hari Shreedharan <
>> hshreedha...@cloudera.com> wrote:
>>
>>> Thanks. I will review it :)
>>>
>>>
>>> Thanks,
>>> Hari
>>>
>>> On Wednesday, June 11, 2014 at 5:00 PM, Abraham Fine wrote:
>>>
>>> I went ahead and created a JIRA and patch:
>>> https://issues.apache.org/jira/browse/FLUME-2401
>>>
>>> The option is configurable with:
>>> agentX.channels.ch1.compressBackupCheckpoint = true
>>>
>>> As per your recommendation, I used snappy-java. I also considered the
>>> snappy and lz4 implementations in Hadoop IO but noticed that the
>>> Hadoop IO dependency was removed in
>>> https://issues.apache.org/jira/browse/FLUME-1285
>>>
>>> Thanks,
>>> Abe
>>>
>>>
>>> On Mon, Jun 9, 2014 at 4:01 PM, Hari Shreedharan
>>> <hshreedha...@cloudera.com> wrote:
>>>
>>> Hi Abraham,
>>>
>>> Compressing the backup checkpoint is very possible. The backup is
>>> rarely read (it is used only if the original one is corrupt on
>>> restarts), so I think compressing it using something like Snappy would
>>> make sense (GZIP might hit performance). Can you try using snappy-java
>>> and see if that gives good perf and reasonable compression?
>>>
>>> Patches are always welcome. I’d be glad to review and commit it. I would
>>> suggest making the compression optional via configuration so that anyone
>>> with smaller channels doesn't end up using CPU for not much gain.
>>>
>>>
>>> Thanks,
>>> Hari
>>>
>>> On Monday, June 9, 2014 at 3:56 PM, Abraham Fine wrote:
>>>
>>> Hello-
>>>
>>> We are using Flume 1.4 with File Channel configured to use a very
>>> large capacity. We keep the checkpoint and backup checkpoint on
>>> separate disks.
>>>
>>> Normally the file channel is mostly empty (<<1% of capacity). For the
>>> checkpoint the disk I/O seems to be very reasonable due to the usage
>>> of a MappedByteBuffer.
>>>
>>> On the other hand, the backup checkpoint seems to be written to disk
>>> in its entirety over and over again, resulting in very high disk
>>> utilization.
>>>
>>> I noticed that, because the checkpoint file is mostly empty, it is
>>> very compressible. I was able to GZIP our checkpoint from 381M to
>>> 386K. I was wondering if it would be possible to always compress the
>>> backup checkpoint before writing it to disk.
>>>
>>> I would be happy to work on a patch to implement this functionality if
>>> there is interest.
>>>
>>> Thanks in Advance,
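
The compressibility observation in the original message can be reproduced in miniature. The sketch below is only an illustration under stated assumptions: the class name, buffer sizes, and fill pattern are made up, and a zero-filled byte array with a small used region stands in for a mostly empty checkpoint file. It uses the JDK's GZIP, the codec mentioned in the original message, rather than Snappy.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo: how well does a mostly empty buffer compress?
final class SparseCheckpointDemo {

    /** GZIP-compresses the given bytes in memory and returns the result. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Builds a zero-filled buffer whose first usedBytes bytes hold data. */
    static byte[] sparseBuffer(int size, int usedBytes) {
        byte[] buf = new byte[size];
        for (int i = 0; i < usedBytes; i++) {
            buf[i] = (byte) (i * 31 + 7); // arbitrary non-zero filler
        }
        return buf;
    }

    public static void main(String[] args) {
        // 8 MiB buffer with about 0.1% of it in use.
        byte[] checkpoint = sparseBuffer(8 * 1024 * 1024, 8 * 1024);
        byte[] compressed = gzip(checkpoint);
        System.out.printf("raw=%d bytes, gzip=%d bytes%n",
                checkpoint.length, compressed.length);
    }
}
```

The exact ratio depends on the data and the codec, so the numbers printed here should not be read as a prediction for a real checkpoint.
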