Hari-

I added the new tests and created a new revision to my patch:
https://issues.apache.org/jira/secure/attachment/12653728/compress_backup_checkpoint_new_tests.patch

Thanks,
Abe

--
Abraham Fine | Software Engineer
(516) 567-2535
BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com

On Wed, Jul 2, 2014 at 4:32 PM, Hari Shreedharan <hshreedha...@cloudera.com> wrote:

> Hi Abraham,
>
> In general, the patch looks good. Can you add a couple of tests?
> * Original checkpoint is uncompressed, config changes to compress the
> checkpoint: does the file channel restart from the original checkpoint?
> Are new checkpoints compressed?
> * Checkpoint is compressed, config changes to not compress the checkpoint:
> does the channel start up? Are new checkpoints uncompressed?
>
> Hari
>
> On Wed, Jul 2, 2014 at 3:06 PM, Abraham Fine <a...@brightroll.com> wrote:
>
>> Hi Brock and Hari-
>>
>> I was just wondering if either of you had a chance to take a look at the
>> patch and if there is anything I can do to improve it.
>>
>> Thanks,
>> Abe
>>
>> On Wed, Jun 11, 2014 at 6:48 PM, Brock Noland <br...@cloudera.com> wrote:
>>
>>> This is a great suggestion, Abraham!
>>>
>>> On Wed, Jun 11, 2014 at 5:39 PM, Hari Shreedharan <hshreedha...@cloudera.com> wrote:
>>>
>>>> Thanks. I will review it :)
>>>>
>>>> Thanks,
>>>> Hari
>>>>
>>>> On Wednesday, June 11, 2014 at 5:00 PM, Abraham Fine wrote:
>>>>
>>>> I went ahead and created a JIRA and patch:
>>>> https://issues.apache.org/jira/browse/FLUME-2401
>>>>
>>>> The option is configurable with:
>>>> agentX.channels.ch1.compressBackupCheckpoint = true
>>>>
>>>> As per your recommendation, I used snappy-java.
>>>> I also considered the snappy and lz4 implementations in Hadoop IO but
>>>> noticed that the Hadoop IO dependency was removed in
>>>> https://issues.apache.org/jira/browse/FLUME-1285
>>>>
>>>> Thanks,
>>>> Abe
>>>>
>>>> On Mon, Jun 9, 2014 at 4:01 PM, Hari Shreedharan <hshreedha...@cloudera.com> wrote:
>>>>
>>>> Hi Abraham,
>>>>
>>>> Compressing the backup checkpoint is very possible. The backup is
>>>> rarely read (it is only used on restart if the original checkpoint is
>>>> corrupt), so I think compressing it using something like Snappy would
>>>> make sense (GZIP might hurt performance). Can you try using snappy-java
>>>> and see if it gives good performance and reasonable compression?
>>>>
>>>> Patches are always welcome. I'd be glad to review and commit it. I would
>>>> suggest making the compression optional via configuration so that anyone
>>>> with a smaller channel doesn't end up spending CPU for little gain.
>>>>
>>>> Thanks,
>>>> Hari
>>>>
>>>> On Monday, June 9, 2014 at 3:56 PM, Abraham Fine wrote:
>>>>
>>>> Hello-
>>>>
>>>> We are using Flume 1.4 with File Channel configured to use a very
>>>> large capacity. We keep the checkpoint and backup checkpoint on
>>>> separate disks.
>>>>
>>>> Normally the file channel is mostly empty (<<1% of capacity). For the
>>>> checkpoint, the disk I/O seems to be very reasonable due to the use
>>>> of a MappedByteBuffer.
>>>>
>>>> On the other hand, the backup checkpoint seems to be written to disk
>>>> in its entirety over and over again, resulting in very high disk
>>>> utilization.
>>>>
>>>> I noticed that, because the checkpoint file is mostly empty, it is
>>>> very compressible. I was able to GZIP our checkpoint from 381M to
>>>> 386K. I was wondering if it would be possible to always compress the
>>>> backup checkpoint before writing it to disk.
>>>>
>>>> I would be happy to work on a patch to implement this functionality if
>>>> there is interest.
>>>>
>>>> Thanks in advance,
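For reference, the behavior Hari's two tests check (a channel restarting from a backup written under the opposite compression setting) and the compressibility of a mostly empty checkpoint can be illustrated with a small, self-contained Java sketch. It is not the FLUME-2401 patch: the class and method names (`BackupCheckpointSketch`, `writeBackup`, `readBackup`) are hypothetical, and it substitutes the JDK's `java.util.zip` GZIP streams for snappy-java so it runs with no extra dependencies. The reader detects a compressed backup by sniffing the two GZIP magic bytes; that is an assumption made for illustration, and a real on-disk format would more likely record the compression setting explicitly, since an uncompressed file could in principle begin with the same two bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, not the FLUME-2401 patch: JDK GZIP stands in for snappy-java.
public class BackupCheckpointSketch {

  // The first two bytes of every GZIP stream (RFC 1952).
  private static final int GZIP_MAGIC_0 = 0x1f;
  private static final int GZIP_MAGIC_1 = 0x8b;

  // Produces the bytes stored for the backup checkpoint; compresses only when the
  // (hypothetical) compressBackupCheckpoint flag is set.
  public static byte[] writeBackup(byte[] checkpoint, boolean compressBackupCheckpoint) {
    if (!compressBackupCheckpoint) {
      return checkpoint.clone();
    }
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
      gzip.write(checkpoint);
    } catch (IOException e) {
      throw new UncheckedIOException(e); // in-memory streams should not fail
    }
    return bytes.toByteArray();
  }

  // Restores a backup whichever way it was written, independent of the current flag,
  // so a restart still works after the setting is flipped in either direction.
  public static byte[] readBackup(byte[] stored) {
    boolean gzipped = stored.length >= 2
        && (stored[0] & 0xff) == GZIP_MAGIC_0
        && (stored[1] & 0xff) == GZIP_MAGIC_1;
    if (!gzipped) {
      return stored.clone();
    }
    try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(stored))) {
      return in.readAllBytes(); // Java 9+
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // A mostly empty checkpoint: 8 MiB of zeros with one non-zero byte every 64 KiB.
    byte[] checkpoint = new byte[8 << 20];
    for (int i = 0; i < checkpoint.length; i += 1 << 16) {
      checkpoint[i] = 0x7f;
    }
    byte[] plain = writeBackup(checkpoint, false);
    byte[] compressed = writeBackup(checkpoint, true);
    System.out.printf("uncompressed: %d bytes, gzip: %d bytes%n", plain.length, compressed.length);

    // Both stored forms must restore the identical checkpoint.
    boolean roundTrips = Arrays.equals(readBackup(plain), checkpoint)
        && Arrays.equals(readBackup(compressed), checkpoint);
    System.out.println("round trips: " + roundTrips);
  }
}
```

Running `main` writes the same mostly zero buffer both ways and prints both sizes, which gives a quick feel for the kind of ratio described in the thread (381M to 386K with GZIP). Swapping in snappy-java, as the patch does, would typically trade some of that ratio for lower CPU cost, which is the trade-off Hari raises.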