Correct - that would be pretty tricky. We could indeed modify the tool to take a custom function to process each event - that would work. We would need to specify an interface for the user to implement, say FileChannelDataVerifier or something similar, and then call it on each event.
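To make the idea concrete, here is a rough sketch of what such a hook might look like. Everything in it is an assumption for illustration: FileChannelDataVerifier is only the name floated in this thread and does not exist in Flume, SimpleEvent is a stand-in for the real event type, and filterValid is a hypothetical helper. It does not touch the actual file channel format; it only shows calling a user-supplied check on each event and dropping the ones that fail.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class VerifierSketch {
    // Keeps the events the verifier accepts; a verifier that throws counts as a rejection.
    static List<SimpleEvent> filterValid(List<SimpleEvent> events,
                                         FileChannelDataVerifier verifier) {
        List<SimpleEvent> kept = new ArrayList<>();
        for (SimpleEvent e : events) {
            boolean ok;
            try {
                ok = verifier.isValid(e);
            } catch (RuntimeException ex) {
                ok = false; // treat an undecodable payload as invalid
            }
            if (ok) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Example user-supplied check: the payload must look like a JSON object.
        FileChannelDataVerifier verifier = event -> {
            String s = new String(event.body, StandardCharsets.UTF_8).trim();
            return s.startsWith("{") && s.endsWith("}");
        };

        List<SimpleEvent> events = new ArrayList<>();
        events.add(new SimpleEvent(new HashMap<>(),
                "{\"id\":1}".getBytes(StandardCharsets.UTF_8)));
        events.add(new SimpleEvent(new HashMap<>(),
                "garbage".getBytes(StandardCharsets.UTF_8)));

        List<SimpleEvent> kept = filterValid(events, verifier);
        System.out.println(kept.size() + " of " + events.size() + " events kept");
    }
}

// Hypothetical interface (the name comes from this thread; it is not part of Flume).
interface FileChannelDataVerifier {
    // Return true to keep the event, false to drop it.
    boolean isValid(SimpleEvent event);
}

// Stand-in for a Flume event (headers plus body); not the real org.apache.flume.Event.
final class SimpleEvent {
    final Map<String, String> headers;
    final byte[] body;

    SimpleEvent(Map<String, String> headers, byte[] body) {
        this.headers = headers;
        this.body = body;
    }
}
```

In a real tool the loop would presumably sit wherever the integrity tool already iterates over the channel's records, and the user's class would be loaded by name from the command line; both of those are guesses on my part.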
Do you want to take a stab at it?

Thanks,
Hari

On Wed, Feb 4, 2015 at 7:27 PM, Ashish <paliwalash...@gmail.com> wrote:
> I think I was not clear. I was talking about an offline tool which
> would help us clean the Channel of valid Flume Events that carry an
> invalid/malformed payload. What you described would work in Agent
> flow. Given the current scenario, Charles would be keen to push the
> backed up events to destination.
>
> Here is what I was talking about
>
> offline tool -> read event from file channel -> event not corrupt ->
> decodePayload (user supplied) -> if valid write to file channel else
> drop the event
>
> As far as my reading goes, everything on file channel is stored as a
> TransactionEvent, so this might not be as simple as it looks. In
> other words, it would be like a FileChannelSink with user provided
> function to validate payload.
>
> On Thu, Feb 5, 2015 at 1:02 AM, Hari Shreedharan
> <hshreedha...@cloudera.com> wrote:
>> It is not as easy, since the channel does not know what an event looks like
>> or why the transaction is being rolled back. That is something that is being
>> handled by the serializer and sink. We need to somehow remove an event from
>> the channel if it the channel is broken. We might perhaps have to add a new
>> interface that allows the sink/serializer to tell the channel “forget” a
>> specific event, and that event will be dropped from the channel and
>> transaction.
>>
>> I don’t see how we can do it outside the sink for this reason.
>>
>> Thanks,
>> Hari
>>
>>
>> On Wed, Feb 4, 2015 at 5:32 AM, Ashish <paliwalash...@gmail.com> wrote:
>>>
>>> Is it possible to extend File Channel Integrity tool to support
>>> filtering out corrupt events? Something like once we get a record and
>>> it's not corrupt, provide a placeholder function to validate Event
>>> implemented by user. I still don't have too much insight into
>>> FileChannel implementation.
>>>
>>> On Tue, Feb 3, 2015 at 12:49 AM, Hari Shreedharan
>>> <hshreedha...@cloudera.com> wrote:
>>> > Currently, no - there is no such tool, but this is a request that has
>>> > come up time and again. Can you file a jira for this? If someone has
>>> > time, they’d probably pick it up
>>> >
>>> > Thanks,
>>> > Hari
>>> >
>>> >
>>> > On Mon, Feb 2, 2015 at 10:52 AM, Charles McLaughlin
>>> > <char...@nextdoor.com> wrote:
>>> >>
>>> >> Hello,
>>> >>
>>> >> We had a situation where one of our Flume agents got stuck on a message
>>> >> due to unexpected format. To get things moving again, I stopped the
>>> >> Flume agent, moved the file backed channel data out of the way and
>>> >> re-started the Flume agent. I'd like to pop the bad message from the
>>> >> queue data on disk... are there any tools or recommended ways to do
>>> >> this?
>>> >>
>>> >> Thanks,
>>> >> Charles
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> thanks
>>> ashish
>>>
>>> Blog: http://www.ashishpaliwal.com/blog
>>> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>>
>>
> --
> thanks
> ashish
> Blog: http://www.ashishpaliwal.com/blog
> My Photo Galleries: http://www.pbase.com/ashishpaliwal