Hi, I'm having the same problem with the HDFS sink.
A 'poison' message arrives without the timestamp header that the sink expects. This causes an NPE, which ends with the message being returned to the channel, over and over again. Is my only option to rewrite the HDFS sink? Isn't there any way to hook into what the sink does?

Thanks
Anat

On Fri, Jul 26, 2013 at 3:35 AM, Arvind Prabhakar wrote:
> Sounds like a bug in the ElasticSearch sink to me. Do you mind filing a Jira
> to track this? Sample data that causes this would be even better.
>
> Regards,
> Arvind Prabhakar
>
>
> On Thu, Jul 25, 2013 at 9:50 AM, Jeremy Karlson wrote:
>
>> This was using the provided ElasticSearch sink. The logs were not
>> helpful. I ran it through the debugger and found the source of the
>> problem.
>>
>> ContentBuilderUtil uses a very "aggressive" method to determine whether the
>> content is JSON: if it contains a "{" anywhere, it's considered JSON.
>> My body contained one but wasn't JSON, causing the JSON parser to throw a
>> CharConversionException from addComplexField(...) (rather than the expected
>> JSONException). We've changed addComplexField(...) to catch other types of
>> exceptions and fall back to treating the content as a simple field. We'll
>> probably submit a patch for this soon.
>>
>> I'm reasonably happy with this, but I still think that in the bigger
>> picture there should be some sort of mechanism to automatically detect and
>> toss / skip / flag problematic events without them plugging up the flow.
>>
>> -- Jeremy
>>
>>
>> On Wed, Jul 24, 2013 at 7:51 PM, Arvind Prabhakar wrote:
>>
>>> Jeremy, would it be possible for you to show us logs for the part where
>>> the sink fails to remove an event from the channel? I am assuming this is a
>>> standard sink that Flume provides and not a custom one.
>>>
>>> The reason I ask is that sinks do not introspect the event, and hence
>>> there is no reason why one would fail during the event's removal. It is
>>> more likely that there is a problem within the channel, in that it cannot
>>> dereference the event correctly. Looking at the logs will help us identify
>>> the root cause of what you are experiencing.
>>>
>>> Regards,
>>> Arvind Prabhakar
>>>
>>>
>>> On Wed, Jul 24, 2013 at 3:56 PM, Jeremy Karlson wrote:
>>>
>>>> Both reasonable suggestions. What would a custom sink look like in
>>>> this case, and how would I eliminate only the problem events, since I
>>>> don't know what they are until the "real" sink attempts them?
>>>>
>>>> My philosophical concern (in general) is that we're taking the approach
>>>> of exhaustively finding and eliminating possible failure cases. It's not
>>>> possible to eliminate every single failure case, so shouldn't there be a
>>>> method of last resort to eliminate problem events from the channel?
>>>>
>>>> -- Jeremy
>>>>
>>>>
>>>>
>>>> On Wed, Jul 24, 2013 at 3:45 PM, Hari Shreedharan wrote:
>>>>
>>>>> Or you could write a custom sink that removes this event (more work,
>>>>> of course).
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Hari
>>>>>
>>>>> On Wednesday, July 24, 2013 at 3:36 PM, Roshan Naik wrote:
>>>>>
>>>>> If you have a way to identify such events, you may be able to use the
>>>>> Regex Filtering Interceptor to toss them out before they get into the
>>>>> channel.
>>>>>
>>>>>
>>>>> On Wed, Jul 24, 2013 at 2:52 PM, Jeremy Karlson wrote:
>>>>>
>>>>> Hi everyone. My Flume adventures continue.
>>>>>
>>>>> I'm in a situation now where I have a channel that's filling because a
>>>>> stubborn message is stuck. The sink won't accept it (for whatever
>>>>> reason; I can go into detail, but that's not my point here). This blocks
>>>>> up the channel entirely, because the message goes back into the channel
>>>>> when the sink refuses it. Obviously, this isn't ideal.
>>>>>
>>>>> I'm wondering what mechanisms, if any, Flume has to deal with these
>>>>> situations. Things that come to mind might be:
>>>>>
>>>>> 1. Ditch the event after n attempts.
>>>>> 2. After n attempts, send the event to a "problem area" (maybe a
>>>>>    different source / sink / channel?) that someone can look at later.
>>>>> 3. Some sort of mechanism that allows operators to manually kill these
>>>>>    messages.
>>>>>
>>>>> I'm open to suggestions on alternatives as well.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> -- Jeremy
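For the missing-timestamp case Anat describes, one option that avoids rewriting the HDFS sink is to make sure the header exists before the event ever reaches the channel. Flume ships a Timestamp Interceptor (interceptor type `timestamp`, with a `preserveExisting` property) that does this at the source; note that it only affects new events, not ones already stuck in the channel. The Java below is just a plain-Java sketch of the same idea, assuming a hypothetical `HeaderedEvent` class (headers plus body) in place of Flume's real `Event` interface; `TimestampGuard`, `ensureTimestamp`, and treating a blank value as missing are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical event type for this sketch only; not org.apache.flume.Event.
final class HeaderedEvent {
    final Map<String, String> headers = new HashMap<>();
    final byte[] body;

    HeaderedEvent(byte[] body) {
        this.body = body;
    }
}

final class TimestampGuard {
    // Header name the HDFS sink reads for its time-based path escapes.
    static final String TIMESTAMP_HEADER = "timestamp";

    private TimestampGuard() {}

    // Adds a "timestamp" header (epoch milliseconds) when it is missing or blank.
    // An existing non-blank value is left untouched, similar to preserveExisting = true.
    static HeaderedEvent ensureTimestamp(HeaderedEvent event, long nowMillis) {
        String current = event.headers.get(TIMESTAMP_HEADER);
        if (current == null || current.trim().isEmpty()) {
            event.headers.put(TIMESTAMP_HEADER, Long.toString(nowMillis));
        }
        return event;
    }
}
```

In a real agent, logic like this would sit inside a custom interceptor's `intercept` method, or you could simply configure the built-in interceptor instead.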
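Jeremy's ElasticSearch fix combines two ideas: a stricter test for whether content is JSON at all, and a fallback to a simple field when parsing throws anything unexpected (a CharConversionException is not a JSONException, so a catch clause that names only JSONException lets it escape). The sketch below shows that pattern in plain Java. It is not the actual ContentBuilderUtil or addComplexField(...) code; `FieldClassifier`, the `JsonParser` interface, and the first-non-whitespace-character heuristic are assumptions, and the parser is passed in so that no JSON library is needed.

```java
// Hypothetical sketch of the "stricter check plus catch-all fallback" pattern.
final class FieldClassifier {
    // Stand-in for whatever JSON parser is in use; it may throw checked exceptions.
    interface JsonParser {
        Object parse(String content) throws Exception;
    }

    private FieldClassifier() {}

    // Stricter than "contains '{' anywhere": the first non-whitespace
    // character must open a JSON object or array.
    static boolean looksLikeJson(String content) {
        String trimmed = content.trim();
        return trimmed.startsWith("{") || trimmed.startsWith("[");
    }

    // Returns the parsed value (a "complex" field) when parsing succeeds; otherwise
    // returns the raw string (a "simple" field), whatever exception the parser throws.
    static Object toFieldValue(String content, JsonParser parser) {
        if (!looksLikeJson(content)) {
            return content;
        }
        try {
            return parser.parse(content);
        } catch (Exception e) {
            return content;
        }
    }
}
```

With this shape, a wrong guess about the content costs one failed parse attempt rather than an event that keeps bouncing back into the channel.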
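Roshan's suggestion works when a poison event can be recognised from its content. Flume's Regex Filtering Interceptor is configured with interceptor type `regex_filter` and the properties `regex` and `excludeEvents`. The plain-Java sketch below only illustrates that include/exclude rule and is not the interceptor's source; the class name `RegexEventFilter`, decoding bodies as UTF-8, and matching anywhere in the body via `Matcher.find()` are assumptions. Because it looks only at the body, it would not catch Anat's missing-header case on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of regex-based filtering before events reach the channel.
final class RegexEventFilter {
    private final Pattern pattern;
    private final boolean excludeMatches;

    // excludeMatches = true drops matching events; false keeps only matching events.
    RegexEventFilter(String regex, boolean excludeMatches) {
        this.pattern = Pattern.compile(regex);
        this.excludeMatches = excludeMatches;
    }

    // Returns the event bodies that pass the filter, in their original order.
    List<byte[]> filter(List<byte[]> bodies) {
        List<byte[]> kept = new ArrayList<>();
        for (byte[] body : bodies) {
            boolean matches = pattern.matcher(new String(body, StandardCharsets.UTF_8)).find();
            // Keep when (match and include mode) or (no match and exclude mode).
            if (matches != excludeMatches) {
                kept.add(body);
            }
        }
        return kept;
    }
}
```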
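Options 1 and 2 in Jeremy's list (and the kind of custom sink Hari mentions) boil down to counting failed delivery attempts per event and diverting an event once it has failed n times. The sketch below models that in plain Java; it is not Flume's API. `DeadLetterDrainer`, `Msg`, and the stable per-event id are assumptions: Flume events carry only headers and a body, so a real sink would need an id from somewhere (for example a header set upstream), and the in-memory attempt counts here would reset if the agent restarted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: retry an event up to maxAttempts, then divert it to a
// dead-letter list so it stops blocking the events queued behind it.
final class DeadLetterDrainer {
    // Minimal event with a stable id (an assumption; Flume events have no id).
    static final class Msg {
        final String id;
        final String body;

        Msg(String id, String body) {
            this.id = id;
            this.body = body;
        }
    }

    private final Deque<Msg> channel;
    private final Consumer<Msg> sink; // signals rejection by throwing a RuntimeException
    private final int maxAttempts;
    private final Map<String, Integer> failures = new HashMap<>();
    final List<Msg> delivered = new ArrayList<>();
    final List<Msg> deadLetters = new ArrayList<>();

    DeadLetterDrainer(Deque<Msg> channel, Consumer<Msg> sink, int maxAttempts) {
        this.channel = channel;
        this.sink = sink;
        this.maxAttempts = maxAttempts;
    }

    // Takes one event from the head of the channel; returns false when the channel is empty.
    boolean processOne() {
        Msg msg = channel.pollFirst();
        if (msg == null) {
            return false;
        }
        try {
            sink.accept(msg);
            failures.remove(msg.id);
            delivered.add(msg);
        } catch (RuntimeException e) {
            int attempts = failures.merge(msg.id, 1, Integer::sum);
            if (attempts >= maxAttempts) {
                failures.remove(msg.id);
                deadLetters.add(msg); // option 2: park it for someone to inspect later
            } else {
                channel.addFirst(msg); // like a rollback: the event returns to the head
            }
        }
        return true;
    }
}
```

Replacing `deadLetters.add(msg)` with nothing turns this into option 1 (drop after n attempts). In a real Flume sink, the divert branch would roughly correspond to writing the event somewhere else and then committing the transaction instead of rolling it back.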
