Matt, no prob! Mike
Sent from my iPhone

> On Jun 24, 2014, at 8:35 PM, Matt Tenenbaum <matt.tenenb...@rockyou.com> wrote:
>
> Thanks Mike. That's very helpful.
>
> It should be no trouble to establish an upper bound on the intra-event
> packing, so I should be able to tune the capacity parameters in the way you
> suggest. Besides, aren't the sharp edges what make it fun to play with?
>
> I may be able to impose suitable restrictions upstream that guarantee a
> one-to-one correspondence between protobuf records and scribe messages, thus
> allowing me to follow your recommendation and avoid these shenanigans. But
> it's nice to know that I _can_ do it this way if I need to.
>
> Cheers
> -mt
>
>
>> On Tue, Jun 24, 2014 at 7:45 PM, Mike Percy <mpe...@apache.org> wrote:
>> Hi Matt,
>> If you can guarantee there are a certain # of events in a single "wrapper"
>> event, or bound the limit, then you could potentially get away with this.
>> However, if you're not careful you could get stuck in an infinite
>> fail-backoff-retry loop due to exceeding the (configurable) channel
>> transaction limit. The first limit you will want to tune is the
>> channel.transactionCapacity parameter, which is simply a sanity check /
>> arbitrary limit on the # of events that can be placed into a channel in a
>> single transaction (this avoids weird bugs like a source that opens a
>> transaction that never gets committed). The other thing you have to watch
>> out for is what your (Flume) batch size looks like, since Flume is designed
>> to do batching at the RPC layer, not at the event layer as you are
>> describing.
>>
>> So basically just make sure that your channel.transactionCapacity > max
>> batch size * max # of sub-events per "wrapper" event.
>>
>> Hope this makes sense. The above explanation is somewhat subtle, and since
>> it has sharp edges when misconfigured, we just recommend not doing it if
>> possible.
>>
>> Best,
>> Mike
>>
>>
>>> On Tue, Jun 24, 2014 at 10:17 AM, Matt Tenenbaum
>>> <matt.tenenb...@rockyou.com> wrote:
>>> I see in the documentation for org.apache.flume.interceptor.Interceptor
>>> that the result of intercept(List<Event>) must not exceed the size of the
>>> input (in all-caps, even). This is unfortunate for my use case: I'm
>>> interfacing with a scribe source that provides each message as a
>>> serialization of some number of protobuf records together with their
>>> content lengths, and an interceptor seemed like an ideal way to convert
>>> those possibly-many records into individual events. That's particularly
>>> desirable because I need to establish the timestamp header from each
>>> underlying record in order to route to the correct file in HDFS. It's
>>> unlikely that a batch of records coming in as a single event has
>>> _drastically_ different timestamps, but it's also out of my control.
>>>
>>> Given all the capital letters, the restriction on output cardinality is
>>> really-real, right? Will I be setting myself up for disaster?
>>>
>>> Is there some other way I can convert an event that looks essentially like
>>>
>>> Event(rec-size-1 + rec-1 + rec-size-2 + rec-2 + ... + rec-size-N + rec-N)
>>>
>>> into a List<Event>:
>>>
>>> {Event(rec-1), Event(rec-2), ..., Event(rec-N)}
>>>
>>> This channel has nontrivial volume, potentially hundreds of MB per minute,
>>> so I don't want to (e.g.) serialize the multiple records and then read them
>>> into a second stage if I can handle the one-to-many transformation on the
>>> fly.
>>>
>>> Thanks in advance for clarifications and suggestions.
>>>
>>> -mt
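
For illustration, a rough Java sketch of the one-to-many interceptor discussed in the thread: it walks a wrapper event's body as a sequence of length-prefixed records, emits one event per record, and sets a timestamp header on each for HDFS path routing. The class name, the assumed 4-byte big-endian length framing, and the extractTimestampMillis helper are placeholders invented for the sketch, not part of any real wire format or of Flume's API, and the sketch deliberately returns more events than it receives, which the Interceptor javadoc forbids, so the transactionCapacity sizing caveat from the thread still applies.

// Hypothetical one-to-many interceptor sketch; the length framing and the
// timestamp extraction below are assumptions, not Flume or scribe behavior.
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.interceptor.Interceptor;

public class RecordSplittingInterceptor implements Interceptor {

  @Override
  public void initialize() {}

  @Override
  public Event intercept(Event event) {
    // Single-event path is unused; the splitting happens in the List variant.
    return event;
  }

  @Override
  public List<Event> intercept(List<Event> events) {
    List<Event> out = new ArrayList<Event>();
    for (Event wrapper : events) {
      ByteBuffer buf = ByteBuffer.wrap(wrapper.getBody());
      // Assumed framing: a 4-byte big-endian length prefix before each record.
      while (buf.remaining() >= 4) {
        int len = buf.getInt();
        if (len < 0 || len > buf.remaining()) {
          break; // malformed tail; in practice, log and count this
        }
        byte[] record = new byte[len];
        buf.get(record);
        Event split = EventBuilder.withBody(record);
        Map<String, String> headers = split.getHeaders();
        headers.putAll(wrapper.getHeaders());
        // Set the timestamp header used by the HDFS sink's path escapes.
        headers.put("timestamp", Long.toString(extractTimestampMillis(record)));
        out.add(split);
      }
    }
    // NOTE: out.size() can exceed events.size(), which the Interceptor javadoc
    // forbids; size channel.transactionCapacity accordingly (see the thread).
    return out;
  }

  private long extractTimestampMillis(byte[] record) {
    // Placeholder: parse the protobuf record and return its event time.
    return System.currentTimeMillis();
  }

  @Override
  public void close() {}

  public static class Builder implements Interceptor.Builder {
    @Override
    public Interceptor build() {
      return new RecordSplittingInterceptor();
    }

    @Override
    public void configure(Context context) {}
  }
}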
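
Likewise, a minimal and partial flume.conf sketch of Mike's sizing rule, assuming a memory channel feeding an HDFS sink; the agent, channel, and sink names and all numbers are placeholders. With a sink batch size of 100 and at most 50 records packed into one scribe message, transactionCapacity must exceed 100 * 50 = 5000, so it is set to 10000 here for headroom.

# Hypothetical, partial agent config; names and numbers are illustrative only.
agent.channels = ch1
agent.sinks = hdfs1

# Memory channel: transactionCapacity must exceed
# (max sink batch size) x (max sub-events per wrapper event) = 100 x 50 = 5000.
agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 100000
agent.channels.ch1.transactionCapacity = 10000

# HDFS sink: batchSize is the number of events written per transaction;
# the path escapes use the per-event timestamp header for routing.
agent.sinks.hdfs1.type = hdfs
agent.sinks.hdfs1.channel = ch1
agent.sinks.hdfs1.hdfs.path = hdfs://namenode/flume/events/%Y-%m-%d
agent.sinks.hdfs1.hdfs.batchSize = 100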