I see in the documentation for org.apache.flume.interceptor.Interceptor
that the result of intercept(List<Event>) must not exceed the size of the
input (in all caps, even). This is unfortunate for my use case: I'm
interfacing with a Scribe source that delivers each message as a
serialization of some number of protobuf records, each preceded by its
content length, and an interceptor seemed like an ideal way to convert
those possibly-many records into individual events. That's particularly
desirable because I need to establish the timestamp header from each
underlying record in order to route to the correct file in HDFS. It's
unlikely that the records arriving together in a single event have
_drastically_ different timestamps, but that's also out of my control.
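
To make the routing requirement concrete, here is a minimal sketch of
deriving one header map per record. It assumes a hypothetical
extractTimestampMillis function that decodes a record and returns its
timestamp in epoch milliseconds. The "timestamp" header name is the one the
HDFS sink consults for time-based path escapes such as %Y/%m/%d (when
hdfs.useLocalTimeStamp is false). Each map could then accompany its record
body via Flume's EventBuilder.withBody(body, headers).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

public final class TimestampHeaders {
    // Builds one header map per record, keyed by "timestamp" (epoch millis).
    // extractTimestampMillis is a hypothetical, caller-supplied protobuf decoder.
    public static List<Map<String, String>> headersFor(
            List<byte[]> records, ToLongFunction<byte[]> extractTimestampMillis) {
        List<Map<String, String>> out = new ArrayList<>(records.size());
        for (byte[] record : records) {
            Map<String, String> headers = new HashMap<>();
            headers.put("timestamp",
                    Long.toString(extractTimestampMillis.applyAsLong(record)));
            out.add(headers);
        }
        return out;
    }
}
```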

Given all the capital letters, the restriction on output cardinality is
really real, right? Would ignoring it set me up for disaster?

Is there some other way I can convert an event that looks essentially like

Event(rec-size-1 + rec-1 + rec-size-2 + rec-2 + ... + rec-size-N + rec-N)

into a List<Event>:

{Event(rec-1), Event(rec-2), ..., Event(rec-N)}
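
For concreteness, the splitting step itself might look like the minimal
sketch below. It assumes, hypothetically, that each rec-size is a 4-byte
big-endian int; if the records were written with protobuf's
writeDelimitedTo, the prefix would instead be a varint and the generated
class's parseDelimitedFrom would be the natural reader. Each returned byte[]
could then be wrapped with Flume's EventBuilder.withBody(body, headers).

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public final class RecordSplitter {
    // Hypothetical framing: [4-byte big-endian length][record bytes], repeated.
    public static List<byte[]> split(byte[] body) {
        ByteBuffer buf = ByteBuffer.wrap(body); // ByteBuffer defaults to BIG_ENDIAN
        List<byte[]> records = new ArrayList<>();
        while (buf.hasRemaining()) {
            if (buf.remaining() < Integer.BYTES) {
                throw new IllegalArgumentException("truncated length prefix");
            }
            int len = buf.getInt();
            // Reject negative or overlong lengths instead of reading past the end.
            if (len < 0 || len > buf.remaining()) {
                throw new IllegalArgumentException("bad record length: " + len);
            }
            byte[] record = new byte[len];
            buf.get(record); // copies exactly len bytes and advances the position
            records.add(record);
        }
        return records;
    }
}
```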

This channel has nontrivial volume, potentially hundreds of MB per minute,
so I'd rather not (e.g.) serialize the individual records and then read
them into a second stage if I can handle the one-to-many transformation on
the fly.

Thanks in advance for clarifications and suggestions.

-mt
