I think there are two 'out of the box' ways to do this kind of thing. The
first would be to use the regex extractor interceptor with multiple
serializers keyed on the various fields. However, that's not really dynamic,
just a half-step up from one interceptor per field as you mentioned. The
second would be to use the morphline interceptor to parse your event body and
insert headers as needed. I have to admit I have no experience with that
interceptor, but from reading the documentation it seems designed for exactly
this kind of use case.
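
For what it's worth, here's a rough sketch of how I'd expect the regex
extractor wiring to look, assuming an agent named a1 with a source r1 and the
three fixed keys from your example (which is exactly the limitation: the key
names have to be known up front). The commented morphline lines are just my
untested reading of the docs:

  # Regex extractor: capture groups map positionally to the serializers
  a1.sources.r1.interceptors = i1
  a1.sources.r1.interceptors.i1.type = regex_extractor
  a1.sources.r1.interceptors.i1.regex = key1=([^;]*);key2=([^;]*);key3=([^;]*)
  a1.sources.r1.interceptors.i1.serializers = s1 s2 s3
  a1.sources.r1.interceptors.i1.serializers.s1.name = key1
  a1.sources.r1.interceptors.i1.serializers.s2.name = key2
  a1.sources.r1.interceptors.i1.serializers.s3.name = key3

  # Morphline alternative (untested): the parsing logic would live in the
  # referenced morphline file rather than in this config
  # a1.sources.r1.interceptors.i1.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
  # a1.sources.r1.interceptors.i1.morphlineFile = /etc/flume/conf/morphline.conf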

Ultimately though, when faced with this we opted to push it into the app
layer. Is there a reason the applications can't write these key/value pairs
as headers in the first place? We use an HTTP source, and when we wrote the
logging class for it on our app side we built similar functionality in as
category/subcategory headers. That way Flume doesn't need any special
interceptors beyond a default static one in case the headers are missing
entirely, and we write to HDFS with tokenized paths so each permutation of
those headers gets its own directory.
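
As a rough sketch of what I mean (the agent/source/sink names and the HDFS
path are made up, and the static defaults only kick in when the app omits the
headers):

  # HTTP source with static interceptors as fallbacks (channels omitted)
  a1.sources.r1.type = http
  a1.sources.r1.interceptors = cat subcat
  a1.sources.r1.interceptors.cat.type = static
  a1.sources.r1.interceptors.cat.preserveExisting = true
  a1.sources.r1.interceptors.cat.key = category
  a1.sources.r1.interceptors.cat.value = unknown
  a1.sources.r1.interceptors.subcat.type = static
  a1.sources.r1.interceptors.subcat.preserveExisting = true
  a1.sources.r1.interceptors.subcat.key = subcategory
  a1.sources.r1.interceptors.subcat.value = unknown

  # Tokenized HDFS path: each header permutation lands in its own directory
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.path = /flume/events/%{category}/%{subcategory}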

If you continue to explore this, please keep us updated. I'd especially like
to hear some real-world morphline examples.

Hope that helps,
Paul Chavez


From: Matt Wise <m...@nextdoor.com>
Sent: Monday, November 11, 2013 10:04 AM
To: user@flume.apache.org
Subject: Re: Dynamic Key=Value Parsing with an Interceptor?

Anyone have any ideas on the best way to do this?

Matt Wise
Sr. Systems Architect
Nextdoor.com

On Sat, Nov 9, 2013 at 5:28 PM, Matt Wise <m...@nextdoor.com> wrote:
Hey, we'd like to set up a default format for all of our logging systems...
perhaps looking like this:

  "key1=value1;key2=value2;key3=value3...."

With this pattern, we'd let developers define any key/value pairs they want
to log, joined by a common separator.

If we did this, what would we need to do to get Flume to parse the key=value
pairs out into dynamic headers? We pass our data from Flume into both HDFS
and ElasticSearch sinks, and we would really like these fields sent along to
the sinks dynamically for much easier parsing and analysis later.

Any thoughts on this? I know we can define a unique interceptor for each
service that looks for explicit field names... but that's a nightmare to
manage. I really want something truly dynamic.

Matt Wise
Sr. Systems Architect
Nextdoor.com
