Consider if the splitKeyValue command is applicable here, perhaps in combination with readLine, split and grok.
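As a rough illustration only (plain Python, not morphline configuration and not Flume API), here is the logic that split followed by splitKeyValue would apply to a body such as "key1=value1;key2=value2;key3=value3": break the line on ';', then break each piece on '='. The function name parse_key_values and its parameters are hypothetical, chosen just for this sketch.

```python
def parse_key_values(body, pair_sep=";", kv_sep="="):
    """Split a 'k1=v1;k2=v2' string into a dict of names to values.

    Hypothetical helper for illustration; it mirrors the split-then-
    splitKeyValue idea but is not part of Flume or morphlines.
    """
    headers = {}
    for pair in body.split(pair_sep):
        pair = pair.strip()
        if not pair:
            continue  # skip empty segments, e.g. from a trailing ';'
        # Split on the first '=' only, so values may contain '='.
        key, sep, value = pair.partition(kv_sep)
        if not sep or not key.strip():
            continue  # no '=' found or empty key: not a key/value pair
        headers[key.strip()] = value.strip()
    return headers


if __name__ == "__main__":
    print(parse_key_values("key1=value1;key2=value2;key3=value3"))
    # prints {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
```

Nothing in the sketch depends on specific key names, so developers could add new keys without any change on the parsing side, which is the "truly dynamic" behavior asked for in the original question.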
An example from the reference guide is here:
http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html#/splitKeyValue

Wolfgang.

On Nov 12, 2013, at 3:18 PM, Matt Wise wrote:

> Paul,
> Thanks for the feedback. I looked briefly at Morphline, but wasn't sure if
> it was what I needed. I will take a deeper dive this week and see if it will
> do what we want. Ultimately the reason we're not changing the apps is that we
> honestly don't always have a lot of control. Many of the apps are 3rd party
> apps where we just barely have the ability to adjust their log-line-formats.
>
> Matt Wise
> Sr. Systems Architect
> Nextdoor.com
>
> On Mon, Nov 11, 2013 at 3:09 PM, Paul Chavez
> <pcha...@verticalsearchworks.com> wrote:
> I think there may be two ‘out of box’ ways to do this kind of thing. First
> would be using the regex extract interceptor with multiple serializers keying
> on various fields. However that’s not really dynamic and just kind of a
> half-step better from one interceptor for each field as you mentioned. Second
> would be to use the morphline interceptor to parse your event body and insert
> headers as needed. I have to admit I have no experience with this interceptor
> but in reading the documentation it seems designed for this kind of use case.
>
> Ultimately though, when faced with this we opted to push this into the app
> layer. Is there a reason the applications can’t write these key/value pairs
> as headers in the first place? We use an HTTP source and when we wrote the
> logging class for it on our app side we put similar functionality in as
> category/subcategory headers. Then flume doesn’t have to have any special
> interceptors beyond a default static one in case the headers are completely
> missing, and we write to HDFS with tokenized paths so each permutation of
> those headers gets a separate directory.
>
> If you continue to explore this issue, please keep us updated. I especially
> would like to hear some real world morphline examples.
>
> Hope that helps,
> Paul Chavez
>
> From: Matt Wise [mailto:m...@nextdoor.com]
> Sent: Monday, November 11, 2013 10:04 AM
> To: user@flume.apache.org
> Subject: Re: Dynamic Key=Value Parsing with an Interceptor?
>
> Anyone have any ideas on the best way to do this?
>
> Matt Wise
> Sr. Systems Architect
> Nextdoor.com
>
> On Sat, Nov 9, 2013 at 5:28 PM, Matt Wise <m...@nextdoor.com> wrote:
>
> Hey we'd like to set up a default format for all of our logging systems...
> perhaps looking like this:
>
> "key1=value1;key2=value2;key3=value3...."
>
> With this pattern, we'd allow developers to define any key/value pairs they
> want to log, and separate them with a common separator.
>
> If we did this, what do we need to do in Flume to get Flume to parse out the
> key=value pairs into dynamic headers? We pass our data from Flume into both
> HDFS and ElasticSearch sinks. We would really like to have these fields
> dynamically sent to the sinks for much easier parsing and analysis later.
>
> Any thoughts on this? I know that we can define a unique interceptor for each
> service that looks for explicit field names ... but that's a nightmare to
> manage. I really want something truly dynamic.
>
> Matt Wise
> Sr. Systems Architect
> Nextdoor.com