Re: Dynamic Key=Value Parsing with an Interceptor?

Wolfgang Hoschek Tue, 12 Nov 2013 16:46:08 -0800

Consider whether the splitKeyValue morphline command is applicable here,
perhaps in combination with readLine, split, and grok.


An example is here:
http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html#/splitKeyValue

Wolfgang.
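Morphline records are multi-valued (one field name can hold several values), so a split step followed by splitKeyValue can be roughly modeled in plain Python as below. This is a minimal sketch of the idea, not the morphlines implementation. The function names `split_field` and `split_key_value` are hypothetical, skipping items that lack a separator is a choice made here, and the ';' and '=' separators are taken from the format proposed later in the thread.

```python
from collections import defaultdict


def split_field(value, separator=";"):
    """Split one raw field value into trimmed, non-empty items (like a 'split' step)."""
    return [item.strip() for item in value.split(separator) if item.strip()]


def split_key_value(items, separator="=", prefix=""):
    """Turn 'key=value' items into a multi-valued record (field name -> list of values).

    Each item is split on the first separator only, so values may contain '='.
    Items without a separator are skipped in this sketch.
    Repeated keys accumulate values instead of overwriting them.
    """
    record = defaultdict(list)
    for item in items:
        key, sep, value = item.partition(separator)
        if not sep:
            continue
        record[prefix + key.strip()].append(value.strip())
    return dict(record)


line = "key1=value1;key2=value2;key1=again"
record = split_key_value(split_field(line))
print(record)  # {'key1': ['value1', 'again'], 'key2': ['value2']}
```

The multi-valued shape matters when a key repeats: a morphline record can keep both values, whereas Flume event headers (one string per name) cannot.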

On Nov 12, 2013, at 3:18 PM, Matt Wise wrote:

> Paul,
>   Thanks for the feedback. I looked briefly at Morphlines but wasn't sure it
> was what I needed. I will take a deeper dive this week and see whether it
> will do what we want. Ultimately, the reason we're not changing the apps is
> that we honestly don't always have much control. Many of the apps are
> third-party apps where we barely have the ability to adjust their log line
> formats.
> 
> Matt Wise
> Sr. Systems Architect
> Nextdoor.com
> 
> On Mon, Nov 11, 2013 at 3:09 PM, Paul Chavez 
> <pcha...@verticalsearchworks.com> wrote:
> I think there are two ‘out of the box’ ways to do this kind of thing. The
> first would be to use the regex extractor interceptor with multiple
> serializers keyed on various fields. However, that’s not really dynamic; it’s
> only half a step better than one interceptor per field, as you mentioned. The
> second would be to use the morphline interceptor to parse your event body and
> insert headers as needed. I’ll admit I have no experience with this
> interceptor, but from reading the documentation it seems designed for exactly
> this kind of use case.
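To illustrate why the regex-extractor route is only semi-dynamic: every header name must be spelled out in advance, one pattern (or capture group plus serializer) per field. The sketch below mimics that idea in plain Python. It is not Flume's regex extractor interceptor code, and the `KNOWN_FIELDS` table and `extract_known_fields` function are hypothetical.

```python
import re

# Each known field needs its own pattern; adding a new key means a config change.
KNOWN_FIELDS = {
    "key1": re.compile(r"(?:^|;)key1=([^;]*)"),
    "key2": re.compile(r"(?:^|;)key2=([^;]*)"),
}


def extract_known_fields(body):
    """Return headers for the predefined fields only; unknown keys are ignored."""
    headers = {}
    for name, pattern in KNOWN_FIELDS.items():
        match = pattern.search(body)
        if match:
            headers[name] = match.group(1)
    return headers


print(extract_known_fields("key1=value1;key2=value2;key3=value3"))
# {'key1': 'value1', 'key2': 'value2'}  (key3 is silently dropped)
```

Any key a developer adds later (here `key3`) is lost until someone updates the pattern list, which is the management burden the original question wants to avoid.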
> 
> Ultimately, though, when faced with this we opted to push it into the app
> layer. Is there a reason the applications can’t write these key/value pairs
> as headers in the first place? We use an HTTP source, and when we wrote the
> app-side logging class for it we built similar functionality in as
> category/subcategory headers. Flume then doesn’t need any special
> interceptors beyond a default static one in case the headers are missing
> entirely, and we write to HDFS with tokenized paths so each combination of
> those headers gets a separate directory.
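When the application sets the headers itself, Flume's HTTP source with its default JSON handler accepts a JSON array of events, each with a `headers` map and a `body` string, and the HDFS sink can expand `%{header}` escapes in `hdfs.path`. The sketch below only builds such a payload and shows, in plain Python, how a path template like `/logs/%{category}/%{subcategory}` would resolve. The `build_event` and `resolve_path` helpers and the path layout are hypothetical; in a real deployment Flume performs the substitution, not this code.

```python
import json
import re


def build_event(body, **headers):
    """One event in the shape the JSON handler expects: string headers plus a string body."""
    return {"headers": {k: str(v) for k, v in headers.items()}, "body": body}


def resolve_path(template, headers):
    """Replace each %{name} with the matching header value (illustration only).

    Missing headers become an empty string in this sketch; in the setup described
    above, a static interceptor supplies defaults so that case should not arise.
    """
    return re.sub(r"%\{(\w+)\}", lambda m: headers.get(m.group(1), ""), template)


events = [build_event("user signed up", category="web", subcategory="signup")]
payload = json.dumps(events)  # body of the POST request to the HTTP source
print(payload)
print(resolve_path("/logs/%{category}/%{subcategory}", events[0]["headers"]))
# /logs/web/signup
```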
> 
> If you continue to explore this issue, please keep us updated. I would
> especially like to hear about some real-world morphline examples.
> 
> Hope that helps,
> 
> Paul Chavez
> 
> From: Matt Wise [mailto:m...@nextdoor.com] 
> Sent: Monday, November 11, 2013 10:04 AM
> To: user@flume.apache.org
> Subject: Re: Dynamic Key=Value Parsing with an Interceptor?
> 
> Anyone have any ideas on the best way to do this?
> 
> Matt Wise
> Sr. Systems Architect
> Nextdoor.com
> 
> On Sat, Nov 9, 2013 at 5:28 PM, Matt Wise <m...@nextdoor.com> wrote:
> 
> Hey, we'd like to set up a default format for all of our logging systems,
> perhaps something like this:
> 
>   "key1=value1;key2=value2;key3=value3...."
> 
> With this pattern, we'd allow developers to define any key/value pairs they 
> want to log, and separate them with a common separator.
> 
> If we did this, what would we need to do in Flume to get it to parse the
> key=value pairs into dynamic headers? We pass our data from Flume into both
> HDFS and ElasticSearch sinks, and we would really like these fields sent to
> the sinks dynamically for much easier parsing and analysis later.
> 
> Any thoughts on this? I know that we can define a unique interceptor for each
> service that looks for explicit field names... but that's a nightmare to
> manage. I really want something truly dynamic.
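The parsing step asked about here is small. The sketch below turns one `key1=value1;key2=value2` line into a flat header map, which is the shape a custom Flume interceptor or a morphline would ultimately need to produce, since Flume event headers hold one string value per name. The function name `parse_kv_line` and its behavior (split on the first '=', trim whitespace, skip malformed pairs, last duplicate wins, optional prefix so parsed keys cannot overwrite headers such as `host`) are assumptions for illustration, not an existing Flume API.

```python
def parse_kv_line(line, pair_sep=";", kv_sep="=", prefix=""):
    """Parse 'k1=v1;k2=v2' into a flat dict suitable for event headers.

    - Splits each pair on the first kv_sep only, so values may contain '='.
    - Strips whitespace; skips empty segments and pairs without a key or separator.
    - If a key repeats, the last value wins (headers hold one value per name).
    - An optional prefix keeps parsed keys from overwriting existing headers.
    """
    headers = {}
    for segment in line.split(pair_sep):
        key, sep, value = segment.partition(kv_sep)
        key = key.strip()
        if not sep or not key:
            continue
        headers[prefix + key] = value.strip()
    return headers


existing = {"host": "web01"}  # hypothetical header already set upstream
existing.update(parse_kv_line("key1=value1; key2=a=b;;bad;key1=value3", prefix="kv."))
print(existing)  # {'host': 'web01', 'kv.key1': 'value3', 'kv.key2': 'a=b'}
```

Values that themselves contain ';' would need an escaping convention agreed with the developers; this sketch does not handle that case.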
> 
> Matt Wise
> Sr. Systems Architect
> Nextdoor.com
