Re: Flume workflow design

2013-07-18 Wolfgang Hoschek
Take a look at these options:
- HBase Sinks (send data into HBase): http://flume.apache.org/FlumeUserGuide.html#hbasesinks
- Apache Flume Morphline Solr Sink (for heavy-duty ETL processing and ingestion into Solr): http://flume.apache.org/FlumeUserGuide.html#morphlinesolrsink

Re: Flume workflow design

2013-07-19 Wolfgang Hoschek
> Best, > Flavio > > On Fri, Jul 19, 2013 at 12:51 AM, Wolfgang Hoschek > wrote: > Take a look at these options: > > - HBase Sinks (send data into HBase): > > http://flume.apache.org/FlumeUserGuide.html#hbasesinks > > - Apache Flume Morphline Solr Sink (for

Re: Download and Configure MorphlineSolrSink

2013-07-19 Wolfgang Hoschek
The Morphline Solr Sink ships as part of Apache Flume 1.4.0:
http://flume.apache.org/download.html
Documentation is here:
http://flume.apache.org/FlumeUserGuide.html#morphlinesolrsink
Basically, you configure it like any other Flume Sink, plus point it to a morphline config fi

Re: SolrCell help!

2013-07-22 Wolfgang Hoschek
Looks like the DcXMLParser spits out a metadata field called "title" and another title as part of the Tika XML stream. That metadata field is then added to the Solr document by SolrCell. If you add "title" to the captures, the title from the XML stream gets added as well by SolrCell. JSON suppor

Re: SolrCell help!

2013-07-22 Wolfgang Hoschek
could add some more tests including readJson and the new > xquery and xslt in trunk? > > Best, > Flavio > On Mon, Jul 22, 2013 at 8:12 PM, Wolfgang Hoschek > wrote: > Looks like the DcXMLParser spits out a metadata field called "title" and > another title as pa

Re: SolrCell help!

2013-07-22 Wolfgang Hoschek
ory) but for the new xslt and xquery > I'm not able to find the tests code..could you give me an hook? > > On Mon, Jul 22, 2013 at 9:21 PM, Wolfgang Hoschek > wrote: > There are many tests for this in the morphlines repo. > > Wolfgang. > > On Jul

Re: SolrCell help!

2013-07-23 Wolfgang Hoschek
u couldn't be more precise ;) > > Thanks, > Flavio > > On Mon, Jul 22, 2013 at 11:02 PM, Wolfgang Hoschek > wrote: > Docs for the xquery and xslt morphline commands are here (look for xquery"): > https://github.com/cloudera/cdk/blob/master/cdk-morphlines/src/site/co

Re: SolrCell help!

2013-07-23 Wolfgang Hoschek
commons-daemon/1.0.3/commons-daemon-1.0.3.pom. > Return code is: 409 -> [Help 1] > > > On Tue, Jul 23, 2013 at 10:22 AM, Wolfgang Hoschek > wrote: > Tests pass on java 6 but fail on java 7. Correspondingly, I have filed > https://issues.cloudera.org/browse/CDK-80. We'

Re: Flow management

2013-07-23 Wolfgang Hoschek
Perhaps you could implement a custom command based on something like the Guava RateLimiter class. Wolfgang. On Jul 23, 2013, at 4:00 PM, Flavio Pompermaier wrote: > Hi to all, > > I need help in understanding how to manage the flow in Flume. More precisely, > I need to call a command that req
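
A similar throttling effect can be sketched with a classic token bucket. Guava's RateLimiter uses its own smoothing scheme, so this is an analogy rather than a port, and it is neither Guava's API nor a morphline command. The TokenBucket class, its rate/capacity parameters and the injectable clock are hypothetical, chosen so the behaviour is easy to check.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical limiter: refills `rate` permits per second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start with a full bucket, allowing an initial burst
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        # Credit tokens for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Non-blocking: take one permit if available, otherwise return False."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self) -> None:
        """Blocking: sleep until a permit is available, then take it.
        Uses real time.sleep, so pair it with the default monotonic clock."""
        while not self.try_acquire():
            time.sleep((1 - self.tokens) / self.rate)
```

A custom command built on this idea would call acquire() before each call to the rate-limited service and only then pass the record on to its child command.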

Re: Automatic log analysis and alert generation

2013-08-26 Wolfgang Hoschek
Take a look at the Apache Flume Morphline Solr Sink, for example for heavy-duty ETL processing and ingestion into Solr:
http://flume.apache.org/FlumeUserGuide.html#morphlinesolrsink
It provides a scripting engine that enables complex event processing (CEP) on the flow of log events. Wolfgang. On Aug 26, 2013, at

Re: Delete first line of a file

2013-09-12 Wolfgang Hoschek
There is no out-of-the-box command to remove the first line from an event body, but you could write one yourself and plug it in. If you just want to read CSV records from an event that contains a file, and do so while ignoring the first line, you can use ignoreFirstLine : true on the readCSV or
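
For the CSV case, ignoring the first line amounts to discarding the first record before emitting the rest. A minimal Python sketch, assuming the whole file arrives as one event body; read_csv_body and its parameters are hypothetical and only loosely mirror readCSV's options. It skips the first record, which differs from the first physical line only if a quoted header field contains a newline.

```python
import csv
import io
from typing import List


def read_csv_body(body: bytes, ignore_first_line: bool = True,
                  separator: str = ",", charset: str = "utf-8") -> List[List[str]]:
    """Parse every CSV record in an event body, optionally dropping the first one."""
    reader = csv.reader(io.StringIO(body.decode(charset), newline=""),
                        delimiter=separator)
    if ignore_first_line:
        next(reader, None)  # drop the header; the default avoids StopIteration on empty input
    return [row for row in reader if row]  # skip blank lines
```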

Re: [ANNOUNCE] New Flume Committer - Wolfgang Hoschek

2013-09-25 Wolfgang Hoschek
Thanks everybody! Looking forward to a good ride. Wolfgang. On Sep 24, 2013, at 3:39 PM, Hari Shreedharan wrote: > On behalf of the Apache Flume PMC, I am excited to welcome Wolfgang Hoschek > as a committer on the Apache Flume project. Wolfgang contributed a new sink > with the abil

Re: Flume 1.4 and MorphlineInterceptor

2013-10-02 Wolfgang Hoschek
You can use module cdk-morphlines-all for that. Wolfgang. On Oct 2, 2013, at 2:22 PM, bitsof info wrote: > Hi, > New to flume and I am trying to use the MorphlineInterceptor per the > documentation here: > > http://flume.apache.org/FlumeUserGuide.html#morphline-interceptor > > When I run flu

Re: Unable to deliver event. Exception follows. java.lang.NullPointerException

2013-10-30 Wolfgang Hoschek
Here is some material to get started with morphlines:
http://flume.apache.org/FlumeUserGuide.html#morphline-interceptor
http://cloudera.github.io/cdk/docs/current/cdk-morphlines/index.html
http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html
http://cloudera.gi

Re: Using all SolrCloud servers in round-robin setup

2013-11-05 Wolfgang Hoschek
Consider using the SolrJ client class CloudSolrServer, which queries ZooKeeper as necessary. This discussion isn't Flume-specific, so in the future please post to solr-u...@lucene.apache.org instead. Thanks, Wolfgang. On Nov 5, 2013, at 12:30 AM, Eric Bus wrote: > Hi, > > I'm currently using

Re: Dynamic Key=Value Parsing with an Interceptor?

2013-11-12 Wolfgang Hoschek
Consider whether the splitKeyValue command is applicable here, perhaps in combination with readLine, split and grok. An example is here:
http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html#/splitKeyValue
Wolfgang. On Nov 12, 2013, at 3:18 PM, Matt Wise wrote: > Pa
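
The core of key=value parsing with dynamic keys is small: split the line into pairs, split each pair at the first separator, and use the key as the field name. A hedged Python sketch; split_key_value and its separator defaults are hypothetical, and returning a list per key only mirrors the fact that morphline record fields can hold several values.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


def split_key_value(line: str, pair_separator: str = " ",
                    kv_separator: str = "=") -> Dict[str, List[str]]:
    """Split a line into pairs, then split each pair at the first kv_separator.
    Values accumulate per key, since a key may occur more than once."""
    fields: DefaultDict[str, List[str]] = defaultdict(list)
    for pair in line.split(pair_separator):
        if not pair:
            continue  # tolerate repeated separators
        key, sep, value = pair.partition(kv_separator)
        if sep:  # skip tokens that contain no separator at all
            fields[key].append(value)
    return dict(fields)
```

Splitting at the first separator only means values may themselves contain "=" (for example URL query strings).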

Re: Handling malformed data when using custom AvroEventSerializer and HDFS Sink

2014-01-02 Wolfgang Hoschek
FWIW, here is an example of how this could be handled in a MorphlineInterceptor:

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**"]
    commands : [
      {
        tryRules {
          catchExceptions: true
          rules : [
            # first rule
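
The tryRules pattern above can be modelled as: try each rule in order, treat an exception as a failed rule when catchExceptions is true, and stop at the first rule that succeeds. A Python sketch under that reading; try_rules, the dict-based record and the two example rules (parse_value, mark_malformed) are hypothetical illustrations, not Kite's implementation. Giving each rule its own copy of the record is a choice made here so a half-finished rule leaves no trace.

```python
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]
Rule = Callable[[Record], Optional[Record]]  # a rule returns None to signal failure


def try_rules(record: Record, rules: List[Rule],
              catch_exceptions: bool = True) -> Optional[Record]:
    """Return the output of the first rule that succeeds, or None if all fail."""
    for rule in rules:
        try:
            result = rule(dict(record))  # shallow copy: a failed rule cannot leak edits
        except Exception:
            if not catch_exceptions:
                raise
            continue  # the exception counts as an ordinary rule failure
        if result is not None:
            return result
    return None


def parse_value(record: Record) -> Optional[Record]:
    """First rule: expects a numeric body; int() raises ValueError otherwise."""
    record["value"] = int(record["body"])
    return record


def mark_malformed(record: Record) -> Optional[Record]:
    """Fallback rule: keep the event but flag it so it can be handled separately."""
    record["malformed"] = True
    return record
```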

Re: Java heap space error while starting flume agent

2014-01-10 Wolfgang Hoschek
Looks like you are running with a Guava version that's different from the one Flume was compiled against. Flume uses Guava 11.0.2 per flume/pom.xml. Wolfgang. On Jan 10, 2014, at 7:49 AM, Chhaya Vishwakarma wrote: > Hi > Thank you so much that error is gone now I am getting some different error >

Re: Java heap space error while starting flume agent

2014-01-10 Wolfgang Hoschek
Flume requires Guava. Wolfgang. On Jan 10, 2014, at 12:40 PM, Chhaya Vishwakarma wrote: > Hi, > My flume version is 1.4.0 and I have not put guava jar in classpath > > -----Original Message----- > From: Wolfgang Hoschek [mailto:whosc...@cloudera.com] > Sent: Friday, Januar

Re: flume agent not starting

2014-01-21 Wolfgang Hoschek
'tail' with the exec source spits out one Flume event per line into the interceptor. But readMultiLine expects multiple lines per event, not one line per event. In other words, by the time the data arrives in the interceptor, it's already too late for readMultiLine to make sense. Wolfgang. On Jan
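
Why per-line events defeat readMultiLine can be shown with a simplified grouping function: it can only merge lines that share one event body. The Python sketch below is an illustration, not Kite's readMultiLine; the rule that indented lines continue the previous entry (as in Java stack traces) is a hypothetical stand-in for the command's regex-based configuration.

```python
import re
from typing import Iterable, List

# Hypothetical rule: an indented line continues the previous entry.
CONTINUATION = re.compile(r"^\s+")


def read_multi_line(body: str) -> List[str]:
    """Group the lines of ONE event body into multi-line entries."""
    entries: List[str] = []
    for line in body.splitlines():
        if entries and CONTINUATION.match(line):
            entries[-1] += "\n" + line
        else:
            entries.append(line)
    return entries


def per_event(bodies: Iterable[str]) -> List[str]:
    """Each event is grouped in isolation, so one-line events never merge."""
    out: List[str] = []
    for body in bodies:
        out.extend(read_multi_line(body))
    return out
```

Feeding the whole log as one body merges the stack trace; feeding the same lines as separate events (as tail does) leaves every line on its own.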

Re: flume agent not starting

2014-01-22 Wolfgang Hoschek
> Subject: RE: flume agent not starting > > Hi , > > What i should use then for interceptor to work shall I use CAT ?? > > -----Original Message----- > From: Wolfgang Hoschek [mailto:whosc...@cloudera.com] > Sent: Tuesday, January 21, 2014 6:30 PM > To: user@flume.apac

Re: Flume MorphlineInterceptor

2014-02-26 Wolfgang Hoschek
Firstly, to print diagnostic information such as the content of records as they pass through the morphline commands, consider enabling the TRACE log level, for example by adding the following line to your log4j.properties file:
log4j.logger.org.kitesdk.morphline=TRACE
Secondly, is it expected that

Re: morphlines + syslog rfc5424 record including json content

2014-03-26 Wolfgang Hoschek
To fix up invalid JSON, you can try readClob (or maybe readLine) followed by findReplace or grok, followed by toByteArray, followed by setValues { _attachment_body : "@{message}" }, followed by readJson. Wolfgang. On Mar 26, 2014, at 8:59 PM, Andrew Sammut wrote: > > Hi all > > I'm a relativ
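
The readClob, findReplace, readJson chain boils down to: get the text, patch it, then parse it. A rough Python analogue, assuming two hypothetical defects (a non-JSON prefix such as a syslog header, and trailing commas); repair_and_parse and TRAILING_COMMA are illustrative names, and the regex is naive because it would also rewrite such a sequence inside a string value.

```python
import json
import re
from typing import Any

# Hypothetical fix-up: drop a trailing comma before a closing } or ].
TRAILING_COMMA = re.compile(r",\s*([}\]])")


def repair_and_parse(message: str) -> Any:
    """Cut away any prefix before the first '{', patch the text, then parse.
    Raises ValueError (json.JSONDecodeError is a subclass) if still invalid."""
    start = message.find("{")
    if start < 0:
        raise ValueError("no JSON object found")
    candidate = TRAILING_COMMA.sub(r"\1", message[start:])
    return json.loads(candidate)
```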

Re: morphline if conditions contains

2014-03-27 Wolfgang Hoschek
The “contains” command tests whether X is one of the elements in list Y, not whether X is a substring of some other string. For a substring test, you can use a mini script with the “java” command. Wolfgang. On Mar 27, 2014, at 10:55 PM, Andrew Sammut wrote: > > Hi all, > > I'm attempting to place a conditional state
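
The distinction is list membership versus substring search. In Python terms, with a field value held in a list as morphline record fields are (a sketch only; both helper names are hypothetical):

```python
from typing import List


def contains_element(values: List[str], wanted: str) -> bool:
    """List membership: the kind of check the 'contains' command performs."""
    return wanted in values


def contains_substring(values: List[str], needle: str) -> bool:
    """Substring search inside any value: what a small custom script would do."""
    return any(needle in v for v in values)
```

For a message field holding "user login failed", membership of "login" is false while the substring test is true.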

Re: morphline - detecting mime type

2014-03-30 Wolfgang Hoschek
detectMimetype can’t detect whether it’s valid JSON; at most, it can see whether the first few bytes look like JSON. Consider wrapping readJson in a tryRules command to handle invalid input. On Mar 30, 2014, at 10:11 PM, Andrew Sammut wrote: > > Hi all, > > Has anyone used detectMimetype to v
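
The gap between "looks like JSON" and "is valid JSON" is easy to demonstrate: a prefix sniff accepts a truncated document that a real parser rejects. A Python sketch with hypothetical helper names; returning None on a parse failure stands in for falling through to another tryRules rule.

```python
import json
from typing import Any, Optional


def looks_like_json(body: bytes) -> bool:
    """Cheap sniff of the first non-whitespace byte, similar in spirit to
    MIME-type detection: it cannot prove the whole body is valid."""
    return body.lstrip()[:1] in (b"{", b"[")


def parse_json_or_none(body: bytes) -> Optional[Any]:
    """Full parse. Returns None on failure (a literal `null` body also yields None)."""
    try:
        return json.loads(body)
    except ValueError:  # JSONDecodeError and UnicodeDecodeError both subclass ValueError
        return None
```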

Re: Flume Jambalaya - A Flume Plugin with Multiple Components

2014-05-02 Wolfgang Hoschek
My sense is that a) is interesting if it evolves into a capable, true native tailer, whereas b) is already available in Flume, and c) and d) are already available in Flume via the MorphlineInterceptor. Wolfgang. On May 3, 2014, at 12:18 AM, Israel Ekpo wrote: > Flume Community, > > I created a

Re: Some dependencies can not be found in 1.5.0 release

2014-05-28 Wolfgang Hoschek
There is no backwards-incompatible change in the code, regardless of whether it's Kite 0.10, 0.11, 0.12, 0.13 or 0.14. The dependencies have been made “optional” in flume-ng-sinks/flume-ng-morphline-solr-sink/pom.xml via <optional>true</optional>, so they don’t ship automatically with the build.

Re: Query regarding readMultiLine in Morphlines config

2014-07-16 Wolfgang Hoschek
A morphline receives one Flume event at a time. What and how much is contained in the Flume event is up to you, but Flume isn’t really designed to send large events such as whole files or parts of files; it’s designed to send small discrete events, like a log line per event, or similar. There is

Re: Morphlines: getting the error "unclosed quotation" although handled

2014-07-17 Wolfgang Hoschek
This means that your TSV data file contains invalid data. Every opening quote character needs to eventually be followed by a closing quote character in the data file. Such a closing quote is apparently missing. Consider fixing your input data, or perhaps try to handle it with readLine + split r
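
The readLine + split approach works because a plain split treats quote characters as ordinary data, so one unmatched quote cannot run on into later fields or lines. A Python sketch of that quote-agnostic parsing; both helper names are hypothetical, and the csv variant relies on csv.QUOTE_NONE, which disables quote handling in the reader.

```python
import csv
import io
from typing import List


def split_tsv_line(line: str) -> List[str]:
    """Quote-agnostic split of one line, like readLine followed by split."""
    return line.rstrip("\r\n").split("\t")


def read_tsv_lenient(body: str) -> List[List[str]]:
    """Same idea with the csv module: QUOTE_NONE makes quotes plain characters."""
    reader = csv.reader(io.StringIO(body, newline=""), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]
```

The trade-off is that legitimately quoted fields containing tabs or newlines are no longer supported, so this fits data where quotes carry no meaning.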

Re: Interceptor Morphlines or MorphlineSolrSink

2014-07-24 Wolfgang Hoschek
A sink can emit zero or more records per input event, whereas an interceptor can emit only zero or one record per input event. Also, an interceptor can be used to route events to channels, and hence to sinks. Wolfgang. On Jul 24, 2014, at 10:23 AM, Guillermo Ortiz wrote: > I want t
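
The cardinality difference can be modelled with two small functions. This is not Flume's Interceptor or Sink API, only a hypothetical Python model: an interceptor-style step maps each event to at most one result (None drops it), whereas a sink-side morphline may expand one event into many records.

```python
from typing import Callable, Iterable, List, Optional

Event = str


def run_interceptor(events: Iterable[Event],
                    fn: Callable[[Event], Optional[Event]]) -> List[Event]:
    """Zero or one output per input: None means the event is dropped."""
    out: List[Event] = []
    for e in events:
        result = fn(e)
        if result is not None:
            out.append(result)
    return out


def run_sink(events: Iterable[Event],
             fn: Callable[[Event], List[Event]]) -> List[Event]:
    """Zero, one or many output records per input event."""
    out: List[Event] = []
    for e in events:
        out.extend(fn(e))
    return out
```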

Re: [ANNOUNCE] New Flume PMC Member - Roshan Naik

2014-11-05 Wolfgang Hoschek
Congrats Roshan! On Nov 5, 2014, at 11:54 AM, Saravanan Nagarajan wrote: > Congratulations Roshan! > > On Wed, Nov 5, 2014 at 7:32 AM, Ahmed Radwan wrote: > >> Congrats Roshan! >> >> On Tue, Nov 4, 2014 at 2:12 PM, Arvind Prabhakar >> wrote: >> >>> On behalf of Apache Flume PMC, it is my