Re: Runtime reconfiguration

2012-12-03 Thread Jeff Lord
Hi Simon, Assuming you are using flume ng you can just modify the config, add another collector, and save the file. No need for a restart. The agent checks periodically for changes, AFAIK every 30 seconds. -Jeff On Thu, Nov 29, 2012 at 12:43 AM, Simon Monecke wrote: > Hi, > > i want to us

Re: greetings - Flume on Windows

2012-12-13 Thread Jeff Lord
Andy, The current stable release of flume is 1.3.0 and you can always check which release is current on this page: http://flume.apache.org/releases/index.html In order to checkout this release from git you can issue the following command: git clone https://git-wip-us.apache.org/repos/asf/flume.g

Re: Cloudera Manager usage for Flume

2013-01-04 Thread Jeff Lord
Rahul, As of Cloudera Manager 4.1.0 you have the ability to manage a flume service, as well as report on various component metrics. https://ccp.cloudera.com/display/ENT41DOC/Adding+Services#AddingServices-AddingFlume https://ccp.cloudera.com/display/ENT41DOC/Flume+Metric+Details -Jeff On T

Re: How to start an agent programmatically?

2013-01-04 Thread Jeff Lord
Felix, In Flume 1.4 there is an embedded agent. You can download and build trunk and would be able to have this functionality. https://issues.apache.org/jira/browse/FLUME-1502 https://issues.apache.org/jira/secure/attachment/12560587/embedded-agent-3.pdf -Jeff On Thu, Jan 3, 2013 at 9:32 PM, F

Re: Error while building trunk

2013-01-08 Thread Jeff Lord
Felix, Try adding some heap using MAVEN_OPTS. e.g. export MAVEN_OPTS="-Xms512m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=512m" Then try to build. -Jeff On Sat, Jan 5, 2013 at 2:55 AM, Felix.徐 wrote: > Hi, > > I encountered a problem while executing "mvn clean install > -DskipTests": >

Re: Of BatchSize / Channel Capacity / Transaction Capacity

2013-01-08 Thread Jeff Lord
Hi Bhaskar, 1) Batch Size 1.a) When configured by client code using the flume-core-sdk, to send events to flume avro source. The flume client sdk has an appendBatch method. This will take a list of events and send them to the source as a batch. This is the size of the number of events to be pas

Re: Of BatchSize / Channel Capacity / Transaction Capacity

2013-01-11 Thread Jeff Lord
ve direct implications on the performance of flume nodes. > > thanks > Bhaskar > > > On Tue, Jan 8, 2013 at 9:40 PM, Jeff Lord wrote: > >> Hi Bashkar, >> >> 1) Batch Size >> 1.a) When configured by client code using the flume-core-sdk , to send >>

Re: Need for UDP / Multicast Source

2013-01-17 Thread Jeff Lord
Hi Andrew, You may try lowering transactionCapacity here. The transactionCapacity should be set to the value of the largest batch size that will be used to store or remove events from that channel. You currently have it equal to the capacity of the channel. So essentially the channel *could be* fi
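The relationship between capacity and transactionCapacity described above can be sketched as a memory channel definition. This is only an illustration: the agent name a1, the channel name c1, and both numbers are assumptions, not values from the thread.

```properties
# Hypothetical agent "a1" with one memory channel "c1" (names are assumptions).
a1.channels = c1
a1.channels.c1.type = memory
# Maximum number of events the channel can hold.
a1.channels.c1.capacity = 10000
# Maximum number of events per transaction. Sized to the largest batch size
# used by any source or sink attached to this channel (1000 is an assumed value),
# and kept well below capacity.
a1.channels.c1.transactionCapacity = 1000
```

With values like these a single transaction can no longer claim the whole channel, which appears to be the situation the reply is warning about.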

Re: Multiplexing to multiple JdbcChannel (Derby) + event header ?

2013-01-22 Thread Jeff Lord
On Tue, Jan 22, 2013 at 2:51 AM, Alain B. wrote: > My question is: will these 2 channels store their events in separate derby > DB by default or do I need to configure my 2 jdbc-channels with specific > properties in order to get 2 embedded derby DB started ? > By default they will use the same

Re: Does Flume NG requires to be installed on all the sources?

2013-02-06 Thread Jeff Lord
Seshu, It really is going to depend on your use case, though it sounds like you may need to run an agent on each of the source machines. Which source do you plan to use? It may also be the case that you can use the flume rpc client to write data directly from your application to the flume collecto

Re: Does Flume NG requires to be installed on all the sources?

2013-02-08 Thread Jeff Lord
into HDFS. > I can have a channel/collector machine where I install flume. I guess, > my question is, do I need to install flume on the servers where the log > messages lie and do I need to install flume in HDFS namenode too? > > Thanks, > - Seshu > > > On Wed, Feb 6, 2

Re: Flume-NG : Spooling dir source : java.io.IOException: Stream closed

2013-02-08 Thread Jeff Lord
The spooling directory source assumes that the files in the directory you are spooling are immutable. java.lang.IllegalStateException: File name has been re-used with different files. Spooling assumption violated for /var/log/testhbase/hbase_1.log.COMPLETED This message is indicative that a

Re: source status in FlumeNG

2013-02-12 Thread Jeff Lord
Madhu, When the channel is full the source will no longer be able to accept transactions and place them on the channel. It will not hang and will begin accepting transactions again once the channel has availability. This means the upstream sink|application will start to back up and is by design.

Re: Architecting Flume for failover

2013-02-19 Thread Jeff Lord
Noel, What test did you perform? Did you stop sink-2? Currently you have set a higher priority for sink-2 so it will be the default sink so long as it is up and running. -Jeff http://flume.apache.org/FlumeUserGuide.html#failover-sink-processor On Tue, Feb 19, 2013 at 5:03 PM, Noel Duffy wrote:
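For reference, a failover sink group along the lines discussed in this thread might look like the sketch below. The agent name a1 and the group name g1 are assumptions; sink-1 and sink-2 stand in for the poster's sinks, whose full configuration is not shown in the snippet.

```properties
# Hypothetical agent "a1"; the sinks sink-1 and sink-2 are defined elsewhere.
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = sink-1 sink-2
a1.sinkgroups.g1.processor.type = failover
# A larger number means a higher priority, so sink-2 is used while it is healthy
# and sink-1 only takes over when sink-2 fails.
a1.sinkgroups.g1.processor.priority.sink-1 = 5
a1.sinkgroups.g1.processor.priority.sink-2 = 10
# Upper limit, in milliseconds, on the back-off applied to a failed sink.
a1.sinkgroups.g1.processor.maxpenalty = 10000
```

To see failover happen in a test, the higher-priority sink (sink-2 here) is the one that has to be stopped.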

Re: Architecting Flume for failover

2013-02-19 Thread Jeff Lord
r of assertions online that this can be done, but so far, I've not > seen any examples of how to actually configure it. > > From: Jeff Lord [mailto:jl...@cloudera.com] > Sent: Wednesday, 20 February 2013 2:17 p.m. > To: user@flume.apache.org > Subject: Re: Architecting Flume f

Re: Log processing

2013-02-25 Thread Jeff Lord
Daniel, Flume was designed as a configurable pipeline for discrete events in order to get them reliably from a source (e.g. web server application) -> to a destination (e.g. into hdfs). Flume provides the facility to write the same event to multiple destinations (e.g. HDFS and Hbase or HDFS and Ca

Re: windows spooldir source problem

2013-02-27 Thread Jeff Lord
Have you considered using the move command instead of copy? On Tue, Feb 26, 2013 at 10:49 PM, 周梦想 wrote: > Hello, > I have a question using spooldir source. > > If I have a large file such as more than 100MB, when I copy this file to > spooldir, the flume agent will find it immediately and begi

Re: LoadBalancing Sink Processor question

2013-04-01 Thread Jeff Lord
Hi Paul, Would you kindly attach the logs from both tier 2 collectors where you observe the sinks occasionally stepping on each other. Can you please attach your flume config and note the version of flume-ng? Best, Jeff On Sun, Mar 31, 2013 at 7:12 PM, JR wrote: > Hi Paul, > >I apologize

Re: "single source - multi channel" scenario and applying interceptor while writing to only one channel and not on others...possible approaches

2013-04-19 Thread Jeff Lord
Hi Jagadish, Have you considered using a custom event serializer to modify your event? It's possible to replicate your flow using two channels and then have one sink that implements a custom serializer to modify the event. -Jeff On Tue, Apr 16, 2013 at 11:12 PM, Jagadish Bihani < jagadish.bih...

Re: "single source - multi channel" scenario and applying interceptor while writing to only one channel and not on others...possible approaches

2013-04-19 Thread Jeff Lord
Jagadish, Here is an example of how to write a custom serializer. https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MyCustomSerializer.java -Jeff On Fri, Apr 19, 2013 at 9:34 AM, Jeff Lord wrote: > Hi Jagadish, > >

Re: HBase Sink Reliability

2013-04-22 Thread Jeff Lord
Hi Dave, You are on the right track with thoughts here. The best way to ensure all events are successfully delivered to Hbase as well would be to use a separate channel for the hbase sink. -Jeff On Mon, Apr 22, 2013 at 8:11 AM, David Quigley wrote: > Hi, > > I am using flume to write events f

Re: HBase Sink Reliability

2013-04-25 Thread Jeff Lord
Mike Percy contributed a most excellent blog post on this topic. Have you had a chance to read over this? https://blogs.apache.org/flume/entry/flume_performance_tuning_part_1 "* Tuning the batch size trades throughput vs. latency and duplication under failure. With a small batch size, throughput

Re: Getting "Checking file:conf/flume.conf for changes" message in loop

2013-04-30 Thread Jeff Lord
Vikas, This message is normal and harmless. 2013-04-29 08:26:11,868 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:188)] Checking file:conf/flume.conf for changes If you change your log se

Unsubscribe

2013-05-05 Thread Jeff Lord

Re: Flume Process event error

2013-05-08 Thread Jeff Lord
What version of flume are you running? rpm -qa | grep flume flume-ng version Can you please post your full config? and the log file? On Wed, May 8, 2013 at 8:34 PM, GuoWei wrote: > Hi, > > Recently I met the following problem. When I Process event in my custom > source . > > Channel closed [ch

Re: HDFS Sink stops writing events because HDFSWriter failed to append and close a file

2013-05-28 Thread Jeff Lord
Hi Ashish, What version of flume are you running? flume-ng version -Jeff On Fri, May 24, 2013 at 3:38 AM, Ashish Tadose wrote: > Hi All, > > We are facing this issue in production flume setup. > > Issue initiates when HDFS sink BucketWriter fails to append a batch for a > file because of had

Re: HBaseSink is very slow

2013-07-29 Thread Jeff Lord
Hi Deepak, 1. When using the load balancing sink group the list of sinks will be processed serially as opposed to in parallel. 2. The batch size on your source is very small. agent.sources.1374869469492.batchSize = 1 You may try increasing that for better throughput. 3. The AsyncHbaseSink is goi

Re: java.io.IOException: Bad response ERROR for block ... from datanode

2013-08-07 Thread Jeff Lord
Miguel, These errors usually indicate that there is a problem on your HDFS cluster. You should probably investigate the health of the cluster first. -Jeff On Wed, Aug 7, 2013 at 7:21 AM, Miguel Coelho dos Santos wrote: > Hi, > > we are using flume to write data to hdfs. > Our hdfs sinks recentl

Re: java.io.IOException: Bad response ERROR for block ... from datanode

2013-08-07 Thread Jeff Lord
n what typically is unhealthy in the HDFS cluster > when this error occurs? > > Miguel > ____ > From: Jeff Lord [jl...@cloudera.com] > Sent: 07 August 2013 19:31 > To: user@flume.apache.org > Subject: Re: java.io.IOException: Bad res

Re: Changing capacity configuration of File channel throws IllegalStateException

2013-09-13 Thread Jeff Lord
Deepesh, The FileChannel uses a fixed size checkpoint file so it is not possible to set it to unlimited size (the checkpoint file is mmap-ed to a fixed size buffer). To change the capacity of the channel, use the following procedure: Shutdown the agent. Delete all files in the file channel's chec

Re: Changing capacity configuration of File channel throws IllegalStateException

2013-09-16 Thread Jeff Lord
ersion of Flume are you running. Looks like you are hitting >> https://issues.apache.org/jira/browse/FLUME-1918 as well due to an >> unsupported channel size in a previous version. This was fixed in Flume >> 1.4.0 >> >> >> Hari >> >> >> Thanks, >

Re: File Sink/Source

2013-10-07 Thread Jeff Lord
Yes the file channel is designed to handle this and is what you should be using. You are also on the right track regarding sizing your file channel to account for the number of events that could accumulate in the event that your terminal sink is unable to complete transactions. With the amount of d

Re: send the whole logs

2013-10-11 Thread Jeff Lord
So if you use trunk and set the keepFields property to true then the Timestamp and Hostname will be preserved in the body of the event now. https://github.com/apache/flume/blob/trunk/flume-ng-doc/sphinx/FlumeUserGuide.rst#syslog-sources On Fri, Oct 11, 2013 at 7:29 AM, David Sinclair < dsincl...

Re: Use flume to copy data in local directory (hadoop server) into hdfs

2013-10-21 Thread Jeff Lord
Luu, Have you tried using the spooling directory source? -Jeff On Mon, Oct 21, 2013 at 3:25 AM, Cuong Luu wrote: > Hi all, > > I need to copy data in a local directory (hadoop server) into hdfs > regularly and automatically. This is my flume config: > > agent.sources = execSource > agent.chan

Re: HDFS Sink Config Help

2013-10-31 Thread Jeff Lord
Jeremy, DataStream fileType will let you write text files. CompressedStream will do just that. SequenceFile will create sequence files as you have guessed and you can use either Text or Writable (bytes) for your data here. So flume is configurable out of the box with regards to the size of your

Re: Preserving origin syslog information

2013-10-31 Thread Jeff Lord
Devin, FLUME-1666 added a keepFields property that will allow you to preserve the timestamp and hostname in the body of the generated flume event. That patch was committed to trunk a couple of weeks ago so if you use trunk to build it should be available. https://issues.apache.org/jira/browse/FLUM
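A syslog source with keepFields enabled could be declared roughly as follows. The agent, source, and channel names, the bind address, and the port are all assumptions for illustration; the property is only available in builds that include FLUME-1666.

```properties
# Hypothetical agent "a1" with a syslog TCP source "r1" feeding channel "c1".
a1.sources = r1
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.port = 5140
# Keep the syslog timestamp and hostname in the event body
# instead of stripping them during parsing.
a1.sources.r1.keepFields = true
a1.sources.r1.channels = c1
```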

Re: HDFS Sink Config Help

2013-11-01 Thread Jeff Lord
> Thanks again! > > -- Jeremy > > > > On Thu, Oct 31, 2013 at 4:42 PM, Jeff Lord wrote: > >> Jeremy, >> >> Datastream fileType will let you write text files. >> CompressedStream will do just that. >> SequenceFile will create sequence files as you ha

Re: Send data to a remote Hbase server

2013-11-04 Thread Jeff Lord
Zookeeper should already be running on the hbase server. If you are using standalone mode it is run within the same jvm as hbase. On Fri, Nov 1, 2013 at 2:14 PM, George Pang wrote: > Hi Ashish, > > Does it mean I have to install zookeeper too in the HBase box, in order to > talk to Hbase from

Re: Flume File Channel Filling Up The Disk With Transaction Log, Any Way To Prevent It

2013-11-25 Thread Jeff Lord
It's fine to run in a VM. Out of curiosity why are you running two agents on the machine though? On Mon, Nov 25, 2013 at 1:54 PM, Brock Noland wrote: > If the channel is full your clients will get a rejection notice. > > Capacity planning on the FC is a mix between event size, channel size, > a

Re: Flume File Channel Filling Up The Disk With Transaction Log, Any Way To Prevent It

2013-11-25 Thread Jeff Lord
2 in each VM) > > Is running single agent per VM recommend ? > > -Ritesh > > > > On Nov 25, 2013, at 3:23 PM, Jeff Lord wrote: > > Its fine to run in a VM. > Out of curiosity why are you running two agents on the machine though? > > > > On Mon, Nov 25, 201

Re: java.lang.OutOfMemoryError: unable to create new native thread

2013-11-26 Thread Jeff Lord
Can you provide the logfile and config? On Tue, Nov 26, 2013 at 12:20 PM, Cochran, David wrote: > I've got a pretty good sized box collecting logs for a number of sources > (about a dozen or so). > Actually two instances were running on this box (one production and the > other a testing environm

Re: Optional Channels

2013-12-03 Thread Jeff Lord
Sounds reasonable to allow this via a config property. Can you please submit the Jira? On Tue, Dec 3, 2013 at 7:24 AM, James Estes wrote: > We're on flume 1.4.0. Hm. So looking at the code you are right…I'd not > looked closely enough at the transaction behavior for the MemoryChannel. > When

Re: Flume Event Header- add timestamp

2013-12-05 Thread Jeff Lord
Can you post your entire config and log? On Thu, Dec 5, 2013 at 1:10 AM, Salih Kardan wrote: > I have a problem with adding time-stamp to flume header. Here is a snippet > from my conf file. > > > agent.sources.avrosource.interceptors.addTimestamp.type = > org.apache.flume.interceptor.Timesta

Re: RELP support

2013-12-17 Thread Jeff Lord
Hi Otis, It makes sense for flume to support RELP protocol. Will need to do some digging to determine whether it makes sense to have its own unique source or we can bolt this onto the multiport tcp source as a config switch. Unless someone on the list has any ideas? Best, Jeff On Tue, Dec 17,

Re: Configuring flume agents remotely

2014-01-03 Thread Jeff Lord
Monitoring and configuration are two separate things here. Flume is typically monitored using either ganglia or http/json. Both methods are documented here: http://flume.apache.org/FlumeUserGuide.html#monitoring As for configuration management and changes a common way of handling this would be to

Re: seeking help on flume cluster deployment

2014-01-09 Thread Jeff Lord
Chen, Have you taken a look at this presentation on Planning and Deploying Flume from ApacheCon? http://archive.apachecon.com/na2013/presentations/27-Wednesday/Big_Data/11:45-Mastering_Sqoop_for_Data_Transfer_for_Big_Data-Arvind_Prabhakar/Arvind%20Prabhakar%20-%20Planning%20and%20Deploying%20Apac

Re: seeking help on flume cluster deployment

2014-01-09 Thread Jeff Lord
and i am looking for a fault tolerant deployment of flume, that > can read from this single data source and sink to hdfs in fault tolerant > mode: when one node dies, another flume node can pick up and continue; > Thanks, > Chen > > > On Thu, Jan 9, 2014 at 7:49 PM, Jeff Lord wr

Re: Flume RPC back off bug ?

2014-01-10 Thread Jeff Lord
Bean, Can you please open a jira? Thank You, Jeff On Fri, Jan 10, 2014 at 7:16 AM, Bean Edwards wrote: > If I change the condition (allowableDiff > delta) to > (allowableDiff < delta), it works fine. from line 103 of OrderSelector > > > On Fri, Jan 10, 2014 at 11:13 PM, Bean Edward

Re: issues with configuration updates and clean shutdowns

2014-01-15 Thread Jeff Lord
Josh, If you modify your config then the flume agent will see the config has changed and reload any components that have been modified. Are you able to provide the logs from the flume agent which occur following a modification of the config? What command are you using to signal for a shutdown? -J

Re: JMS Source

2014-01-16 Thread Jeff Lord
Your config and any more log file context you can provide will help get you an answer. On Thu, Jan 16, 2014 at 10:29 AM, P lva wrote: > Hello everyone, > > I'm trying to configure a jms source in flume agent, but i get this error > > Could not create initial context > com.tibco.tibjms.naming.Tibj

Re: JMS Source

2014-01-17 Thread Jeff Lord
Connection(TibjmsxCFImpl.java:253) > at > com.tibco.tibjms.TibjmsQueueConnectionFactory.createQueueConnection(TibjmsQueueConnectionFactory.java:87) > at > com.tibco.tibjms.naming.TibjmsContext$Messenger.request(TibjmsContext.java:325) > at > com.tibco.tibjms.nami

Re: best way to make all hdfs records in one file under a folder?

2014-01-20 Thread Jeff Lord
If you don't intend to roll based on # of events then you will want to set rollCount to 0. MyAgent.sinks.HDFS.hdfs.rollCount = 0 On Mon, Jan 20, 2014 at 12:35 PM, Jimmy wrote: > Seems like the only reason is "too many files" issue, correct? > > File Crusher executed regularly might be better op

Re: hdfs.fileType = CompressedStream

2014-01-30 Thread Jeff Lord
You are using gzip so the files won't be splittable. You may be better off using snappy and sequence files. On Thu, Jan 30, 2014 at 10:51 AM, Jimmy wrote: > I am running few tests and would like to confirm whether this is true... > > hdfs.codeC = gzip > hdfs.fileType = CompressedStream > hdfs.writ
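The snappy plus sequence file suggestion might translate into sink settings like the ones below. This is a sketch under assumptions: the agent and sink names and the HDFS path are made up, and it presumes the snappy native libraries are available to the agent and the cluster.

```properties
# Hypothetical agent "a1" with an HDFS sink "k1".
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events
# Sequence files remain splittable when compressed, unlike a gzip CompressedStream.
a1.sinks.k1.hdfs.fileType = SequenceFile
# Compression codec applied to the sequence file contents.
a1.sinks.k1.hdfs.codeC = snappy
```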

Re: Source Failover and Sink Failover

2014-02-07 Thread Jeff Lord
Mayur, The hdfs sink is going to keep trying to connect for maxRetries=10 Are you able to post the complete log? or at least another couple of minutes ? -Jeff On Fri, Feb 7, 2014 at 1:32 AM, Mayur Gupta wrote: > 1) The source is Avro client. The events are lost. The intent of the > question

Re: Embedded Agent vs Client SDK

2014-02-13 Thread Jeff Lord
Gary, I'm going to just quote the design doc here: https://issues.apache.org/jira/secure/attachment/12560587/embedded-agent-3.pdf 1. A Flume Embedded agent would be useful to applications which send data to a Flume agent acting as a "collector". Currently using the RPCClient or HTTPSource, if th

Re: Issue with HBase Sink in Flume ( 1.3.0)

2014-02-17 Thread Jeff Lord
Logs ? On Mon, Feb 17, 2014 at 5:51 AM, Kris Ogirri wrote: > Dear Mailing Group, > > I am currently having issues with the Hbase sink function. I have developed > an agent with a fanout channel setup ( single source, multiple channels, > multiple sinks) sinking to a HDFS cluster and Hbase deploym

Re: Flume with ActiveMQ

2014-02-24 Thread Jeff Lord
Have you tried using the fqcn of the connection factory? On Monday, February 24, 2014, P lva wrote: > If there is no connection factory called 'GenericConnectionFactory' the > lookup fails and you get this. > > > > On Mon, Feb 24, 2014 at 1:29 PM, richard ross > wrote: > > Thanks for the reply.

Re: Flume with ActiveMQ

2014-02-24 Thread Jeff Lord
, > Richard. > > On Feb 24, 2014, at 4:05 PMEST, Jeff Lord wrote: > > Have you tried using the fqcn of the connection factory? > > On Monday, February 24, 2014, P lva wrote: >> >> If there is no connection factory called 'GenericConnectionFactory' the >>

Re: Flume with ActiveMQ

2014-02-24 Thread Jeff Lord
> On Feb 24, 2014, at 6:09 PMEST, Jeff Lord wrote: > >> I think you can just drop the connectionFactory property from the >> config altogether with activemq and it will work. >> >> On Mon, Feb 24, 2014 at 2:17 PM, Richard Ross >> wrote: >>> Thanks for these

Re: Maintaining message/event order

2014-02-26 Thread Jeff Lord
Richard, Flume does not enforce any guarantees on ordering of events. -Jeff On Wed, Feb 26, 2014 at 5:41 AM, richard ross wrote: > Hello: > > I am using Flume 1.4 with a JMS --> File Channel --> HDFS data pipeline, and > was wondering if Flume can guarantee order of messages/events (i.e., the >

Re: Flume log event per file

2014-02-27 Thread Jeff Lord
It looks like you have not configured any properties for "rolling" files on hdfs. The default rollCount is 10 (events). http://flume.apache.org/FlumeUserGuide.html#hdfs-sink The flume hdfs sink can be configured to roll based on size, # of events, or time. hdfs.rollInterval (default: 30): Number of seconds to
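The three roll criteria mentioned above (time, size, event count) can be combined or disabled individually. A minimal sketch, assuming an agent a1 with an HDFS sink k1, a made-up path, and a ten-minute interval:

```properties
# Hypothetical agent "a1" with an HDFS sink "k1".
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events
# Roll the current file every 600 seconds (assumed value).
a1.sinks.k1.hdfs.rollInterval = 600
# 0 disables rolling on file size (bytes).
a1.sinks.k1.hdfs.rollSize = 0
# 0 disables rolling on event count; the default of 10 produces many tiny files.
a1.sinks.k1.hdfs.rollCount = 0
```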

Re: move file as they are with flume

2014-03-03 Thread Jeff Lord
Are you using the spooling directory source? We added the ability to just set the basename of a file (without absolute path) in FLUME-2056. Allow SpoolDir to pass just the filename that is the source of an event. On Fri, Feb 28, 2014 at 7:01 AM, Iván Fernández Perea wrote: > Hi, > > I'm a newbie
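With a build that includes FLUME-2056, the basename option on the spooling directory source can be switched on roughly like this. The agent, source, and channel names and the directory are assumptions.

```properties
# Hypothetical agent "a1" with a spooling directory source "r1".
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/incoming
# Add a header holding only the file name (no absolute path) to each event.
a1.sources.r1.basenameHeader = true
# Header key under which the file name is stored.
a1.sources.r1.basenameHeaderKey = basename
a1.sources.r1.channels = c1
```

A sink can then reference the header, for example as %{basename} in an HDFS path or file prefix.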

Re: Multiple flume agent on single machine

2014-03-13 Thread Jeff Lord
You can setup flume to use hdfs.proxyUser https://cwiki.apache.org/confluence/display/FLUME/Flume+1.x+Secure+HDFS+Setup On Thu, Mar 13, 2014 at 2:26 PM, Christopher Shannon wrote: > What if your sinks have to write out to destinations that have different > users and different levels of authoriz

Re: some one explain how filechannel works

2014-03-20 Thread Jeff Lord
https://blogs.apache.org/flume/entry/apache_flume_filechannel On Thu, Mar 20, 2014 at 12:21 AM, Bean Edwards wrote: > i use filechannel,and monitor it from http response.i > found ChannelFillPercentage > will increase and never get back to 0% what happens? what's > more,filechannel dataDirs al

Re: Fastest way to get data into flume?

2014-03-27 Thread Jeff Lord
Increase your batch sizes On Thu, Mar 27, 2014 at 12:29 PM, Chris Schneider < ch...@christopher-schneider.com> wrote: > Thanks for all the great replies. > > My specific situation is a bit more complex than I let on initially. > > Flume running multiple agents will absolutely be able to scale to

Re: preserve syslog header in hdfs sink

2014-03-28 Thread Jeff Lord
Do you have the appropriate interceptors configured? On Fri, Mar 28, 2014 at 12:28 PM, Ryan Suarez < ryan.sua...@sheridancollege.ca> wrote: > RTFM indicates I need the following sink properties: > > --- > hadoop-t1.sinks.hdfs1.serializer = org.apache.flume.serialization. > HeaderAndBodyTextEvent

Re: preserve syslog header in hdfs sink

2014-04-01 Thread Jeff Lord
mem1.type = memory > hadoop-t1.channels.mem1.capacity = 1000 > hadoop-t1.channels.mem1.transactionCapacity = 100 > > # Bind the source and sink to the channel > hadoop-t1.sources.r1.channels = mem1 > hadoop-t1.sinks.s1.channel = mem1 > > > > On 14-03-28 3:37 PM, Je

Re: Flume Configuration & topology approach

2014-04-03 Thread Jeff Lord
Mohit, Are you using memory channel? You mention you are getting OOME but you don't even say what the heap you are setting on the flume jvm is? Don't run an agent on the namenode. Occasionally you will see folks installing an agent on one of the datanodes in the cluster but it's not typically reco

Re: Flume Configuration & topology approach

2014-04-07 Thread Jeff Lord
ctor nodes and even change their configurations. > > Absolutely Cloudera Manager can be used to install, manage, and monitor your flume agents. > So we are very much beginners in this field, any suggestions or > recommendations are welcome. Thanks for your help :) > > > Mohit &

Re: Flume Configuration & topology approach

2014-04-07 Thread Jeff Lord
No. If you need to guarantee delivery of events please use a file channel. https://blogs.apache.org/flume/entry/apache_flume_filechannel On Mon, Apr 7, 2014 at 8:38 AM, Christopher Shannon wrote: > > On Apr 7, 2014 9:35 AM, "Jeff Lord" wrote: > > > > > > >
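A file channel declaration, for readers who have only used the memory channel so far, might look like this. The agent and channel names and the directories are assumptions; the directories should live on disks with enough free space for the expected backlog.

```properties
# Hypothetical agent "a1" with a durable file channel "c1".
a1.channels = c1
a1.channels.c1.type = file
# Directory holding the channel checkpoint.
a1.channels.c1.checkpointDir = /var/lib/flume/checkpoint
# One or more comma-separated directories holding the event data logs.
a1.channels.c1.dataDirs = /var/lib/flume/data
```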

Re: Flume Configuration & topology approach

2014-04-07 Thread Jeff Lord
That would have to mean that the downstream agent is sending an > ack to the upstream agent before it actually persists the event. > > On Apr 7, 2014 10:48 AM, "Jeff Lord" wrote: > > > > No. If you need to guarantee delivery of events please use a file >

Re: Import files from a directory on remote machine

2014-04-16 Thread Jeff Lord
http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source On Wed, Apr 16, 2014 at 5:14 PM, Something Something < mailinglist...@gmail.com> wrote: > Hello, > > Needless to say I am newbie to Flume, but I've got a basic flow working in > which I am importing a log file from my linux bo

Re: Import files from a directory on remote machine

2014-04-16 Thread Jeff Lord
tions about this? > > > On Wed, Apr 16, 2014 at 5:16 PM, Jeff Lord wrote: > >> http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source >> >> >> On Wed, Apr 16, 2014 at 5:14 PM, Something Something < >> mailinglist...@gmail.com> wrote:

Re: Import files from a directory on remote machine

2014-04-17 Thread Jeff Lord
r) is probably >>> your best bet to ingest files from a remote machine that you only have read >>> access to. But then again you're sorta stepping outside of the use case of >>> flume at some level here as rsync is now basically a part of your flume >>> topol

Re: Import files from a directory on remote machine

2014-04-23 Thread Jeff Lord
> Hi Jeff, > > On Thu, Apr 17, 2014 at 1:11 PM, Jeff Lord wrote: > >> Using the exec source with a tail -f is not considered a production >> solution. >> It mainly exists for testing purposes. >> > > This statement surprised me. Is that the gener

Re: FW: Memory Channel gets full.. Avro Sinks cannot drain the events at a fast rate

2014-05-02 Thread Jeff Lord
Kushal, Have you considered removing the sinks from the sinkGroup? This will increase your concurrency for processing channel events by allowing both sinks to read from the channel simultaneously. With a sink group in place only one sink will read at a time. Hope this helps. -Jeff On Fri, May

Re: Configure and start a Flume sink + agent from java.

2014-05-20 Thread Jeff Lord
Have you looked at some of the test classes? That may be a good way to see how you can accomplish this with straight java. https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java On Tue, May 20, 2014 at 6:46 AM, Ja

Re: HDFS not adding \n

2014-06-05 Thread Jeff Lord
Can you try adding this line to your config? tier1.sinks.sinkDHCP_Raw.serializer = text

Re: Flume Embedded Agent Interrupted in Handshake

2014-07-04 Thread Jeff Lord
Adam, You are mostly correct. The one thing I might add that may help is to know that the sink is consuming the events from the channel, writing them to the next hop source and then committing the transaction. As opposed to the channel pushing the events, as the channel is a passive component. You

Re: Custom sink/source

2014-07-21 Thread Jeff Lord
start() is called when the agent is started and the sink component is then started. Calling process() will take a batch of events off the channel and send to the next hop or terminal location. stop() is called when the agent is shutdown and the sink component resources are unloaded. Have you seen t

Re: Custom sink/source

2014-07-21 Thread Jeff Lord
ined very well. I'm new to Java (and flume) so maybe > that's just me. > > Your explanation helps. > > -- > Sharninder > > On 21-Jul-2014, at 8:33 pm, Jeff Lord > wrote: > > start() is called when the agent is started and the sink component is then > star

Re: how spooling directory source identifies the complete file

2014-07-22 Thread Jeff Lord
I believe the way this works is that flume creates a meta directory to track which file is being read. In the event of a restart of the agent the entire file will be re-read which will create some duplicate events. https://github.com/apache/flume/blob/flume-1.5/flume-ng-core/src/main/java/org/apac

Re: flume failover only support two nodes?

2014-08-13 Thread Jeff Lord
Also all of your sinks are pointing to the same host for the next hop. So if the agent on that host is unavailable for some reason then failover is pointless. For testing this is OK; for production there is a better way. On Wednesday, August 13, 2014, Hari Shreedharan wrote: > Each sink needs to ha

Re: multi-tier avro agents configuration problem

2014-08-28 Thread Jeff Lord
I think you want this to bind to slave2 or even better the appropriate ip tier2.sources.source2.bind= slave3 If that doesn't work please send the log snippet. On Thursday, August 28, 2014, Blade Liu wrote: > Hi folks, > > I ran into a configuration problem of setting up multi-tier avro age

Re: Avro source and sink

2014-09-02 Thread Jeff Lord
Ed, Did you take a look at the javadoc in the source? Basically the source uses netty as a server and the sink is just an rpc client. If you read over the doc which is in the two links below and take a look at the developer guide and still have questions just ask away and someone will help to answ

Re: Performance of Flume in production systems

2014-09-25 Thread Jeff Lord
Whether or not flume can handle 20k eps will depend on several factors. The main ones being: 1. What is the avg size of event 2. What source will you be using With that said I have seen a single flume agent handle well over 20k eps using the multiport syslog source. Here is a link to a presentati

Re: Flume Syslog source

2014-10-15 Thread Jeff Lord
You can also use a regex interceptor to extract hostname from the message (assuming it's there) and put that in an event header. From there you can route and create partitions with the header. On Wednesday, October 15, 2014, Hari Shreedharan wrote: > The Multiport syslog source can add the port
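The regex interceptor approach could be sketched as below. Everything specific here is an assumption: the agent, source, and sink names, the header name host, the output path, and above all the regular expression, which presumes the event body still starts with a syslog-style "Mon DD HH:MM:SS hostname" prefix.

```properties
# Hypothetical agent "a1"; source "r1" receives syslog-style lines.
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_extractor
# Assumed body layout: "Oct 15 10:22:01 web01 ...". Group 1 captures the hostname.
a1.sources.r1.interceptors.i1.regex = ^\\w{3}\\s+\\d+\\s+[\\d:]+\\s+(\\S+)
a1.sources.r1.interceptors.i1.serializers = s1
# Store capture group 1 in an event header named "host".
a1.sources.r1.interceptors.i1.serializers.s1.name = host
# Partition the HDFS output by the extracted header.
a1.sinks.k1.hdfs.path = /user/flume/Syslog/%{host}/
```

Events whose body does not match the pattern get no host header, so devices that log in a different format would need their own pattern or a fallback path.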

Re: Flume Syslog source

2014-10-16 Thread Jeff Lord
>> that there would be a some random device which will not send their logs in >> the proper format and my regex will break. This is the way I'll implement >> it if I can't find anything better. >> >> Thanks, >> Sharninder >> >> >> >>

Re: Slow write throughput to HDFS

2014-10-20 Thread Jeff Lord
Pal, You can add more sinks to your config. Don't put them in a sink group just have multiple sinks pulling from the same channel. This should increase your throughput. Best, Jeff On Mon, Oct 20, 2014 at 3:49 AM, Pal Konyves wrote: > Hi there, > > We would like to write lots of logs to HDFS v
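Attaching several sinks directly to one channel, without a sink group, might look like the following. Agent, channel, and sink names and the path are assumptions, and the distinct file prefixes are an added precaution so the two sinks do not write to the same file names.

```properties
# Hypothetical agent "a1": two HDFS sinks draining the same channel "c1".
a1.channels = c1
a1.sinks = k1 k2

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.hdfs.filePrefix = k1

a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c1
a1.sinks.k2.hdfs.path = /flume/events
a1.sinks.k2.hdfs.filePrefix = k2

# Deliberately no a1.sinkgroups entry: each sink runs in its own thread
# and pulls from the channel independently.
```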

Re: Slow write throughput to HDFS

2014-10-20 Thread Jeff Lord
y functional > benefits? > > Thanks, > Pal > > On Mon, Oct 20, 2014 at 3:22 PM, Jeff Lord wrote: > > Pal, > > > > You can add more sinks to your config. > > Don't put them in a sink group just have multiple sinks pulling from the > > same channel.

Re: flume avro event overflow ?

2014-10-20 Thread Jeff Lord
I know this is not exactly what you are asking for but have you had a look at the spillable memory channel. https://flume.apache.org/FlumeUserGuide.html#spillable-memory-channel On Sun, Oct 19, 2014 at 1:38 AM, terreyshih wrote: > In other words, I would like to explicitly drop the events if the
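A spillable memory channel keeps events in memory up to a limit and overflows to disk beyond it. A minimal sketch, with assumed names, sizes, and directories:

```properties
# Hypothetical agent "a1" with a spillable memory channel "c1".
a1.channels = c1
a1.channels.c1.type = SPILLABLEMEMORY
# Events held in the in-memory queue before spilling to disk.
a1.channels.c1.memoryCapacity = 10000
# Events that may be held in the on-disk overflow.
a1.channels.c1.overflowCapacity = 1000000
a1.channels.c1.checkpointDir = /mnt/flume/checkpoint
a1.channels.c1.dataDirs = /mnt/flume/data
```

This buffers a burst rather than dropping it, which is why the reply notes it is not exactly what was asked for.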

Re: flume syslog source max msg size

2014-10-27 Thread Jeff Lord
What about your flume config? Did you try increasing the eventSize? On Mon, Oct 27, 2014 at 11:30 AM, Mohit Durgapal wrote: > Hi, > > I am using rsyslog to send messages to flume nodes via AWS ELB. On flume > nodes I am using the source type *syslogtcp * where the ELB forwards the > messages.

Re: Flume HDFS Sink: Dynamic Path format for IP Address

2014-11-04 Thread Jeff Lord
Hi Traino, The syslog multiport source should automatically build the event using the hostname from the syslog message. From there you can just use the macro on your hdfs sink to use the value of the hostname event header. e.g. agent.sinks.sink-1.hdfs.path = /user/flume/Syslog/%{host}/ Hope thi

Re: [ANNOUNCE] New Flume PMC Member - Roshan Naik

2014-11-04 Thread Jeff Lord
Congrats Roshan On Tue, Nov 4, 2014 at 2:31 PM, Hari Shreedharan wrote: > Congrats Roshan! > > > Thanks, > Hari > > On Tue, Nov 4, 2014 at 2:12 PM, Arvind Prabhakar > wrote: > > > On behalf of Apache Flume PMC, it is my pleasure to announce that Roshan > > Naik has been elected to the Flume Pro

Re: Does HTTP Source suitable to get tweets from Gnip?

2014-11-06 Thread Jeff Lord
I am not familiar with gnip. Did you take a look at the twitter source? On Thu, Nov 6, 2014 at 4:09 AM, Rafeeq S wrote: > I am new to flume and I am trying to stream tweets which is from gnip > using Flume. > > Please suggest , which Flume source need to be used to stream tweets from > Gnip. > D

Re: File channels creating many large files

2014-11-07 Thread Jeff Lord
Guy, What version of flume is this? -Jeff On Fri, Nov 7, 2014 at 1:19 AM, Needham, Guy wrote: > Hi all, > > I have a configuration with a file channel configured such that: > > a1.channels.ch1.type = file > a1.channels.ch1.checkpointDir = /hadoop/user/flume/channels/checkpoint > a1.channels.c

Re: Flume IBM MQ - JMS Source

2014-12-19 Thread Jeff Lord
Do you have the jms class in your cp? java.lang.NoClassDefFoundError: javax/jms/JMSException On Fri, Dec 19, 2014 at 1:02 PM, Darshan Pandya wrote: > > Hi Folks, > I am new to flume. > I wanted to check if anyone has connected an IBM MQ to the JMS Source in > Flume. > I quickly configured flume w

Re: Monitoring the progress of events

2015-01-26 Thread Jeff Lord
You should be able to use the channelsize On Mon, Jan 26, 2015 at 2:29 PM, Carlotta Hicks wrote: > Are these the counters from MonitoredCounterGroup? What is the scope of > these counters? Can you reset these counters? > > -Original Message- > From: Joey Echeverria [mailto:j...@clouder

Re: How to handle ChannelFullException

2015-01-29 Thread Jeff Lord
Have you considered increasing the size of the memory channel? I haven't played with Kafka sink much but in regards to hdfs we often add sinks which can help to increase the flow of the channel. The multi port Syslog source is the way to go here as it will give better performance. We should probabl

Re: Simple- Just copying plain files into the cluster (hdfs) using flume - possible?

2015-02-02 Thread Jeff Lord
Bob, You may want to have a look at Apache Nifi. http://ingest.tips/2014/12/22/getting-started-with-apache-nifi/ Regards, Jeff On Mon, Feb 2, 2015 at 3:49 PM, Bob Metelsky wrote: > Steve - I appreciate your time on this... > > Yes, I want to use flume to copy .xml or .whatever files from a s
