Re: Collect TCP data over TCP stream

2014-08-01 Thread Blade Liu
Hi Sharninder and Ashnish, Thanks for your nice suggestions. I agree one good solution would be writing some tools to glue libpcap, Avro and Flume. 2014-08-01 14:27 GMT+08:00 Sharninder : > Liu, you first need to figure out what TCP data you want to collect. Is > there a possibility that this d

multi-tier avro agents configuration problem

2014-08-28 Thread Blade Liu
Hi folks, I ran into a configuration problem while setting up multi-tier Avro agents. The flow is as follows, and data is generated on tier1 (slave3) by using "flume-ng avro-client --conf conf -H localhost -p 41000 -F /etc/hosts" tier1: slave3, avro source->avro sink tier2: slave2, avro source->logge
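
For readers following this thread, here is a minimal sketch of what a two-tier Avro layout like the one described could look like. It is not the poster's actual configuration. The agent names (tier1, tier2), the component names (src1, ch1, sink1), the memory channels, and port 41414 between the tiers are all assumptions made for illustration. The tier2 sink is cut off in the message above, so a logger sink is assumed here.

```properties
# Illustrative sketch only. Agent names, component names, memory channels
# and the inter-tier port 41414 are assumptions, not taken from the thread.

# tier1 (runs on slave3): avro source -> memory channel -> avro sink
tier1.sources = src1
tier1.channels = ch1
tier1.sinks = sink1

# Listens on the port that "flume-ng avro-client ... -p 41000" sends to
tier1.sources.src1.type = avro
tier1.sources.src1.bind = 0.0.0.0
tier1.sources.src1.port = 41000
tier1.sources.src1.channels = ch1

tier1.channels.ch1.type = memory

# Forwards events to the avro source of tier2 on slave2
tier1.sinks.sink1.type = avro
tier1.sinks.sink1.hostname = slave2
tier1.sinks.sink1.port = 41414
tier1.sinks.sink1.channel = ch1

# tier2 (runs on slave2): avro source -> memory channel -> logger sink (assumed)
tier2.sources = src1
tier2.channels = ch1
tier2.sinks = sink1

# Must listen on the same port that tier1's avro sink targets
tier2.sources.src1.type = avro
tier2.sources.src1.bind = 0.0.0.0
tier2.sources.src1.port = 41414
tier2.sources.src1.channels = ch1

tier2.channels.ch1.type = memory

tier2.sinks.sink1.type = logger
tier2.sinks.sink1.channel = ch1
```

Each host would start its own agent with the --name option of "flume-ng agent" (for example --name tier1 on slave3). An agent only reads the properties prefixed with its own name, so the two blocks can live in one shared file or in separate files. Note that a source takes "channels" (plural) while a sink takes "channel" (singular).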

Config file synchronization

2014-08-31 Thread Blade Liu
Hi, I have a simple question about config files. In a distributed log collection environment, are all hosts required to use the same config file? If yes, then whenever one config file is changed, the config files on all other hosts would have to be updated as well. Or are config files independent, and

Re: Config file synchronization

2014-09-01 Thread Blade Liu
Hi Anandkumar, Thanks a lot for the clarification! 2014-09-01 13:53 GMT+08:00 Anandkumar Lakshmanan : > Hi, > > Config files are independent. > Only the agent name in the file matters. > > Anand. > > > On 09/01/2014 08:06 AM, Blade Liu wrote: > >> Hi,

Re: Configuring flume agent dynamically from java code

2014-09-16 Thread Blade Liu
I guess you can assign customized parameters (provided by the UI) using Java code and write them directly to flume.conf. It would be nice if someone developed a general framework... 2014-09-16 20:57 GMT+08:00 Ahmed Vila : > Hi Manohar, > > I must have misread your question. You actually want to make

Serialization with Avro without schema

2014-09-18 Thread Blade Liu
Hi, The scenario is that a machine dynamically generates data consisting of sections of binary data. We use the Flume SDK to collect the data, and the sink is HDFS (SequenceFile). I'm curious what is in the sequence file, since Flume is unaware of the schema, i.e., how do Flume and Avro do serialization without

Performance of Flume in production systems

2014-09-24 Thread Blade Liu
Hi, I'm going to deploy Flume in production systems, but I'm a little worried about its performance in a real-world environment. Could anyone tell me about Flume's actual performance in a production environment? Say, whether Flume can deal with 20,000 events per second from a single source (and what about 100-2

Re: Performance of Flume in production systems

2014-09-25 Thread Blade Liu
said I have seen a single flume agent handle well over 20k eps > using the multiport syslog source. > > Here is a link to a presentation given by Arvind Prabhakar on planning a > flume deployment. > > http://goo.gl/FsfmmC > > -Jeff > > On Wed, Sep 24, 2014 at 10:53 P

Re: Programmatically configuring new source/sink into Flume agent

2014-09-28 Thread Blade Liu
I guess you want to reconfigure and run Flume agents on the fly. Is the RPC intended to restart agents or to handle new config? Thanks, Blade 2014-09-27 7:58 GMT+08:00 terreyshih : > Hi, Manohar: > > I don’t understand your examples. copying/moving file from one directory > to another is not what