Hi Asim and Jeff,

Thanks for your helpful suggestions. I found two excellent resources, one on performance testing and the other on deployment design and optimization in production systems.

Flume NG Performance Measurements
https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Performance+Measurements

Log collection system architecture and design at Meituan.com (in Chinese; I strongly recommend reading them with Chrome's translator)
http://tech.meituan.com/mt-log-system-arch.html
http://tech.meituan.com/mt-log-system-optimization.html

I guess building a stable and efficient collection system is challenging and also fun.

Cheers,
Blade

2014-09-26 3:15 GMT+08:00 Jeff Lord <jl...@cloudera.com>:

> Whether or not flume can handle 20k eps will depend on several factors.
> The main ones being:
> 1. What is the avg size of event
> 2. What source will you be using
>
> With that said I have seen a single flume agent handle well over 20k eps
> using the multiport syslog source.
>
> Here is a link to a presentation given by Arvind Prabhakar on planning a
> flume deployment.
>
> http://goo.gl/FsfmmC
>
> -Jeff
>
> On Wed, Sep 24, 2014 at 10:53 PM, Blade Liu <hafzc...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm going to deploy Flume in production systems, but a little worried
>> about its performance in real-world environment. Could anyone tell me about
>> Flume's actual performance in production environment? say, if Flume can
>> deal with 20,000 events per second from a single source(and what about
>> 100-200 sources with one final HDFS sink).
>>
>> In addition, to reach good performance of tens of thousands of events per
>> second, how many servers(agents) should be used? More agents(and more
>> tiers), better performance?
>>
>> Thanks very much for your suggestions.
>>
>>
>> Cheers,
>> Blade