Kafka is designed more for scalable read I/O than for pushing a massive stream of write events into centralized storage such as HDFS.
I am not sure how Flume's Avro sink to S3 would turn out for the entire Flume pipeline. I suspect it would be fatal with a memory channel, and even with a file channel on the Flume agents/collectors, events are very likely to back up in the channel.

On Mon, Jun 30, 2014 at 11:47 PM, Máté Gulyás <guly...@dmlab.hu> wrote:
> Please see my comments inline.
>
> YIMEN YIMGA Gael wrote:
> > Could you please communicate the link of the article you read please ?
> https://gist.github.com/crowdmatt/5256881 and the last comment.
>
> Sharninder wrote
> > No reason to not use flume except for the fact that S3, since it's
> > over the wire, will be a lot slower than a local hdfs cluster in
> > which case you need a big enough channel to hold events not yet
> > processed out of the sink. If you have a fast enough pipe, you can
> > very well use flume for this sort of use-case.
> I plan to aggregate 5-15GB data with Filechannel, as I want to flush
> to S3 every hour on every node. As far as I know Flume can gzip it, so
> the size would be about 500MB-1.5GB.
>
> Thanks for the feedback, I will write if I have any results.
>
> Mate Gulyas
>
> On Tue, Jul 1, 2014 at 6:26 AM, Sharninder <sharnin...@gmail.com> wrote:
> > No reason to not use flume except for the fact that S3, since it's
> > over the wire, will be a lot slower than a local hdfs cluster in
> > which case you need a big enough channel to hold events not yet
> > processed out of the sink. If you have a fast enough pipe, you can
> > very well use flume for this sort of use-case.
> >
> > The reason the author might have moved to kafka, and I'm just
> > speculating here, is that kafka provides him better buffering
> > support for exactly the case I've written above.
> >
> > HTH
> > Sharninder
> >
> > On Mon, Jun 30, 2014 at 7:57 PM, Máté Gulyás <guly...@dmlab.hu> wrote:
> >>
> >> Hi!
> >>
> >> I would like to use flume to aggregate and send logs to an S3 bucket.
> >> I did some research, but the last article I found on the topic was
> >> more than a year old and the author abandoned Flume for Kafka. My
> >> other concern is that most of the articles were written for Flume OG,
> >> not NG.
> >> Is there any reason why I should not use flume to sink messages to S3?
> >>
> >> Thanks in advance.
> >>
> >> Mate Gulyas
> >> Lead Developer at Dmlab
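
To put rough numbers on the "big enough channel" point in the thread, here is a minimal back-of-the-envelope sketch. It only uses the 15GB-per-hour upper bound mentioned above; the 10-minute S3 stall and the 500-byte average event size are assumptions made purely for illustration, and the function names are hypothetical helpers, not part of Flume. As far as I know the file channel capacity is counted in events rather than bytes, which is why the second helper converts.

```python
def backlog_bytes(ingest_bytes_per_hour: int,
                  drain_bytes_per_hour: int,
                  seconds: int) -> int:
    """Bytes that accumulate in the channel while ingest outpaces the sink."""
    # Nothing accumulates when the sink drains at least as fast as events arrive.
    surplus_per_hour = max(0, ingest_bytes_per_hour - drain_bytes_per_hour)
    return surplus_per_hour * seconds // 3600


def events_needed(backlog: int, avg_event_bytes: int) -> int:
    """Number of events the channel must hold for a given backlog (rounded up)."""
    return -(-backlog // avg_event_bytes)  # ceiling division on integers


# Worst case from the thread: 15 GB/hour per node (taking 1 GB = 10**9 bytes).
INGEST = 15 * 10**9

# Assumption: the sink is completely stalled (drain = 0) for 10 minutes.
stalled = backlog_bytes(INGEST, 0, 600)

# Assumption: an average event is 500 bytes.
capacity = events_needed(stalled, 500)

print(f"backlog during stall: {stalled} bytes")
print(f"channel capacity needed: {capacity} events")
```

Under these assumptions a 10-minute stall leaves about 2.5GB (roughly 5 million events) sitting in the channel, so the channel's disk space and capacity would need to be sized with that kind of margin in mind. Real numbers depend on the actual upload throughput to S3 and the event size distribution.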