Please see my comments inline.

YIMEN YIMGA Gael wrote:
> Could you please share the link to the article you read?

https://gist.github.com/crowdmatt/5256881, and the last comment on it.

Sharninder wrote:
> No reason not to use Flume, except that S3, since it's over the
> wire, will be a lot slower than a local HDFS cluster, in which case you need a
> big enough channel to hold events not yet processed out of the sink. If you
> have a fast enough pipe, you can very well use Flume for this sort of
> use case.

I plan to aggregate 5-15 GB of data with the file channel, as I want to flush to S3 every hour on every node. As far as I know Flume can gzip it, so the size would be about 500 MB-1.5 GB.

Thanks for the feedback. I will write if I have any results.

Mate Gulyas

On Tue, Jul 1, 2014 at 6:26 AM, Sharninder <sharnin...@gmail.com> wrote:
> No reason not to use Flume, except that S3, since it's over the
> wire, will be a lot slower than a local HDFS cluster, in which case you need
> a big enough channel to hold events not yet processed out of the sink. If
> you have a fast enough pipe, you can very well use Flume for this sort of
> use case.
>
> The reason the author might have moved to Kafka, and I'm just speculating
> here, is that Kafka provides him better buffering support for exactly the
> case I've written above.
>
> HTH
> Sharninder
>
> On Mon, Jun 30, 2014 at 7:57 PM, Máté Gulyás <guly...@dmlab.hu> wrote:
>>
>> Hi!
>>
>> I would like to use Flume to aggregate and send logs to an S3 bucket.
>> I did some research, but the last article I found on the topic was
>> more than a year old and the author abandoned Flume for Kafka. My
>> other concern is that most of the articles were written for Flume OG,
>> not NG.
>> Is there any reason why I should not use Flume to sink messages to S3?
>>
>> Thanks in advance.
>>
>> Mate Gulyas
>> Lead Developer at Dmlab
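
A rough sanity check on the sizes in the reply above, as a minimal Python sketch rather than a measurement. It assumes the 5-15 GB figure is per node per hour, decimal units (1 GB = 10**9 bytes), and a flat 10:1 gzip ratio, which is simply the ratio implied by the 500 MB-1.5 GB estimate. The function names are made up for illustration.

```python
# Back-of-envelope check: hourly raw data per node -> gzipped size -> upload rate.
# Assumptions (not stated in the thread): 5-15 GB is per node per hour,
# 1 GB = 10**9 bytes, and gzip achieves a flat 10:1 ratio on these logs.

SECONDS_PER_HOUR = 3600


def compressed_bytes(raw_bytes: float, gzip_ratio: float = 10.0) -> float:
    """Estimated size after gzip, given a raw-to-compressed ratio."""
    return raw_bytes / gzip_ratio


def required_mbit_per_s(payload_bytes: float,
                        window_s: float = SECONDS_PER_HOUR) -> float:
    """Sustained upload rate (Mbit/s) needed to ship payload_bytes in window_s."""
    return payload_bytes * 8 / window_s / 1e6


if __name__ == "__main__":
    for raw_gb in (5, 15):
        gz = compressed_bytes(raw_gb * 1e9)
        print(f"{raw_gb} GB raw -> {gz / 1e9:.1f} GB gzipped, "
              f"needs about {required_mbit_per_s(gz):.1f} Mbit/s sustained")
```

Under those assumptions the upload to S3 needs only a few Mbit/s sustained per node to keep up with an hourly flush. If the real compression ratio is worse or the link is slower, the channel has to absorb the backlog, which is the concern raised in the quoted reply.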