Our current stack has Flume with a Socket source and an HDFS sink. We are moving to AWS, and keeping Flume would be a great time saver. Kinesis looks good, but if I can use Flume I would stick with it. Due to S3 PUT pricing, we have to aggregate, and Flume does that with the file channel.
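[Editor's note: a minimal Flume NG agent sketch of the setup described in this thread (a socket-style source, a durable file channel for buffering, and the HDFS sink writing hourly gzip-compressed rolls to an S3 path). The agent name, directories, bucket, and credentials are placeholders; `rollSize`/`rollCount` are zeroed so only the hourly interval triggers a roll.]

```properties
# Hypothetical agent layout; names and paths are placeholders.
agent.sources  = netcat-src
agent.channels = file-ch
agent.sinks    = s3-sink

# Socket-style source (netcat used here as a stand-in)
agent.sources.netcat-src.type     = netcat
agent.sources.netcat-src.bind     = 0.0.0.0
agent.sources.netcat-src.port     = 44444
agent.sources.netcat-src.channels = file-ch

# Durable file channel to buffer events between hourly flushes
agent.channels.file-ch.type          = file
agent.channels.file-ch.checkpointDir = /var/flume/checkpoint
agent.channels.file-ch.dataDirs      = /var/flume/data

# HDFS sink writing to S3 via the s3n:// scheme, rolling once per hour
agent.sinks.s3-sink.type                   = hdfs
agent.sinks.s3-sink.channel                = file-ch
agent.sinks.s3-sink.hdfs.path              = s3n://ACCESS_KEY:SECRET_KEY@my-bucket/logs/%Y-%m-%d
agent.sinks.s3-sink.hdfs.fileType          = CompressedStream
agent.sinks.s3-sink.hdfs.codeC             = gzip
agent.sinks.s3-sink.hdfs.rollInterval      = 3600
agent.sinks.s3-sink.hdfs.rollSize          = 0
agent.sinks.s3-sink.hdfs.rollCount         = 0
agent.sinks.s3-sink.hdfs.useLocalTimeStamp = true
```

Setting `rollInterval = 3600` with size- and count-based rolling disabled means each node issues roughly one S3 PUT per hour per file, which is the aggregation the PUT-pricing concern calls for.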
Mate Gulyas

On Tue, Jul 1, 2014 at 10:05 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
> If you are heavily dependent on the AWS stack then instead of Kafka you can
> look at AWS Kinesis, and from there on there is good integration available
> to AWS S3 or any other service you want to dump data to.
>
>
> On Tue, Jul 1, 2014 at 1:33 PM, Asim Zafir <asim.za...@gmail.com> wrote:
>>
>> Kafka's framework is designed more for scalable read I/Os than for a
>> massive write event push coming to a centralized storage such as HDFS.
>>
>> Not sure how Flume's Avro sink to S3 would turn out for the entire Flume
>> pipeline. I suspect it would be fatal to carry on a memory channel, and
>> even if you have a file channel on the Flume agents/collectors, it is
>> very likely it will cause buffering on the channel.
>>
>>
>> On Mon, Jun 30, 2014 at 11:47 PM, Máté Gulyás <guly...@dmlab.hu> wrote:
>>>
>>> Please see my comments inline.
>>>
>>> YIMEN YIMGA Gael wrote:
>>> > Could you please communicate the link of the article you read please?
>>> https://gist.github.com/crowdmatt/5256881 and the last comment.
>>>
>>> Sharninder wrote:
>>> > No reason to not use Flume except for the fact that S3, since it's
>>> > over the wire, will be a lot slower than a local HDFS cluster, in
>>> > which case you need a big enough channel to hold events not yet
>>> > processed out of the sink. If you have a fast enough pipe, you can
>>> > very well use Flume for this sort of use-case.
>>> I plan to aggregate 5-15 GB of data with the file channel, as I want to
>>> flush to S3 every hour on every node. As far as I know Flume can gzip
>>> it, so the size would be about 500 MB-1.5 GB.
>>>
>>> Thanks for the feedback, I will write if I have any results.
>>>
>>> Mate Gulyas
>>>
>>> On Tue, Jul 1, 2014 at 6:26 AM, Sharninder <sharnin...@gmail.com> wrote:
>>> > No reason to not use Flume except for the fact that S3, since it's
>>> > over the wire, will be a lot slower than a local HDFS cluster, in
>>> > which case you need a big enough channel to hold events not yet
>>> > processed out of the sink. If you have a fast enough pipe, you can
>>> > very well use Flume for this sort of use-case.
>>> >
>>> > The reason the author might have moved to Kafka, and I'm just
>>> > speculating here, is that Kafka provides him better buffering support
>>> > for exactly the case I've written above.
>>> >
>>> > HTH
>>> > Sharninder
>>> >
>>> >
>>> > On Mon, Jun 30, 2014 at 7:57 PM, Máté Gulyás <guly...@dmlab.hu> wrote:
>>> >>
>>> >> Hi!
>>> >>
>>> >> I would like to use Flume to aggregate and send logs to an S3 bucket.
>>> >> I did some research, but the last article I found on the topic was
>>> >> more than a year old and the author abandoned Flume for Kafka. My
>>> >> other concern is that most of the articles were written for Flume OG,
>>> >> not NG.
>>> >> Is there any reason why I should not use Flume to sink messages to
>>> >> S3?
>>> >>
>>> >> Thanks in advance.
>>> >>
>>> >> Mate Gulyas
>>> >> Lead Developer at Dmlab
>
> --
> Nitin Pawar