We run Flume on the EC2 instances and sink the aggregates to S3, but we have to do that (S3) due to cost constraints.
Mate

On Tue, Jul 1, 2014 at 10:44 AM, Asim Zafir <asim.za...@gmail.com> wrote:
> You will have to see how much of a performance compromise the Flume sink to
> S3 is going to be. I would highly recommend using Flume 1.5.0+ because of
> the many bug fixes and optimizations it comes with. Moreover, if you can
> afford it, instead of going the S3 route, fire up some EBS volumes on EC2,
> set up an HDFS cluster and sink the files there. That would be much better
> than going the S3 route.
>
> On Tue, Jul 1, 2014 at 1:42 AM, Máté Gulyás <guly...@dmlab.hu> wrote:
>>
>> Our current stack has Flume with a Socket source and an HDFS sink. We are
>> moving to AWS and keeping Flume would be a great time saver. Kinesis looks
>> good, but if I can use Flume I would stick with it. Due to the S3 PUT
>> price, we have to aggregate, and Flume does that with the file channel.
>>
>> Mate Gulyas
>>
>> On Tue, Jul 1, 2014 at 10:05 AM, Nitin Pawar <nitinpawar...@gmail.com>
>> wrote:
>> > If you are heavily dependent on the AWS stack, then instead of Kafka you
>> > can look at AWS Kinesis; from there on, good integration is available
>> > to AWS S3 or any other service you want to dump data into.
>> >
>> > On Tue, Jul 1, 2014 at 1:33 PM, Asim Zafir <asim.za...@gmail.com> wrote:
>> >>
>> >> Kafka's framework is designed more for scalable read I/O than for a
>> >> massive write event push into centralized storage such as HDFS.
>> >>
>> >> Not sure how Flume's Avro sink to S3 would turn out for the entire
>> >> Flume pipeline. I suspect it will be fatal to carry on with a memory
>> >> channel, and even if you have a file channel on the Flume
>> >> agents/collectors, it is very likely to cause buffering on the channel.
>> >>
>> >> On Mon, Jun 30, 2014 at 11:47 PM, Máté Gulyás <guly...@dmlab.hu> wrote:
>> >>>
>> >>> Please see my comments inline.
>> >>>
>> >>> YIMEN YIMGA Gael wrote:
>> >>> > Could you please share the link to the article you read?
>> >>> https://gist.github.com/crowdmatt/5256881 and the last comment.
>> >>>
>> >>> Sharninder wrote:
>> >>> > No reason to not use Flume except for the fact that S3, since it's
>> >>> > over the wire, will be a lot slower than a local HDFS cluster, in
>> >>> > which case you need a big enough channel to hold events not yet
>> >>> > processed out of the sink. If you have a fast enough pipe, you can
>> >>> > very well use Flume for this sort of use case.
>> >>> I plan to aggregate 5-15 GB of data with the file channel, as I want
>> >>> to flush to S3 every hour on every node. As far as I know Flume can
>> >>> gzip it, so the size would be about 500 MB-1.5 GB.
>> >>>
>> >>> Thanks for the feedback, I will write if I have any results.
>> >>>
>> >>> Mate Gulyas
>> >>>
>> >>> On Tue, Jul 1, 2014 at 6:26 AM, Sharninder <sharnin...@gmail.com>
>> >>> wrote:
>> >>> > No reason to not use Flume except for the fact that S3, since it's
>> >>> > over the wire, will be a lot slower than a local HDFS cluster, in
>> >>> > which case you need a big enough channel to hold events not yet
>> >>> > processed out of the sink. If you have a fast enough pipe, you can
>> >>> > very well use Flume for this sort of use case.
>> >>> >
>> >>> > The reason the author might have moved to Kafka, and I'm just
>> >>> > speculating here, is that Kafka provides him better buffering
>> >>> > support for exactly the case I've written above.
>> >>> >
>> >>> > HTH
>> >>> > Sharninder
>> >>> >
>> >>> > On Mon, Jun 30, 2014 at 7:57 PM, Máté Gulyás <guly...@dmlab.hu>
>> >>> > wrote:
>> >>> >>
>> >>> >> Hi!
>> >>> >>
>> >>> >> I would like to use Flume to aggregate and send logs to an S3
>> >>> >> bucket.
>> >>> >> I did some research, but the last article I found on the topic was
>> >>> >> more than a year old and the author abandoned Flume for Kafka. My
>> >>> >> other concern is that most of the articles were written for Flume
>> >>> >> OG, not NG.
>> >>> >> Is there any reason why I should not use Flume to sink messages to
>> >>> >> S3?
>> >>> >>
>> >>> >> Thanks in advance.
>> >>> >>
>> >>> >> Mate Gulyas
>> >>> >> Lead Developer at Dmlab
>> >>> >
>> >>
>> >
>> > --
>> > Nitin Pawar
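
As a rough check on the hourly figures quoted in the thread (5-15 GB raw per node, roughly 500 MB-1.5 GB after gzip), the arithmetic can be sketched in Python. This is only a back-of-the-envelope sketch: the helper names are made up for illustration, and the "one S3 PUT per hourly file" and 30-day month are assumptions, not something the thread states.

```python
# Back-of-the-envelope sizing for the hourly flush discussed in the thread.
# From the thread: 5-15 GB raw per node per hour, ~500 MB-1.5 GB gzipped.
# Assumptions (NOT from the thread): one S3 PUT per flushed file and a
# 30-day month. All function names are hypothetical helpers.

GB = 1000 ** 3  # decimal gigabytes


def gzip_ratio(raw_bytes, compressed_bytes):
    """Compression ratio implied by a raw/compressed size pair."""
    return raw_bytes / compressed_bytes


def compressed_size(raw_bytes, ratio):
    """Estimated size after compression at the given ratio."""
    return raw_bytes / ratio


def puts_per_month(flushes_per_hour, nodes, days=30):
    """Number of S3 PUTs if every flush produces exactly one object."""
    return flushes_per_hour * 24 * days * nodes


# Ratio implied by the thread's low-end numbers (5 GB -> 500 MB).
ratio = gzip_ratio(5 * GB, 0.5 * GB)

for raw_gb in (5, 15):
    size_gb = compressed_size(raw_gb * GB, ratio) / GB
    print(f"{raw_gb} GB raw per hour -> about {size_gb:.1f} GB gzipped")

print("PUTs per node per month at one flush per hour:", puts_per_month(1, 1))
```

The point of the sketch is that aggregating to one object per hour keeps the PUT count per node small and fixed, independent of event volume, which is the cost argument made above for using the file channel.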