We run Flume on the EC2 instances and sink the aggregates to S3. We have
to stick with S3 rather than HDFS on EBS due to cost constraints.
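For reference, here is a minimal sketch of the agent layout we have in mind, written as a small Python helper that renders the Flume properties file. A file channel on local disk feeds the HDFS sink, the sink path points at S3, and files are gzip-compressed and rolled only on time (hourly). Everything else is a hypothetical placeholder: the agent and component names (a1, r1, c1, k1), the netcat source standing in for our socket source, and the bucket, prefix, port, directories and timeout value. The s3n:// path assumes the Hadoop S3 native filesystem jars and AWS credentials are available to the agent. Please check the property names against the user guide for the Flume version you run.

```python
# Hypothetical sketch: render a Flume NG agent config that buffers events in a
# file channel and writes hourly, gzip-compressed files to S3 via the HDFS sink.
# Agent/component names, bucket, directories and timeout are placeholders.


def flume_s3_agent_config(bucket: str, prefix: str, port: int = 44444) -> str:
    props = {
        "a1.sources": "r1",
        "a1.channels": "c1",
        "a1.sinks": "k1",
        # netcat stands in for our socket source (assumption)
        "a1.sources.r1.type": "netcat",
        "a1.sources.r1.bind": "0.0.0.0",
        "a1.sources.r1.port": str(port),
        "a1.sources.r1.channels": "c1",
        # Durable file channel on local disk buffers events until the sink drains them
        "a1.channels.c1.type": "file",
        "a1.channels.c1.checkpointDir": "/mnt/flume/checkpoint",
        "a1.channels.c1.dataDirs": "/mnt/flume/data",
        # HDFS sink writing to S3 through the s3n:// filesystem
        "a1.sinks.k1.type": "hdfs",
        "a1.sinks.k1.channel": "c1",
        "a1.sinks.k1.hdfs.path": f"s3n://{bucket}/{prefix}",
        "a1.sinks.k1.hdfs.fileType": "CompressedStream",
        "a1.sinks.k1.hdfs.codeC": "gzip",
        # Roll only on time, so each agent produces one file per hour
        "a1.sinks.k1.hdfs.rollInterval": "3600",
        "a1.sinks.k1.hdfs.rollSize": "0",
        "a1.sinks.k1.hdfs.rollCount": "0",
        # Assumption: the default call timeout is likely too short when
        # closing a file means uploading it to S3; 10 minutes is a guess
        "a1.sinks.k1.hdfs.callTimeout": "600000",
    }
    return "\n".join(f"{key} = {value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    print(flume_s3_agent_config("example-bucket", "flume/events"), end="")
```

Setting hdfs.rollSize and hdfs.rollCount to 0 disables size- and count-based rolling, so hdfs.rollInterval alone decides when a file is closed.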

Mate

On Tue, Jul 1, 2014 at 10:44 AM, Asim Zafir <asim.za...@gmail.com> wrote:
> You will have to see how much of a performance compromise the Flume sink
> to S3 is going to be. I would highly recommend using Flume 1.5.0+ because
> of the many bug fixes and optimizations it comes with. Moreover, if you can
> afford it, instead of going the S3 route, fire up some EBS volumes on EC2,
> set up an HDFS cluster and sink the files there. That would be much better
> than going the S3 route.
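To put a number on that performance compromise, here is a back-of-the-envelope sketch using the figures from this thread: 5-15 GB per node per hour, gzipped to roughly 0.5-1.5 GB and flushed hourly. It rests on several assumptions: one PUT per rolled file, a 30-day month, and a sink that makes no progress while an hourly file uploads, so events pile up in the channel for the whole upload. The node count and upload bandwidth in the example are made-up inputs, not measurements.

```python
# Back-of-the-envelope sizing for an hourly Flume -> S3 flush.
# Every input is an assumption; replace with your own measurements.

HOURS_PER_MONTH = 24 * 30  # approximate 30-day month


def puts_per_month(nodes: int, files_per_hour_per_node: int = 1) -> int:
    """S3 PUTs per month, assuming each rolled file is uploaded with one PUT."""
    return nodes * files_per_hour_per_node * HOURS_PER_MONTH


def channel_backlog_mb(raw_gb_per_hour: float, compressed_gb: float,
                       upload_mb_per_s: float) -> float:
    """Raw MB that accumulates in the channel while one hourly file uploads.

    Assumes a constant ingest rate and that the sink drains nothing
    until the upload of the closed file has finished.
    """
    ingest_mb_per_s = raw_gb_per_hour * 1024 / 3600
    upload_seconds = compressed_gb * 1024 / upload_mb_per_s
    return ingest_mb_per_s * upload_seconds


if __name__ == "__main__":
    # Hypothetical inputs: 10 nodes, worst case of 15 GB/h raw, 1.5 GB gzipped,
    # 20 MB/s sustained upload to S3.
    print("PUTs per month:", puts_per_month(10))
    print("Backlog per upload (MB): %.0f" % channel_backlog_mb(15, 1.5, 20))
```

With these inputs, 10 nodes produce 7,200 PUTs a month. At the assumed 20 MB/s, a 1.5 GB file takes about 77 seconds to upload, and roughly 330 MB of raw events queue up in the channel during that time. This is the channel buffering discussed in this thread, and the file channel needs headroom for it.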
>
>
> On Tue, Jul 1, 2014 at 1:42 AM, Máté Gulyás <guly...@dmlab.hu> wrote:
>>
>> Our current stack has Flume with a socket source and an HDFS sink. We are
>> moving to AWS, and keeping Flume would be a great time saver. Kinesis looks
>> good, but if I can use Flume I would stick with it. Because of the S3 PUT
>> price, we have to aggregate, and Flume does that with the file channel.
>>
>> Mate Gulyas
>>
>> On Tue, Jul 1, 2014 at 10:05 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
>> > If you are heavily dependent on the AWS stack, then instead of Kafka you
>> > can look at AWS Kinesis; from there on, there is good integration
>> > available with AWS S3 or any other service you want to dump data into.
>> >
>> >
>> >
>> >
>> > On Tue, Jul 1, 2014 at 1:33 PM, Asim Zafir <asim.za...@gmail.com> wrote:
>> >>
>> >> Kafka's framework is designed more for scalable read I/O than for a
>> >> massive push of write events into centralized storage such as HDFS.
>> >>
>> >> Not sure how Flume's Avro sink to S3 would turn out for the entire Flume
>> >> pipeline. I suspect it will be fatal to carry on with a memory channel,
>> >> and even if you have a file channel on the Flume agents/collectors, it is
>> >> very likely to cause buffering on the channel.
>> >>
>> >>
>> >>
>> >>
>> >> On Mon, Jun 30, 2014 at 11:47 PM, Máté Gulyás <guly...@dmlab.hu> wrote:
>> >>>
>> >>> Please see my comments inline.
>> >>>
>> >>> YIMEN YIMGA Gael wrote:
>> >>> > Could you please share the link to the article you read?
>> >>> https://gist.github.com/crowdmatt/5256881 (see the last comment)
>> >>>
>> >>> Sharninder wrote:
>> >>> > No reason not to use Flume, except that S3, since it's over the wire,
>> >>> > will be a lot slower than a local HDFS cluster, in which case you need
>> >>> > a big enough channel to hold events the sink has not yet processed.
>> >>> > If you have a fast enough pipe, you can very well use Flume for this
>> >>> > sort of use case.
>> >>> I plan to aggregate 5-15 GB of data with the file channel, as I want to
>> >>> flush to S3 every hour on every node. As far as I know, Flume can gzip
>> >>> it, so the size would be about 500 MB-1.5 GB.
>> >>>
>> >>> Thanks for the feedback; I will write back if I have any results.
>> >>>
>> >>> Mate Gulyas
>> >>>
>> >>> On Tue, Jul 1, 2014 at 6:26 AM, Sharninder <sharnin...@gmail.com> wrote:
>> >>> > No reason not to use Flume, except that S3, since it's over the wire,
>> >>> > will be a lot slower than a local HDFS cluster, in which case you need
>> >>> > a big enough channel to hold events the sink has not yet processed.
>> >>> > If you have a fast enough pipe, you can very well use Flume for this
>> >>> > sort of use case.
>> >>> >
>> >>> > The reason the author might have moved to Kafka, and I'm just
>> >>> > speculating here, is that Kafka gives him better buffering support for
>> >>> > exactly the case I've described above.
>> >>> >
>> >>> > HTH
>> >>> > Sharninder
>> >>> >
>> >>> >
>> >>> >
>> >>> > On Mon, Jun 30, 2014 at 7:57 PM, Máté Gulyás <guly...@dmlab.hu> wrote:
>> >>> >>
>> >>> >> Hi!
>> >>> >>
>> >>> >> I would like to use Flume to aggregate and send logs to an S3 bucket.
>> >>> >> I did some research, but the last article I found on the topic was
>> >>> >> more than a year old, and the author abandoned Flume for Kafka. My
>> >>> >> other concern is that most of the articles were written for Flume OG,
>> >>> >> not NG.
>> >>> >> Is there any reason why I should not use Flume to sink messages to S3?
>> >>> >>
>> >>> >>
>> >>> >> Thanks in advance.
>> >>> >>
>> >>> >> Mate Gulyas
>> >>> >> Lead Developer at Dmlab
>> >>> >
>> >>> >
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Nitin Pawar
>
>
