Kafka's framework is designed more for scalable read I/O than for a massive
write-event push into centralized storage such as HDFS.

I'm not sure how Flume's Avro sink to S3 would turn out for the entire Flume
pipeline. I suspect it would be fatal to rely on a memory channel, and even
if you have a file channel on the Flume agents/collectors, it is very likely
that events will back up in the channel.
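
To put a rough number on that buffering concern, here is a back-of-the-envelope
sketch. The rates are made-up assumptions for illustration (15 GiB arriving per
hour, a sink draining to S3 at 3 MiB/s), not measurements, and
channel_backlog_bytes is a hypothetical helper, not part of Flume.

```python
def channel_backlog_bytes(ingest_bps, drain_bps, seconds):
    """Bytes still waiting in the channel after `seconds`, assuming
    constant ingest and drain rates (both in bytes per second)."""
    # If the sink keeps up with the source, nothing accumulates.
    return max(0.0, (ingest_bps - drain_bps) * seconds)


# Assumed figures, for illustration only.
ingest = 15 * 1024**3 / 3600   # 15 GiB per hour, as bytes per second
drain = 3 * 1024**2            # 3 MiB per second out to S3

backlog = channel_backlog_bytes(ingest, drain, 3600)
print(f"backlog after one hour: {backlog / 1024**3:.2f} GiB")
```

Whatever the real numbers are, the channel (memory or file) has to be able to
hold at least that backlog, otherwise the sources start getting pushed back.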




On Mon, Jun 30, 2014 at 11:47 PM, Máté Gulyás <guly...@dmlab.hu> wrote:

> Please see my comments inline.
>
> YIMEN YIMGA Gael wrote:
> > Could you please share the link to the article you read?
> https://gist.github.com/crowdmatt/5256881 and the last comment.
>
> Sharninder wrote
> > No reason to not use flume except for the fact that S3, since it's over
> > the wire, will be a lot slower than a local hdfs cluster in which case you
> > need a big enough channel to hold events not yet processed out of the sink.
> > If you have a fast enough pipe, you can very well use flume for this sort
> > of use-case.
> I plan to aggregate 5-15 GB of data with a file channel, as I want to flush
> to S3 every hour on every node. As far as I know Flume can gzip it, so
> the size would be about 500 MB-1.5 GB.
>
> Thanks for the feedback, I will write if I have any results.
>
> Mate Gulyas
>
> On Tue, Jul 1, 2014 at 6:26 AM, Sharninder <sharnin...@gmail.com> wrote:
> > No reason to not use flume except for the fact that S3, since it's over
> > the wire, will be a lot slower than a local hdfs cluster in which case you
> > need a big enough channel to hold events not yet processed out of the
> > sink. If you have a fast enough pipe, you can very well use flume for
> > this sort of use-case.
> >
> > The reason the author might have moved to kafka, and I'm just speculating
> > here, is that kafka provides him better buffering support for exactly the
> > case I've written above.
> >
> > HTH
> > Sharninder
> >
> >
> >
> > On Mon, Jun 30, 2014 at 7:57 PM, Máté Gulyás <guly...@dmlab.hu> wrote:
> >>
> >> Hi!
> >>
> >> I would like to use flume to aggregate and send logs to an S3 bucket.
> >> I did some research, but the last article I found on the topic was
> >> more than a year old and the author abandoned Flume for Kafka. My
> >> other concern is that most of the articles were written for Flume OG,
> >> not NG.
> >> Is there any reason why I should not use flume to sink messages to S3?
> >>
> >>
> >> Thanks in advance.
> >>
> >> Mate Gulyas
> >> Lead Developer at Dmlab
> >
> >
>
