I don't think anything like this exists in Kafka (or contrib), but it
would be a useful addition! Personally, I have written this exact thing
at previous jobs.
As for the Hadoop consumer, since there is a FileSystem implementation
for S3 in Hadoop, it should be possible. The Hadoop consumer works by
writing out data files containing the Kafka messages alongside offset
files which contain the last offset read for each partition. If it is
re-consuming from zero each time you run it, it means it's not finding
the offset files from the previous run.
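To make the pattern concrete, here is a minimal sketch of that checkpoint
scheme (not the actual Hadoop consumer code): read the last committed offset
for a partition, fetch from there, write the batch out, and persist the new
offset so the next run doesn't start from zero. The fetchBatch() and
uploadBatch() helpers are placeholders, not real Kafka or S3 calls, and the
offset arithmetic is simplified to message counts.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

public class OffsetCheckpointSketch {

    // Read the last committed offset for this partition; 0 if no previous run.
    static long readLastOffset(Path offsetFile) throws IOException {
        if (!Files.exists(offsetFile)) {
            return 0L;
        }
        String s = new String(Files.readAllBytes(offsetFile), StandardCharsets.UTF_8).trim();
        return Long.parseLong(s);
    }

    // Persist the new offset so the next run resumes where this one stopped.
    static void writeLastOffset(Path offsetFile, long offset) throws IOException {
        Files.createDirectories(offsetFile.getParent());
        Files.write(offsetFile, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path offsetFile = Paths.get("offsets/topic-0.offset"); // one offset file per partition
        long offset = readLastOffset(offsetFile);

        List<String> batch = fetchBatch("topic", 0, offset);     // placeholder for a Kafka fetch
        uploadBatch("s3://my-bucket/topic/part-0", batch);       // placeholder for an S3 put

        // Commit progress (simplified: treats offsets as message counts).
        writeLastOffset(offsetFile, offset + batch.size());
    }

    // Placeholder: a real consumer would fetch from Kafka starting at fromOffset.
    static List<String> fetchBatch(String topic, int partition, long fromOffset) {
        return Collections.emptyList();
    }

    // Placeholder: a real consumer would PUT the batch file to the S3 path.
    static void uploadBatch(String s3Path, List<String> batch) {
        // no-op in this sketch
    }
}

If the offset files from the previous run are missing or unreadable, this
scheme falls back to offset 0, which is exactly the "re-consuming from zero"
behavior you're seeing.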
Having used it a bit, the Hadoop consumer is certainly an area that
could use improvement.
HTH,
David
On 12/27/12 4:41 AM, Pratyush Chandra wrote:
Hi,
I am looking for an S3-based consumer, which can write all the received
events to an S3 bucket (say, every minute). Something similar to Flume HDFSSink
http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
I have tried evaluating hadoop-consumer in the contrib folder. But it seems to
be more for offline processing, which will fetch everything from offset 0
at once and replace it in the S3 bucket.
Any help would be appreciated?