Noticed this S3-based consumer project on GitHub: https://github.com/razvan/kafka-s3-consumer
On Dec 27, 2012, at 7:08 AM, David Arthur <mum...@gmail.com> wrote:

> I don't think anything exists like this in Kafka (or contrib), but it would
> be a useful addition! Personally, I have written this exact thing at
> previous jobs.
>
> As for the Hadoop consumer, since there is a FileSystem implementation for
> S3 in Hadoop, it should be possible. The Hadoop consumer works by writing
> out data files containing the Kafka messages alongside offset files which
> contain the last offset read for each partition. If it is re-consuming from
> zero each time you run it, it means it is not finding the offset files from
> the previous run.
>
> Having used it a bit, the Hadoop consumer is certainly an area that could
> use improvement.
>
> HTH,
> David
>
> On 12/27/12 4:41 AM, Pratyush Chandra wrote:
>> Hi,
>>
>> I am looking for an S3-based consumer which can write all received events
>> to an S3 bucket (say, every minute), similar to the Flume HDFS sink:
>> http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
>> I have tried evaluating the hadoop-consumer in the contrib folder, but it
>> seems geared toward offline processing: it fetches everything from
>> offset 0 at once and replaces it in the S3 bucket.
>> Any help would be appreciated.
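
For anyone landing on this thread later, here is a rough sketch of the pattern David describes: buffer consumed messages, flush a data object to S3 roughly once a minute, and write an offset marker next to it (mirroring the hadoop-consumer's data-file/offset-file convention). This is only an illustration, not anything from Kafka contrib: it uses the modern KafkaConsumer API and AWS SDK v1 for S3, both of which postdate this thread, and the topic, bucket, and key names are placeholders. For simplicity it tracks a single offset; a real sink would keep one offset per partition, as David notes the hadoop-consumer does.

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class S3Sink {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "s3-sink");
        // Offsets are tracked via the S3 marker objects below, not Kafka's
        // own commit mechanism, so disable auto-commit.
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("events")); // placeholder topic
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        StringBuilder buffer = new StringBuilder();
        long lastOffset = -1L;
        long lastFlush = System.currentTimeMillis();

        while (true) {
            ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> r : records) {
                buffer.append(r.value()).append('\n');
                lastOffset = r.offset(); // assumes a single partition
            }
            // Roll a new S3 object about once a minute, like the Flume
            // HDFS sink's roll interval mentioned in the original question.
            if (System.currentTimeMillis() - lastFlush >= 60_000
                    && buffer.length() > 0) {
                String key = "events/" + lastFlush + ".log";
                s3.putObject("my-kafka-bucket", key, buffer.toString());
                // Offset marker alongside the data file, so a restart can
                // resume from here instead of re-consuming from zero.
                s3.putObject("my-kafka-bucket", key + ".offset",
                        Long.toString(lastOffset));
                buffer.setLength(0);
                lastFlush = System.currentTimeMillis();
            }
        }
    }
}

On startup, such a consumer would read the newest ".offset" marker and seek to that position before polling; re-consuming from offset 0, as observed with the contrib hadoop-consumer, is exactly what happens when those markers are missing or not found.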