Noticed this S3-based consumer project on GitHub: https://github.com/razvan/kafka-s3-consumer
On Dec 27, 2012, at 7:08 AM, David Arthur <mum...@gmail.com> wrote:

> I don't think anything exists like this in Kafka (or contrib), but it would
> be a useful addition! Personally, I have written this exact thing at
> previous jobs.
>
> As for the Hadoop consumer, since there is a FileSystem implementation for
> S3 in Hadoop, it should be possible. The Hadoop consumer works by writing
> out data files containing the Kafka messages alongside offset files which
> contain the last offset read for each partition. If it is re-consuming from
> zero each time you run it, it means it is not finding the offset files from
> the previous run.
>
> Having used it a bit, the Hadoop consumer is certainly an area that could
> use improvement.
>
> HTH,
> David
>
> On 12/27/12 4:41 AM, Pratyush Chandra wrote:
>> Hi,
>>
>> I am looking for an S3-based consumer which can write all received events
>> to an S3 bucket (say, every minute), similar to the Flume HDFS sink:
>> http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
>> I have tried evaluating the hadoop-consumer in the contrib folder, but it
>> seems geared toward offline processing: it fetches everything from
>> offset 0 at once and replaces it in the S3 bucket.
>> Any help would be appreciated.
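
For anyone landing on this thread later, here is a rough sketch of the pattern David describes: buffer consumed messages, flush a data object to S3 roughly once a minute, and write an offset marker next to it (mirroring the hadoop-consumer's data-file/offset-file convention). This is only an illustration, not anything from Kafka contrib: it uses the modern KafkaConsumer API and AWS SDK v1 for S3, both of which postdate this thread, and the topic, bucket, and key names are placeholders. For simplicity it tracks a single offset; a real sink would keep one offset per partition, as David notes the hadoop-consumer does.

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class S3Sink {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "s3-sink");
        // Offsets are tracked via the S3 marker objects below, not Kafka's
        // own commit mechanism, so disable auto-commit.
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("events")); // placeholder topic
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        StringBuilder buffer = new StringBuilder();
        long lastOffset = -1L;
        long lastFlush = System.currentTimeMillis();

        while (true) {
            ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> r : records) {
                buffer.append(r.value()).append('\n');
                lastOffset = r.offset(); // assumes a single partition
            }
            // Roll a new S3 object about once a minute, like the Flume
            // HDFS sink's roll interval mentioned in the original question.
            if (System.currentTimeMillis() - lastFlush >= 60_000
                    && buffer.length() > 0) {
                String key = "events/" + lastFlush + ".log";
                s3.putObject("my-kafka-bucket", key, buffer.toString());
                // Offset marker alongside the data file, so a restart can
                // resume from here instead of re-consuming from zero.
                s3.putObject("my-kafka-bucket", key + ".offset",
                        Long.toString(lastOffset));
                buffer.setLength(0);
                lastFlush = System.currentTimeMillis();
            }
        }
    }
}

On startup, such a consumer would read the newest ".offset" marker and seek to that position before polling; re-consuming from offset 0, as observed with the contrib hadoop-consumer, is exactly what happens when those markers are missing or not found.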