[ 
https://issues.apache.org/jira/browse/KAFKA-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171720#comment-14171720
 ] 

Gwen Shapira commented on KAFKA-1705:
-------------------------------------

Looking for community comments on this. Do others see this as a good thing to 
add?

> Add MR layer to Kafka
> ---------------------
>
>                 Key: KAFKA-1705
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1705
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Gwen Shapira
>            Assignee: Gwen Shapira
>
> Many NoSQL-type storage systems (HBase, Mongo,
> Cassandra) and file formats (Avro, Parquet) provide a MapReduce
> integration layer - usually an InputFormat, OutputFormat and a utility
> class. Sometimes there's also an abstract Job and Mapper that do more
> setup, which can make things even more convenient.
> This is different from the existing Hadoop contrib project or Camus in that 
> an MR layer provides components for use in MR jobs, not an entire 
> job that ingests data from Kafka to HDFS.
> The benefits I see for a MapReduce layer are:
> * Developers can create their own jobs, processing the data as it is
> ingested - rather than having to process it in two steps.
> * There are reusable components for developers looking to integrate with
> Kafka, rather than having everyone implement their own solution.
> * Hadoop developers expect projects to have this layer.
> * Spark reuses Hadoop's InputFormat and OutputFormat - so we get Spark
> integration for free.
> * There's a layer to plug the delegation token code into and make it
> invisible to MapReduce developers. Without this, everyone who writes
> MR jobs will need to think about how to implement authentication.
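
For readers less familiar with what an InputFormat contributes: one of its main jobs is to divide the input into splits that map tasks can read in parallel. Below is a minimal sketch of how that planning step might look for Kafka, written in Python for brevity rather than against Hadoop's Java API. `KafkaSplit`, `plan_splits`, and the shape of the `offsets` dictionary are hypothetical names invented for illustration; they are not part of Kafka or Hadoop, and in a real layer the offset ranges would have to be fetched from the brokers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class KafkaSplit:
    """Hypothetical unit of work: one offset range of one topic partition."""
    topic: str
    partition: int
    start_offset: int  # inclusive
    end_offset: int    # exclusive


def plan_splits(topic: str,
                offsets: Dict[int, Tuple[int, int]],
                max_messages_per_split: int) -> List[KafkaSplit]:
    """Cut each partition's [earliest, latest) offset range into bounded splits.

    `offsets` maps partition id -> (earliest, latest) offsets. This mapping is
    an assumption of the sketch; a real implementation would query the brokers.
    """
    if max_messages_per_split <= 0:
        raise ValueError("max_messages_per_split must be positive")
    splits: List[KafkaSplit] = []
    for partition in sorted(offsets):
        earliest, latest = offsets[partition]
        start = earliest
        # An empty partition (earliest == latest) yields no splits.
        while start < latest:
            end = min(start + max_messages_per_split, latest)
            splits.append(KafkaSplit(topic, partition, start, end))
            start = end
    return splits


if __name__ == "__main__":
    # Partition 0 holds offsets 0..249, partition 1 is empty.
    for split in plan_splits("clicks", {0: (0, 250), 1: (100, 100)}, 100):
        print(split)
```

Each split would then be handed to one map task, whose record reader consumes only that offset range. Capping the number of messages per split is just one possible policy; one split per partition would be the simplest alternative.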



