[ https://issues.apache.org/jira/browse/KAFKA-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171720#comment-14171720 ]
Gwen Shapira commented on KAFKA-1705:
-------------------------------------

Looking for community comments on this. Do others see this as a good thing to add?

> Add MR layer to Kafka
> ---------------------
>
>                 Key: KAFKA-1705
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1705
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Gwen Shapira
>            Assignee: Gwen Shapira
>
> Many NoSQL-type storage systems (HBase, Mongo, Cassandra) and file formats
> (Avro, Parquet) provide a MapReduce integration layer - usually an
> InputFormat, an OutputFormat and a utility class. Sometimes there's also an
> abstract Job and Mapper that do more setup, which can make things even more
> convenient.
> This is different from the existing Hadoop contrib project or Camus in that
> an MR layer will provide components for use in MR jobs, not an entire
> job that ingests data from Kafka to HDFS.
> The benefits I see for a MapReduce layer are:
> * Developers can create their own jobs, processing the data as it is
> ingested - rather than having to process it in two steps.
> * There are reusable components for developers looking to integrate with
> Kafka, rather than having everyone implement their own solution.
> * Hadoop developers expect projects to have this layer.
> * Spark reuses Hadoop's InputFormat and OutputFormat - so we get Spark
> integration for free.
> * There's a layer to plug the delegation token code into and make it
> invisible to MapReduce developers. Without this, everyone who writes
> MR jobs will need to think about how to implement authentication.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)