Gwen Shapira created KAFKA-1705:
-----------------------------------

             Summary: Add MR layer to Kafka
                 Key: KAFKA-1705
                 URL: https://issues.apache.org/jira/browse/KAFKA-1705
             Project: Kafka
          Issue Type: Improvement
            Reporter: Gwen Shapira
            Assignee: Gwen Shapira

Many NoSQL-type storage systems (HBase, Mongo, Cassandra) and file formats (Avro, Parquet) provide a MapReduce integration layer - usually an InputFormat, an OutputFormat and a utility class. Sometimes there's also an abstract Job and Mapper that do more of the setup, which can make things even more convenient.

This is different from the existing Hadoop contrib project or Camus in that an MR layer would provide components for use in MR jobs, not an entire job that ingests data from Kafka to HDFS.

The benefits I see for a MapReduce layer are:

* Developers can create their own jobs, processing the data as it is ingested - rather than having to process it in two steps.
* There are reusable components for developers looking to integrate with Kafka, rather than having everyone implement their own solution.
* Hadoop developers expect projects to have this layer.
* Spark reuses Hadoop's InputFormat and OutputFormat - so we get Spark integration for free.
* There's a layer to plug the delegation token code into and make it invisible to MapReduce developers. Without this, everyone who writes MR jobs will need to think about how to implement authentication.
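To make the InputFormat idea above a bit more concrete, here is a rough sketch of the split planning such a class might do. Everything in it is an assumption for illustration only: the names KafkaSplitSketch, OffsetRangeSplit and planSplits are hypothetical, the "cap each split at N messages" policy is just one possible choice, and it deliberately uses plain Java instead of the real Hadoop InputFormat/InputSplit classes so it runs without any Hadoop or Kafka jars. It is not the proposed implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class KafkaSplitSketch {

    /** Hypothetical stand-in for a Hadoop InputSplit: one offset range of one topic partition. */
    public static final class OffsetRangeSplit {
        public final String topic;
        public final int partition;
        public final long startOffset; // inclusive
        public final long endOffset;   // exclusive

        public OffsetRangeSplit(String topic, int partition, long startOffset, long endOffset) {
            this.topic = topic;
            this.partition = partition;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        /** Number of messages covered by this split. */
        public long length() {
            return endOffset - startOffset;
        }

        @Override
        public String toString() {
            return topic + "-" + partition + "[" + startOffset + "," + endOffset + ")";
        }
    }

    /**
     * Cuts the offset range [startOffset, endOffset) of a single partition into
     * consecutive splits of at most maxMessagesPerSplit messages each.
     * Assumption: the caller has already looked up the offsets to read.
     */
    public static List<OffsetRangeSplit> planSplits(String topic, int partition,
                                                    long startOffset, long endOffset,
                                                    long maxMessagesPerSplit) {
        if (maxMessagesPerSplit <= 0) {
            throw new IllegalArgumentException("maxMessagesPerSplit must be positive");
        }
        if (endOffset < startOffset) {
            throw new IllegalArgumentException("endOffset must not be below startOffset");
        }
        List<OffsetRangeSplit> splits = new ArrayList<>();
        for (long from = startOffset; from < endOffset; from += maxMessagesPerSplit) {
            long to = Math.min(from + maxMessagesPerSplit, endOffset);
            splits.add(new OffsetRangeSplit(topic, partition, from, to));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Example: offsets 100..349 of partition 0, at most 100 messages per split.
        for (OffsetRangeSplit split : planSplits("events", 0, 100L, 350L, 100L)) {
            System.out.println(split + " length=" + split.length());
        }
    }
}
```

In a real MR layer, logic like this would presumably live behind InputFormat.getSplits(), with a RecordReader consuming each offset range, so that each map task reads one bounded slice of a partition.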