[jira] [Created] (KAFKA-1705) Add MR layer to Kafka

Gwen Shapira (JIRA) Tue, 14 Oct 2014 16:37:51 -0700

Gwen Shapira created KAFKA-1705:
-----------------------------------

             Summary: Add MR layer to Kafka
                 Key: KAFKA-1705
                 URL: https://issues.apache.org/jira/browse/KAFKA-1705
             Project: Kafka
          Issue Type: Improvement
            Reporter: Gwen Shapira
            Assignee: Gwen Shapira



Many NoSQL-type storage systems (HBase, Mongo,
Cassandra) and file formats (Avro, Parquet) provide a MapReduce
integration layer - usually an InputFormat, an OutputFormat and a utility
class. Sometimes there's also an abstract Job and Mapper that do more of
the setup, which makes things even more convenient.

This differs from the existing Hadoop contrib project or Camus in that an
MR layer would provide components for use in MR jobs, not an entire job
that ingests data from Kafka to HDFS.
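
As a rough illustration of the kind of component such a layer could contain, here is a minimal sketch of the split-planning step an InputFormat for Kafka might perform. It assumes one split per bounded offset range of a topic partition. The names KafkaSplitPlanner, OffsetRangeSplit, planSplits and maxMessagesPerSplit are hypothetical and are not part of Kafka or Hadoop. The sketch uses plain Java only, so it runs without a Hadoop or Kafka dependency.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from Kafka or Hadoop.
final class KafkaSplitPlanner {

    /** One unit of work for a map task: the half-open offset range [startOffset, endOffset). */
    static final class OffsetRangeSplit {
        final String topic;
        final int partition;
        final long startOffset;
        final long endOffset;

        OffsetRangeSplit(String topic, int partition, long startOffset, long endOffset) {
            this.topic = topic;
            this.partition = partition;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        /** Number of messages covered by this split. */
        long length() {
            return endOffset - startOffset;
        }

        @Override
        public String toString() {
            return topic + "-" + partition + "[" + startOffset + "," + endOffset + ")";
        }
    }

    /**
     * Cuts each partition's offset range into splits of at most
     * maxMessagesPerSplit messages. The array index is the partition id
     * (an assumption made to keep the sketch short).
     */
    static List<OffsetRangeSplit> planSplits(String topic, long[] startOffsets,
                                             long[] endOffsets, long maxMessagesPerSplit) {
        if (startOffsets.length != endOffsets.length) {
            throw new IllegalArgumentException("offset arrays must have the same length");
        }
        if (maxMessagesPerSplit <= 0) {
            throw new IllegalArgumentException("maxMessagesPerSplit must be positive");
        }
        List<OffsetRangeSplit> splits = new ArrayList<>();
        for (int partition = 0; partition < startOffsets.length; partition++) {
            long start = startOffsets[partition];
            // Partitions with no new messages (start >= end) produce no splits.
            while (start < endOffsets[partition]) {
                long end = Math.min(start + maxMessagesPerSplit, endOffsets[partition]);
                splits.add(new OffsetRangeSplit(topic, partition, start, end));
                start = end;
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Two partitions: partition 0 has offsets 0..24 to read, partition 1 has nothing new.
        for (OffsetRangeSplit split
                : planSplits("events", new long[] {0, 10}, new long[] {25, 10}, 10)) {
            System.out.println(split);
        }
    }
}
```

In an actual MR layer this logic would presumably live behind Hadoop's InputFormat.getSplits, with a RecordReader fetching the messages for each range. Both are left out here to keep the sketch self-contained.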

The benefits I see for a MapReduce layer are:
* Developers can create their own jobs, processing the data as it is
ingested - rather than having to process it in two steps.
* There are reusable components for developers looking to integrate with
Kafka, rather than having everyone implement their own solution.
* Hadoop developers expect projects to have this layer.
* Spark reuses Hadoop's InputFormat and OutputFormat - so we get Spark
integration for free.
* There's a layer to plug the delegation token code into and make it
invisible to MapReduce developers. Without this, everyone who writes
MR jobs will need to think about how to implement authentication.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
