[ https://issues.apache.org/jira/browse/SQOOP-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088664#comment-14088664 ]
Gwen Shapira commented on SQOOP-1414:
-------------------------------------
The planned syntax will be:
sqoop import --connection kafka:broker://broker_host:broker_port --table-name topic
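To make the connect-string format concrete, here is a minimal sketch of how a string of the form kafka:broker://broker_host:broker_port could be split into host and port. The names parse_connect_string and BrokerAddress are hypothetical and not part of Sqoop, and rejecting a comma-separated list is only one assumed way of limiting the string to a single broker.

```python
from typing import NamedTuple


class BrokerAddress(NamedTuple):
    """Hypothetical holder for one broker's host and port."""
    host: str
    port: int


# Prefix taken from the planned syntax in this comment.
PREFIX = "kafka:broker://"


def parse_connect_string(connect: str) -> BrokerAddress:
    """Split 'kafka:broker://host:port' into host and port (illustrative only)."""
    if not connect.startswith(PREFIX):
        raise ValueError("connect string must start with " + PREFIX)
    rest = connect[len(PREFIX):]
    # Assumption: a broker list is rejected rather than silently truncated.
    if "," in rest:
        raise ValueError("only a single broker is supported")
    host, sep, port = rest.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("expected host:port after " + PREFIX)
    return BrokerAddress(host, int(port))
```

For example, parse_connect_string("kafka:broker://broker1:9092") yields host "broker1" and port 9092, while a string with two comma-separated brokers raises ValueError.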
I currently plan to implement:
First phase:
- No HBase, no Accumulo (A streaming solution makes more sense there)
- Assuming data in Kafka is String
- Single broker in connect string
- Exactly-once semantics (using SimpleConsumer, checkpointing reads to HDFS)
- Limited to a single topic per Sqoop job
- Mapper per partition (no user control on number of mappers)
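The checkpointing idea in the first-phase list can be sketched roughly as follows. This is only an illustration under loud assumptions: an in-memory list stands in for a Kafka partition, a local directory stands in for HDFS, and the names import_partition, run_job and the _checkpoint-N / part-N-offset file layout are hypothetical, not the planned Sqoop implementation. Each partition is handled by one call (standing in for one mapper), data files are named after their starting offset so that a re-run after a crash overwrites the same file, and the checkpoint is advanced with an atomic rename only after the data has been written.

```python
import os
import tempfile
from typing import Dict, List


def read_checkpoint(path: str) -> int:
    """Return the next offset to read, or 0 if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return 0


def write_checkpoint(path: str, offset: int) -> None:
    """Write the offset to a temp file, then rename it into place atomically."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(offset))
    os.replace(tmp, path)


def import_partition(log: List[str], partition: int, out_dir: str) -> int:
    """Copy unread String messages of one partition; return how many were copied."""
    ckpt = os.path.join(out_dir, f"_checkpoint-{partition}")
    start = read_checkpoint(ckpt)
    batch = log[start:]
    if not batch:
        return 0
    # File name depends only on the start offset, so a retry from the same
    # checkpoint overwrites a partial file instead of duplicating messages.
    data_path = os.path.join(out_dir, f"part-{partition}-{start:012d}")
    with open(data_path, "w") as f:
        for message in batch:
            f.write(message + "\n")
    # Commit: only after this rename does the batch count as imported.
    write_checkpoint(ckpt, start + len(batch))
    return len(batch)


def run_job(topic: Dict[int, List[str]], out_dir: str) -> int:
    """One import_partition call per partition, mirroring one mapper per partition."""
    return sum(import_partition(log, p, out_dir) for p, log in topic.items())
```

Running run_job twice over the same topic copies nothing the second time, because every partition's checkpoint already points past the last message. The sketch assumes messages contain no newlines, in line with the String-only assumption above.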
TBD later (possibly only on Sqoop2):
- Avro / Parquet (probably via Kite)
- Hive / HCat integration
- Pluggable Decoder
- Specify number of mappers
- List of brokers
- List of topics
> Add support for Import from Kafka
> ----------------------------------
>
> Key: SQOOP-1414
> URL: https://issues.apache.org/jira/browse/SQOOP-1414
> Project: Sqoop
> Issue Type: Improvement
> Affects Versions: 1.4.4
> Reporter: Gwen Shapira
> Assignee: Gwen Shapira
>
> Kafka is an important data source for many organizations.
> Support in Sqoop will allow users to easily run MapReduce jobs to read data
> from Kafka topics to HDFS in various formats and to integrate with Hive.