Will,

The HDFS connector we ship today is for Kafka -> HDFS, so it isn't
reading/processing data in HDFS.

I was discussing both directions because the question was unclear. However,
there's no reason you couldn't create a connector that processes files in
splits to parallelize an HDFS -> Kafka path, even if it was only for a
single file.
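
As a rough illustration of the split idea (a minimal sketch, not an existing
connector): the Python below cuts one file into byte ranges, spreads the ranges
over tasks, and has each task read only the lines that belong to its range.
A local file stands in for HDFS, and the names Split, compute_splits,
assign_splits and read_split are hypothetical. In a real connector the
assignment step would presumably sit in SourceConnector.taskConfigs() and the
reading step in SourceTask.poll(), written in Java.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Split:
    """A byte range [start, end) of one file (hypothetical helper type)."""
    path: str
    start: int
    end: int


def compute_splits(path: str, file_size: int, split_size: int) -> List[Split]:
    """Cut a file of file_size bytes into consecutive ranges of at most split_size bytes."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [Split(path, start, min(start + split_size, file_size))
            for start in range(0, file_size, split_size)]


def assign_splits(splits: List[Split], max_tasks: int) -> List[List[Split]]:
    """Distribute splits round-robin over at most max_tasks tasks."""
    if max_tasks <= 0:
        raise ValueError("max_tasks must be positive")
    num_tasks = min(max_tasks, len(splits))
    return [splits[i::num_tasks] for i in range(num_tasks)]


def read_split(split: Split) -> Iterator[Tuple[int, bytes]]:
    """Yield (byte_offset, line) for every line owned by this split.

    Same convention as MapReduce line splits: a split that does not start at
    byte 0 discards its first (possibly partial) line, and every split keeps
    reading while the next line starts at or before its end. Each line is
    therefore read by exactly one split.
    """
    with open(split.path, "rb") as f:
        f.seek(split.start)
        if split.start != 0:
            f.readline()  # owned by the previous split
        while f.tell() <= split.end:
            offset = f.tell()
            line = f.readline()
            if not line:  # end of file
                break
            yield offset, line
```

The byte offset yielded with each line is the kind of value a task could store
as its source offset, so that it can resume from the right position after a
restart.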

-Ewen

On Tue, Jan 10, 2017 at 5:09 AM, Will Du <will...@gmail.com> wrote:

> For big files, which are quite common in HDFS, does a Connect task process
> the same file in parallel, the way MapReduce deals with file splits? I do not
> think so. In that case, a Kafka Connect implementation has no advantage for
> reading a single big file unless you also use MapReduce.
>
> Sent from my iPhone
>
> On Jan 10, 2017, at 02:41, Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
> >> However, I'm trying to figure out if I can use Kafka to read Hadoop
> >> file.
> >
> > The question is a bit unclear as to whether you mean "use Kafka to send
> > data to a Hadoop file" or "use Kafka to read a Hadoop file into a Kafka
> > topic". But in both cases, Kafka Connect provides a good option.
> >
> > The more common use case is sending data that you have in Kafka into
> > HDFS. In that case,
> > http://docs.confluent.io/3.1.1/connect/connect-hdfs/docs/hdfs_connector.html
> > is a good option.
> >
> > If you want the less common case of sending data from HDFS files into a
> > stream of Kafka records, I'm not aware of a connector for doing that yet
> > but it is definitely possible. Kafka Connect takes care of a lot of the
> > details for you so all you have to do is read the file and emit Connect's
> > SourceRecords containing the data from the file. Most other details are
> > handled for you.
> >
> > -Ewen
> >
> >> On Mon, Jan 9, 2017 at 9:18 PM, Sharninder <sharnin...@gmail.com>
> >> wrote:
> >>
> >> If you want to know if "kafka" can read hadoop files, then no. But you
> >> can write your own producer that reads from hdfs any which way and
> >> pushes to kafka. We use kafka as the ingestion pipeline's main queue.
> >> Read from various sources and push everything to kafka.
> >>
> >>
> >> On Tue, Jan 10, 2017 at 6:26 AM, Cas Apanowicz
> >> <cas.apanow...@it-horizon.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> I have a general understanding of main Kafka functionality as a
> >>> streaming tool.
> >>> However, I'm trying to figure out if I can use Kafka to read Hadoop
> >>> file.
> >>> Can you please advise?
> >>> Thanks
> >>>
> >>> Cas
> >>>
> >>>
> >>
> >>
> >> --
> >> Sharninder
> >>
>
