Will,

The HDFS connector we ship today is for Kafka -> HDFS, so it isn't reading/processing data in HDFS.
I was discussing both directions because the question was unclear. However, there's no reason you couldn't create a connector that processes files in splits to parallelize an HDFS -> Kafka path, even if it was only for a single file.

-Ewen

On Tue, Jan 10, 2017 at 5:09 AM, Will Du <will...@gmail.com> wrote:

> In terms of big files which is quite often in HDFS, does connect task
> parallel process the same file like what MR deal with split files? I do not
> think so. In this case, Kafka connect implement has no advantages to read
> single big file unless you also use mapreduce.
>
> Sent from my iPhone
>
> On Jan 10, 2017, at 02:41, Ewen Cheslack-Postava <e...@confluent.io> wrote:
>
> >> However, I'm trying to figure out if I can use Kafka to read Hadoop file.
> >
> > The question is a bit unclear as to whether you mean "use Kafka to send
> > data to a Hadoop file" or "use Kafka to read a Hadoop file into a Kafka
> > topic". But in both cases, Kafka Connect provides a good option.
> >
> > The more common use case is sending data that you have in Kafka into HDFS.
> > In that case,
> > http://docs.confluent.io/3.1.1/connect/connect-hdfs/docs/hdfs_connector.html
> > is a good option.
> >
> > If you want the less common case of sending data from HDFS files into a
> > stream of Kafka records, I'm not aware of a connector for doing that yet
> > but it is definitely possible. Kafka Connect takes care of a lot of the
> > details for you so all you have to do is read the file and emit Connect's
> > SourceRecords containing the data from the file. Most other details are
> > handled for you.
> >
> > -Ewen
> >
> >> On Mon, Jan 9, 2017 at 9:18 PM, Sharninder <sharnin...@gmail.com> wrote:
> >>
> >> If you want to know if "kafka" can read hadoop files, then no. But you can
> >> write your own producer that reads from hdfs any which way and pushes to
> >> kafka. We use kafka as the ingestion pipeline's main queue. Read from
> >> various sources and push everything to kafka.
> >>
> >>
> >> On Tue, Jan 10, 2017 at 6:26 AM, Cas Apanowicz <cas.apanow...@it-horizon.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> I have general understanding of main Kafka functionality as a streaming
> >>> tool.
> >>> However, I'm trying to figure out if I can use Kafka to read Hadoop file.
> >>> Can you please advise?
> >>> Thanks
> >>>
> >>> Cas
> >>
> >> --
> >> Sharninder
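
For anyone who wants to try the split-based HDFS -> Kafka idea from the reply at the top of this thread, here is a rough sketch of the planning step only. It is not taken from any existing connector. The names `Split`, `plan_splits` and `assign_to_tasks` are made up for illustration, and the file size and split size are assumed to be known up front. In a real Kafka Connect source connector, groups like these would presumably be encoded into the per-task configs returned by `taskConfigs(maxTasks)`, and each task would read its own byte ranges and emit SourceRecords.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Split:
    """A contiguous byte range of one file (hypothetical helper type)."""
    path: str
    start: int   # inclusive byte offset where the split begins
    length: int  # number of bytes in the split


def plan_splits(path: str, file_size: int, split_size: int) -> List[Split]:
    """Cut a file of file_size bytes into consecutive splits of at most split_size bytes."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [
        Split(path, start, min(split_size, file_size - start))
        for start in range(0, file_size, split_size)
    ]


def assign_to_tasks(splits: List[Split], max_tasks: int) -> List[List[Split]]:
    """Distribute splits round-robin over at most max_tasks groups, one group per task."""
    if max_tasks <= 0:
        raise ValueError("max_tasks must be positive")
    num_tasks = min(max_tasks, len(splits))
    groups: List[List[Split]] = [[] for _ in range(num_tasks)]
    for index, split in enumerate(splits):
        groups[index % num_tasks].append(split)
    return groups


if __name__ == "__main__":
    # Assumed example values: a 1000-byte file, 300-byte splits, up to 3 tasks.
    for task_id, group in enumerate(assign_to_tasks(plan_splits("/data/big.log", 1000, 300), 3)):
        print(task_id, group)
```

Note that byte offsets will not line up with record boundaries, so a real reader would still have to deal with records that straddle two splits, as MapReduce input formats do. The sketch leaves that part out.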