Reading Kafka directly from Pig?

David Arthur Wed, 07 Aug 2013 07:43:33 -0700

I've thrown together a Pig LoadFunc to read data from Kafka, so you could load data like:

QUERY_LOGS = load 'kafka://localhost:9092/logs.query#8' using com.mycompany.pig.KafkaAvroLoader('com.mycompany.Query');

The path part of the URI is the Kafka topic, and the fragment is the number of partitions. In my implementation, the loader makes one input split per partition. Offsets are not really dealt with at this point; it's a rough prototype.
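To make the location scheme concrete, here is a minimal sketch of the parsing and split-planning step, using only java.net.URI. The class and record names (KafkaUriSplits, KafkaSource, PartitionSplit) are hypothetical and are not part of the Pig or Kafka APIs. The sketch assumes that partitions are numbered 0 to N-1, which is the Kafka convention, and that the partition count comes from the URI fragment rather than from broker metadata. Like the prototype, it ignores offsets.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not Pig or Kafka API): parses a location of the form
 * kafka://host:port/topic#numPartitions and plans one split per partition.
 */
public class KafkaUriSplits {

    /** Broker address, topic, and partition count taken from the URI. */
    public record KafkaSource(String host, int port, String topic, int numPartitions) {}

    /** One planned split covering a single partition (offsets not handled). */
    public record PartitionSplit(String host, int port, String topic, int partition) {}

    public static KafkaSource parse(String location) {
        URI uri = URI.create(location); // throws IllegalArgumentException on bad syntax
        if (!"kafka".equals(uri.getScheme())) {
            throw new IllegalArgumentException("expected kafka:// scheme: " + location);
        }
        String path = uri.getPath();
        if (path == null || path.length() <= 1) {
            throw new IllegalArgumentException("missing topic in path: " + location);
        }
        String topic = path.substring(1); // drop the leading '/'
        String fragment = uri.getFragment();
        if (fragment == null) {
            throw new IllegalArgumentException("missing #partitions fragment: " + location);
        }
        int numPartitions = Integer.parseInt(fragment);
        if (numPartitions < 1) {
            throw new IllegalArgumentException("partition count must be positive: " + location);
        }
        return new KafkaSource(uri.getHost(), uri.getPort(), topic, numPartitions);
    }

    /** Assumes Kafka's 0-based partition numbering: one split per partition. */
    public static List<PartitionSplit> planSplits(KafkaSource src) {
        List<PartitionSplit> splits = new ArrayList<>();
        for (int p = 0; p < src.numPartitions(); p++) {
            splits.add(new PartitionSplit(src.host(), src.port(), src.topic(), p));
        }
        return splits;
    }

    public static void main(String[] args) {
        KafkaSource src = parse("kafka://localhost:9092/logs.query#8");
        for (PartitionSplit split : planSplits(src)) {
            System.out.println(split);
        }
    }
}
```

In a real LoadFunc, each planned split would become an InputSplit handed to a record reader that consumes that one partition. This sketch only shows how the location string maps to per-partition work units.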

Does anyone have thoughts on whether or not this is a good idea? I know the usual pattern is kafka -> hdfs -> mapreduce. If I'm only reading this data from Kafka once, is there any reason why I can't skip writing it to HDFS?


Thanks!
-David
