On Jan 7, 2013, at 2:05pm, Russell Jurney wrote:

> I previously posted a link to contrib in this thread.

Thanks, I missed that - all I saw was the long URL to the Talend
integration doc on Hortonworks.

> No, it's not a cascading tap. It's a complete job. One to read kafka
> events to hdfs, one to generate kafka events from hdfs. ETL can happen
> in between.

Some Cascading integration notes, just for posterity:

Having a Kafka Tap/Scheme would make integration easy.

I see there are KafkaInputFormat and KafkaOutputFormat classes in the
contrib, which is great - though someone would have to back-port these to
the older Hadoop APIs in order to work with Cascading.

Also, Cascading sends all data around as the key (the value is always
NullWritable), whereas the Kafka input/output formats do the opposite.

-- Ken

> On Jan 7, 2013 1:51 PM, "Ken Krugler" <kkrugler_li...@transpac.com> wrote:
>
>> Hi Russell,
>>
>> On Jan 7, 2013, at 12:48pm, Russell Jurney wrote:
>>
>>> Just to be clear - a Kafka 'Tap' of sorts exists in contrib: it scans
>>> Hadoop records, which may be ETL'd first, and emits new Kafka events.
>>
>> Can you point me at the code?
>>
>> And just to confirm, you're talking about a Cascading Tap, right?
>>
>> -- Ken
>>
>>> On Mon, Jan 7, 2013 at 9:57 AM, Ken Krugler <kkrugler_li...@transpac.com> wrote:
>>>
>>>> Hi Guy,
>>>>
>>>> On Jan 6, 2013, at 11:11pm, Guy Doulberg wrote:
>>>>
>>>>> Hi,
>>>>> Thanks David,
>>>>>
>>>>> I am looking for a product (open source or not), something like Talend
>>>>> or Pentaho, in which I can design the ETL (from and to kafka), and run
>>>>> the ETL in Storm/IronCount, or maybe even in Hadoop Map/Reduce.
>>>>
>>>> Interesting - we build ETLs on top of Hadoop using Cascading (open source
>>>> workflow API), which has a lot of what it calls "Taps" for connecting to
>>>> data sources and sinks.
>>>>
>>>> But I haven't heard of a Kafka Tap. Should be possible to implement,
>>>> though.
>>>>
>>>> One issue is that Hadoop is batch oriented, so there's a bit of an
>>>> impedance mismatch when you've got a streaming data source, but from
>>>> experience it's possible to get that to work.
>>>>
>>>> -- Ken
>>>>
>>>>> The product should be complete and support many connections to many
>>>>> data sources and targets. In that sense, if you know of a connection
>>>>> to Talend or Pentaho, that would be great.
>>>>>
>>>>> Thanks again.
>>>>>
>>>>> On 01/07/2013 12:28 AM, David Arthur wrote:
>>>>>> Storm has support for Kafka, if that's the sort of thing you're
>>>>>> looking for. Maybe you could describe your use case a bit more?
>>>>>>
>>>>>> On Sunday, January 6, 2013, Guy Doulberg wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> I am looking for an ETL tool that can connect to kafka, as a consumer
>>>>>>> and as a producer,
>>>>>>>
>>>>>>> Have you heard of such a tool?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Guy
>>>>
>>>> --------------------------
>>>> Ken Krugler
>>>> +1 530-210-6378
>>>> http://www.scaleunlimited.com
>>>> custom big data solutions & training
>>>> Hadoop, Cascading, Cassandra & Solr
>>>
>>> --
>>> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
>>> datasyndrome.com
>>
>> --------------------------------------------
>> http://about.me/kkrugler
>> +1 530-210-6378