While I agree with Mark that testing the end-to-end pipeline is critical, note that in terms of performance, whatever you write to hook up Teradata to Kafka is unlikely to be as fast as the Teradata connector for Sqoop (especially the newer one). Quite a lot of optimization by Teradata engineers went into that connector.
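To make that concrete, a hand-rolled hookup usually ends up looking something like the sketch below: one JDBC cursor feeding one Kafka producer. The broker address, connection string, table, topic and credentials are all made-up placeholders, and it's deliberately naive (single-threaded, no batching, retries or checkpointing), which is exactly the gap the connector's parallel, tuned export closes for you.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TeradataToKafka {
    public static void main(String[] args) throws Exception {
        // Teradata JDBC driver; host, database and credentials below are placeholders.
        Class.forName("com.teradata.jdbc.TeraDriver");

        // Minimal producer config; the broker address is a placeholder.
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             Connection conn = DriverManager.getConnection(
                     "jdbc:teradata://td-host/DATABASE=dw", "etl_user", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, payload FROM big_table")) {

            // One message per row, keyed by the primary key column.
            while (rs.next()) {
                producer.send(new ProducerRecord<>("dw.big_table",
                        rs.getString("id"), rs.getString("payload")));
            }
            // try-with-resources closes the producer last, which waits for
            // outstanding sends to complete.
        }
    }
}

Even this toy version needs the Teradata JDBC driver and the Kafka producer jars on the classpath, and you'd still have to add parallelism, error handling and schema handling before it came anywhere near what the connector gives you out of the box.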
Actually, unless you need very low latency (seconds to a few minutes), or consumers other than Hadoop, I'd go with Sqoop incremental jobs (there's a rough sketch of one at the bottom of this mail) and leave Kafka out of the equation completely. This will save you quite a bit of work on connecting Teradata to Kafka, if it fits your use case.

Gwen

On Thu, Oct 23, 2014 at 9:48 AM, Mark Roberts <wiz...@gmail.com> wrote:

> If you use Kafka for the first bulk load, you will test your new
> Teradata->Kafka->Hive pipeline, as well as have the ability to blow away
> the data in Hive and reflow it from Kafka without an expensive full
> re-export from Teradata. As for whether Kafka can handle hundreds of GB of
> data: Yes, absolutely.
>
> -Mark
>
>
> On Thu, Oct 23, 2014 at 3:08 AM, Po Cheung <poche...@yahoo.com.invalid>
> wrote:
>
>> Hello,
>>
>> We are planning to set up a data pipeline and send periodic, incremental
>> updates from Teradata to Hadoop via Kafka. For a large DW table with
>> hundreds of GB of data, is it okay (in terms of performance) to use Kafka
>> for the initial bulk data load? Or will Sqoop with Teradata connector be
>> more appropriate?
>>
>>
>> Thanks,
>> Po
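P.S. Since I mentioned Sqoop incremental jobs, here is roughly what one looks like. This is only a sketch: the connect string, table, check column, paths and mapper count are placeholders, and how the dedicated Teradata connector gets picked up (and which extra options it wants) depends on the connector package and version, so check its docs.

# Create a saved incremental-append job; Sqoop remembers the last value
# of the check column between runs.
sqoop job --create td_orders_incr -- import \
  --connect jdbc:teradata://td-host/DATABASE=dw \
  --username etl_user -P \
  --table ORDERS \
  --split-by ORDER_ID \
  --num-mappers 8 \
  --incremental append \
  --check-column ORDER_ID \
  --last-value 0 \
  --target-dir /data/dw/orders

# Each later run only pulls rows with ORDER_ID above the stored last value.
sqoop job --exec td_orders_incr

The first --exec does the expensive initial load; after that each run is just the delta, which is why I'd skip Kafka here unless you need the lower latency or the other consumers.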