While I agree with Mark that testing the end-to-end pipeline is
critical, note that in terms of performance, whatever you write to
hook up Teradata to Kafka is unlikely to be as fast as the Teradata
connector for Sqoop (especially the newer one). Quite a lot of
optimization by Teradata engineers went into it.
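For a sense of what that comparison is against: the hand-rolled Teradata->Kafka
hookup typically boils down to a single JDBC cursor feeding a producer, roughly
like the sketch below. Host, credentials, table and topic names are placeholders,
and it assumes the Teradata JDBC driver and the Kafka Java client are on the
classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TeradataToKafka {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("compression.type", "snappy");           // shrink the bulk volume

        try (Connection conn = DriverManager.getConnection(
                 "jdbc:teradata://dw-host/DATABASE=sales", "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, payload FROM big_table");
             KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {

            while (rs.next()) {
                // Key by primary key so later incremental updates for the same
                // row land in the same partition, in order.
                producer.send(new ProducerRecord<>("dw.big_table",
                    rs.getString("id"), rs.getString("payload")));
            }
            producer.flush();  // make sure everything is out before closing
        }
    }
}

A Sqoop job, by contrast, splits the export across parallel mappers instead of
pulling everything through one cursor like this, which is a big part of the gap.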
If you use Kafka for the first bulk load, you will test your new
Teradata->Kafka->Hive pipeline, as well as have the ability to blow away
the data in Hive and reflow it from Kafka without an expensive full
re-export from Teradata. As for whether Kafka can handle hundreds of GB of
data: Yes, absolutely.
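To make the reflow path concrete, here is a minimal sketch of replaying the
bulk-load topic from the beginning into whatever does the Hive load. The broker
address, topic name and the writeToHive stub are just placeholders, not anything
specific to your setup.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReflowFromKafka {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "hive-reflow");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Assign every partition of the bulk-load topic explicitly and
            // rewind to the beginning, ignoring any committed offsets.
            List<TopicPartition> parts = consumer.partitionsFor("dw.big_table").stream()
                .map(p -> new TopicPartition(p.topic(), p.partition()))
                .collect(Collectors.toList());
            consumer.assign(parts);
            consumer.seekToBeginning(parts);

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) break;            // caught up; done for this sketch
                for (ConsumerRecord<String, String> r : records) {
                    writeToHive(r.key(), r.value());     // stand-in for the Hive load step
                }
            }
        }
    }

    // Placeholder: in practice this is your existing Kafka->Hive consumer,
    // an HDFS file writer, the Hive streaming API, etc.
    static void writeToHive(String key, String value) {
        System.out.println(key + " -> " + value);
    }
}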
Hello,
We are planning to set up a data pipeline and send periodic, incremental
updates from DW to Hadoop via Kafka. For a large DW table with hundreds of GB
of data, is it okay to use Kafka for the initial bulk data load?
Thanks,
Po
Both variants will work well (if your Kafka cluster can handle the full
volume of the transmitted data for the duration of the TTL on each topic).
I would run the whole thing through Kafka, since you will be "stress-testing"
your production flow - consider if you at some later time lost your
destination data and needed to rebuild it by replaying from Kafka.
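The sizing side of that parenthetical is mostly about retention: the brokers
have to hold the whole snapshot (times the replication factor) for as long as
you might want to replay it. A rough sketch of creating the bulk-load topic with
explicit retention, using a recent Kafka Java AdminClient - the topic name,
partition and replica counts and the two-week window are placeholder numbers,
not recommendations:

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateBulkLoadTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 32 partitions, replication factor 3, time-based retention only.
            NewTopic topic = new NewTopic("dw.big_table", 32, (short) 3)
                .configs(Map.of(
                    "retention.ms", String.valueOf(14L * 24 * 60 * 60 * 1000), // ~2 weeks
                    "retention.bytes", "-1"));                                 // no size cutoff
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
        // Rough disk check: ~300 GB of source data x replication factor 3 is
        // on the order of 1 TB across the brokers (less with compression),
        // and it has to fit for the whole retention window.
    }
}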
Hello,
We are planning to set up a data pipeline and send periodic, incremental
updates from Teradata to Hadoop via Kafka. For a large DW table with hundreds
of GB of data, is it okay (in terms of performance) to use Kafka for the
initial bulk data load? Or will Sqoop with the Teradata connector be faster?