Re: Using Kafka for ETL from DW to Hadoop

2014-10-23 Thread Gwen Shapira
While I agree with Mark that testing the end-to-end pipeline is critical, note that in terms of performance, whatever you write to hook up Teradata to Kafka is unlikely to be as fast as the Teradata connector for Sqoop (especially the newer one). Quite a lot of optimization by Teradata engineers went into it.
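
A minimal sketch of what that Sqoop path might look like, assuming the Teradata connector jars are installed; the host, credentials, table, and target directory below are placeholders:

    # bulk import one table from Teradata into HDFS via Sqoop
    sqoop import \
      --connect jdbc:teradata://td-host/DATABASE=sales \
      --username etl_user -P \
      --table ORDERS \
      --target-dir /user/etl/staging/orders \
      --num-mappers 8

--num-mappers sets the parallelism; much of the dedicated connector's speed advantage comes from how it splits and streams the export across those parallel mappers.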

Re: Using Kafka for ETL from DW to Hadoop

2014-10-23 Thread Mark Roberts
If you use Kafka for the first bulk load, you will test your new Teradata->Kafka->Hive pipeline and gain the ability to blow away the data in Hive and reflow it from Kafka without an expensive full re-export from Teradata. As for whether Kafka can handle hundreds of GB of data: yes, absolutely.
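
To illustrate the reflow, the stock console consumer can re-read a topic from the start (0.8-era syntax; the ZooKeeper address and topic name are placeholders, and a real pipeline would re-run its Kafka-to-Hive consumer job instead):

    # replay every retained message in the topic from offset zero
    kafka-console-consumer.sh --zookeeper zk-host:2181 \
      --topic teradata_orders --from-beginning

Kafka retains messages for the configured retention period whether or not they have been consumed, so re-reading from the beginning costs nothing on the Teradata side.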

Re: Using Kafka for ETL from DW to Hadoop

2014-10-23 Thread svante karlsson
Both variants will work well (if your Kafka cluster can handle the full volume of the transmitted data for the duration of the TTL on each topic). I would run the whole thing through Kafka, since you will be "stress-testing" your production flow: consider the case where you at some later time lose your destination data and have to reflow it from Kafka.
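
As a concrete example of sizing that TTL, retention can be set per topic at creation time (0.8-era syntax; the topic name, partition count, and retention values are placeholders to be sized against the expected bulk-load volume):

    # create a topic whose retention can hold the full bulk load
    kafka-topics.sh --zookeeper zk-host:2181 --create \
      --topic teradata_orders --partitions 12 --replication-factor 3 \
      --config retention.ms=604800000 \
      --config retention.bytes=53687091200

Here retention.ms keeps data for seven days and retention.bytes caps each partition at roughly 50 GB; the disk requirement is roughly partitions x retention.bytes x replication factor.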

Using Kafka for ETL from DW to Hadoop

2014-10-23 Thread Po Cheung
Hello, We are planning to set up a data pipeline and send periodic, incremental updates from Teradata to Hadoop via Kafka. For a large DW table with hundreds of GB of data, is it okay (in terms of performance) to use Kafka for the initial bulk data load? Or will Sqoop with the Teradata connector perform better? Thanks, Po
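
For a rough picture of the Kafka side of such a bulk load, a delimited export can be piped straight into the stock console producer (the file, broker, and topic names here are hypothetical; a production loader would use the producer API with batching and compression instead):

    # naive bulk load: one Kafka message per exported row
    kafka-console-producer.sh --broker-list broker1:9092 \
      --topic teradata_orders < orders_full_export.csv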