Do updates to existing rows come in on your data flow? If so, you will need a staging table and a merge process into your Teradata tables; a rough sketch of that pattern is below.
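Something like this is the general shape (untested on my end; the host, table names, credentials, and jar path are all made-up stand-ins, and it assumes the Teradata JDBC jar is on Spark's classpath and the jaydebeapi package is installed):

    from pyspark.sql import SparkSession
    import jaydebeapi

    spark = SparkSession.builder.appName("weekly_td_merge").getOrCreate()

    # Stand-in for the DataFrame holding this run's new/changed rows;
    # in practice it would come from your Oracle / flat-file reads.
    new_rows_df = spark.createDataFrame([(1, 100.0)], ["order_id", "amount"])

    td_url = "jdbc:teradata://tdhost/DATABASE=mydb"  # made-up host/db
    td_props = {"user": "tduser", "password": "tdpass",
                "driver": "com.teradata.jdbc.TeraDriver"}

    # 1) Replace the staging table's contents with this run's rows.
    new_rows_df.write.jdbc(url=td_url, table="mydb.stage_orders",
                           mode="overwrite", properties=td_props)

    # 2) Merge staging into the target over plain JDBC (outside Spark).
    #    Exact MERGE rules (e.g. the ON clause and the target's primary
    #    index) depend on your Teradata setup.
    conn = jaydebeapi.connect("com.teradata.jdbc.TeraDriver", td_url,
                              ["tduser", "tdpass"], "/path/to/terajdbc4.jar")
    try:
        cur = conn.cursor()
        cur.execute("""
            MERGE INTO mydb.orders AS t
            USING mydb.stage_orders AS s
              ON (t.order_id = s.order_id)
            WHEN MATCHED THEN
              UPDATE SET amount = s.amount
            WHEN NOT MATCHED THEN
              INSERT (order_id, amount) VALUES (s.order_id, s.amount)
        """)
        conn.commit()
    finally:
        conn.close()

One caveat: Spark's overwrite mode drops and recreates the staging table, and the generated DDL may not be what you want on Teradata. If so, pre-create the staging table yourself and add .option("truncate", "true") (Spark 2.1+) so only the rows are replaced.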
If you do not have updated rows (i.e., your Teradata tables are append-only), you can process the data and insert (bulk load) it into Teradata. I don't have experience doing this directly in Spark, but according to this post
https://community.hortonworks.com/questions/63826/hi-is-there-any-connector-for-teradata-to-sparkwe.html
you will need to use a JDBC driver to connect. I've put a rough, untested sketch of what that might look like at the very bottom of this mail, below the quoted message.

On Tue, Apr 11, 2017 at 1:23 PM, Vamsi Makkena <kv.makk...@gmail.com> wrote:

> I am reading the data from Oracle tables and flat files (a new Excel file
> every week) and writing it to Teradata weekly using PySpark.
>
> In the initial run it will load all the data to Teradata. But in the
> later runs I just want to read the new records from Oracle and the flat
> files and append them to the Teradata tables.
>
> How can I do this using PySpark, without touching the Oracle and Teradata
> tables?
>
> Please post the sample code if possible.
>
> Thanks

--
Regards,
Matt
Data Engineer
https://www.linkedin.com/in/mdeaver
http://mattdeav.pythonanywhere.com/
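P.S. Here is the sketch I mentioned for the append-only case, including reading only the Oracle rows that are new since the last run (completely untested; the hosts, tables, credentials, and the updated_at watermark column are all invented, and it assumes the Oracle and Teradata JDBC jars are on Spark's classpath):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("weekly_td_append").getOrCreate()

    # Watermark from the previous successful run; in practice you'd
    # persist this (control table, file, etc.) instead of hard-coding it.
    last_run = "2017-04-04 00:00:00"

    # Push the filter down to Oracle so only new rows come back.
    src_query = ("(SELECT * FROM orders WHERE updated_at > "
                 "TO_TIMESTAMP('{0}', 'YYYY-MM-DD HH24:MI:SS')) src"
                 .format(last_run))

    new_df = (spark.read.format("jdbc")
              .option("url", "jdbc:oracle:thin:@orahost:1521/ORCL")
              .option("dbtable", src_query)
              .option("user", "orauser")
              .option("password", "orapass")
              .load())

    # Plain bulk append into the Teradata table.
    (new_df.write.format("jdbc")
     .option("url", "jdbc:teradata://tdhost/DATABASE=mydb")
     .option("driver", "com.teradata.jdbc.TeraDriver")
     .option("dbtable", "mydb.orders")
     .option("user", "tduser")
     .option("password", "tdpass")
     .mode("append")
     .save())

For the weekly Excel/flat files, reading just the newest file each run (e.g. picking it by a dated filename) keeps that side incremental too, without touching the Oracle or Teradata tables.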