Do updates to existing rows come in on your data flow? If so, you will need a staging table and a merge process into your Teradata tables; a rough sketch of that pattern is below.
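Something like this is the general shape (untested on my end; the host, table names, credentials, and jar path are all made-up stand-ins, and it assumes the Teradata JDBC jar is on Spark's classpath and the jaydebeapi package is installed):

    from pyspark.sql import SparkSession
    import jaydebeapi

    spark = SparkSession.builder.appName("weekly_td_merge").getOrCreate()

    # Stand-in for the DataFrame holding this run's new/changed rows;
    # in practice it would come from your Oracle / flat-file reads.
    new_rows_df = spark.createDataFrame([(1, 100.0)], ["order_id", "amount"])

    td_url = "jdbc:teradata://tdhost/DATABASE=mydb"  # made-up host/db
    td_props = {"user": "tduser", "password": "tdpass",
                "driver": "com.teradata.jdbc.TeraDriver"}

    # 1) Replace the staging table's contents with this run's rows.
    new_rows_df.write.jdbc(url=td_url, table="mydb.stage_orders",
                           mode="overwrite", properties=td_props)

    # 2) Merge staging into the target over plain JDBC (outside Spark).
    #    Exact MERGE rules (e.g. the ON clause and the target's primary
    #    index) depend on your Teradata setup.
    conn = jaydebeapi.connect("com.teradata.jdbc.TeraDriver", td_url,
                              ["tduser", "tdpass"], "/path/to/terajdbc4.jar")
    try:
        cur = conn.cursor()
        cur.execute("""
            MERGE INTO mydb.orders AS t
            USING mydb.stage_orders AS s
              ON (t.order_id = s.order_id)
            WHEN MATCHED THEN
              UPDATE SET amount = s.amount
            WHEN NOT MATCHED THEN
              INSERT (order_id, amount) VALUES (s.order_id, s.amount)
        """)
        conn.commit()
    finally:
        conn.close()

One caveat: Spark's overwrite mode drops and recreates the staging table, and the generated DDL may not be what you want on Teradata. If so, pre-create the staging table yourself and add .option("truncate", "true") (Spark 2.1+) so only the rows are replaced.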
If you do not have updated rows (i.e., your Teradata tables are append-only), you can process the data and insert (bulk load) it into Teradata. I don't have experience doing this directly in Spark, but according to this post
https://community.hortonworks.com/questions/63826/hi-is-there-any-connector-for-teradata-to-sparkwe.html
you will need to use a JDBC driver to connect. I've put a rough, untested sketch of what that might look like at the very bottom of this mail, below the quoted message.

On Tue, Apr 11, 2017 at 1:23 PM, Vamsi Makkena <kv.makk...@gmail.com> wrote:

> I am reading the data from Oracle tables and flat files (a new Excel file
> every week) and writing it to Teradata weekly using PySpark.
>
> In the initial run it will load all the data to Teradata. But in the
> later runs I just want to read the new records from Oracle and the flat
> files and append them to the Teradata tables.
>
> How can I do this using PySpark, without touching the Oracle and Teradata
> tables?
>
> Please post the sample code if possible.
>
> Thanks

--
Regards,
Matt
Data Engineer
https://www.linkedin.com/in/mdeaver
http://mattdeav.pythonanywhere.com/
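P.S. Here is the sketch I mentioned for the append-only case, including reading only the Oracle rows that are new since the last run (completely untested; the hosts, tables, credentials, and the updated_at watermark column are all invented, and it assumes the Oracle and Teradata JDBC jars are on Spark's classpath):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("weekly_td_append").getOrCreate()

    # Watermark from the previous successful run; in practice you'd
    # persist this (control table, file, etc.) instead of hard-coding it.
    last_run = "2017-04-04 00:00:00"

    # Push the filter down to Oracle so only new rows come back.
    src_query = ("(SELECT * FROM orders WHERE updated_at > "
                 "TO_TIMESTAMP('{0}', 'YYYY-MM-DD HH24:MI:SS')) src"
                 .format(last_run))

    new_df = (spark.read.format("jdbc")
              .option("url", "jdbc:oracle:thin:@orahost:1521/ORCL")
              .option("dbtable", src_query)
              .option("user", "orauser")
              .option("password", "orapass")
              .load())

    # Plain bulk append into the Teradata table.
    (new_df.write.format("jdbc")
     .option("url", "jdbc:teradata://tdhost/DATABASE=mydb")
     .option("driver", "com.teradata.jdbc.TeraDriver")
     .option("dbtable", "mydb.orders")
     .option("user", "tduser")
     .option("password", "tdpass")
     .mode("append")
     .save())

For the weekly Excel/flat files, reading just the newest file each run (e.g. picking it by a dated filename) keeps that side incremental too, without touching the Oracle or Teradata tables.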