Hi,

We have a daily data pull that brings in almost 50 GB of data from an upstream system. We process the 50 GB with Spark SQL and insert the result into a Hive target table. Today we copy the entire Hive target table to SQL Server, specifically to a SQL staging table, then run a merge from the staging table against the final SQL target table so that only modified or new records land in the SQL target. This process is time consuming because most of the time is spent copying the data from Blob storage to SQL Server.

Instead of copying the whole data set from the cluster to SQL Server and implementing the merge logic there, we would like to implement the merge logic in Spark SQL, move only the delta difference (new or modified records) to SQL Server, and merge that against the final SQL target table. This should reduce network and I/O cost. Has anyone implemented this kind of delta difference in Spark / Spark SQL?
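For what it's worth, here is a minimal Scala sketch of one way to compute that delta in Spark before shipping it over JDBC. All table names, the `business_key` column, and the connection details are placeholders for your actual schema; it assumes a snapshot of the previous target state is still queryable in Hive, and it detects modified rows by hashing the non-key columns:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object DeltaDiff {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DailyDeltaDifference")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical tables: today's processed batch and the previous
    // snapshot of the target as it exists in Hive.
    val current  = spark.table("staging_db.daily_batch")
    val previous = spark.table("target_db.hive_target")

    // Hash all non-key columns so a modified row is detected with one
    // comparison. Note: concat_ws skips nulls; coalesce the columns
    // first if null vs. empty-string rows must hash differently.
    val keyCol    = "business_key" // assumed business key
    val valueCols = current.columns.filterNot(_ == keyCol)
    def withHash(df: DataFrame): DataFrame =
      df.withColumn("row_hash",
        sha2(concat_ws("||", valueCols.map(col): _*), 256))

    val cur  = withHash(current)
    val prev = withHash(previous)
      .select(col(keyCol).as("prev_key"), col("row_hash").as("prev_hash"))

    // Keep rows that are new (no matching key in the previous snapshot)
    // or modified (key matches but the hash differs).
    val delta = cur
      .join(prev, cur(keyCol) === prev("prev_key"), "left_outer")
      .where(col("prev_key").isNull || col("row_hash") =!= col("prev_hash"))
      .drop("prev_key", "prev_hash", "row_hash")

    // Ship only the delta to the SQL Server staging table over JDBC;
    // "truncate" keeps the table definition while clearing old rows.
    delta.write
      .format("jdbc")
      .option("url", "jdbc:sqlserver://<host>:1433;databaseName=<db>")
      .option("dbtable", "dbo.StagingDelta")
      .option("user", "<user>")
      .option("password", "<password>")
      .option("truncate", "true")
      .mode("overwrite")
      .save()

    spark.stop()
  }
}
```

Your existing T-SQL MERGE should then run unchanged against `dbo.StagingDelta`, just over far fewer rows. If the two tables have identical schemas and you only need new rows (not modified ones), a plain `current.except(previous)` is an even simpler starting point.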
