Any thoughts, please?

On Fri, May 10, 2019 at 2:22 AM Chetan Khatri <chetan.opensou...@gmail.com> wrote:
> Hello All,
>
> I need your help / suggestions.
>
> I am using Spark 2.3.1 with the HDP 2.6.1 distribution. Let me describe my
> use case so you can see where people are trying to use Delta.
>
> My source is an MSSQL Server (OLTP), and the data lands in HDFS, currently
> in Parquet and Avro formats. I would now like to do incremental / delta
> loads, so I am using Change Tracking (CT, ref.
> https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/enable-and-disable-change-tracking-sql-server?view=sql-server-2017)
> to get the primary keys of updated and deleted records, and with those I
> pull only the records that were updated or deleted. I would now like to
> apply those updates / deletes to the Parquet data. Currently I am doing a
> full load, which I would like to avoid.
>
> Could you please suggest the best approach?
>
> HDP doesn't have Spark 2.4.2 available, so I can't change the
> infrastructure. Is there any way to use Delta.io on Spark 2.3.1? I also
> have an existing codebase, written over the last year and a half in
> Scala 2.11, which I don't want to break by moving to Scala 2.12.
>
> I don't need versioning or a transaction log on the Parquet data, so if
> anything else fits my use case, please do advise.
>
> Thank you.
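For reference, the incremental-apply logic described in the question (drop rows whose primary keys CT reports as deleted, then upsert the changed rows) can be sketched as follows. This is a plain-Python illustration of the merge semantics only, not Spark or Delta API code; the record shapes, the `id` key, and the function name are all hypothetical.

```python
# Sketch of the CT-based incremental merge described above.
# base: current snapshot (as loaded from Parquet), a list of dict rows.
# updates: changed/new rows pulled from MSSQL via Change Tracking.
# deleted_keys: primary keys that CT reports as deleted.
# All names and structures here are hypothetical illustrations.

def apply_incremental(base, updates, deleted_keys):
    """Return the new snapshot after applying CT updates and deletes."""
    merged = {row["id"]: row for row in base}   # index current rows by PK
    for key in deleted_keys:                    # drop rows CT marked deleted
        merged.pop(key, None)
    for row in updates:                         # upsert changed/new rows
        merged[row["id"]] = row
    return sorted(merged.values(), key=lambda r: r["id"])

base = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
updates = [{"id": 2, "v": "b2"}, {"id": 4, "v": "d"}]
deleted_keys = [3]
print(apply_incremental(base, updates, deleted_keys))
# [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'b2'}, {'id': 4, 'v': 'd'}]
```

On Spark 2.3 (without Delta), the same semantics are typically achieved by reading the affected Parquet data, anti-joining it against the deleted/updated keys, unioning in the updated rows, and overwriting the affected partitions.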