This is what table formats like Delta Lake, Apache Hudi, and Apache Iceberg are for. There is no need to manage it manually or fall back to a DBMS: these formats support deletes, upserts, and merges on cloud storage, driven directly from Spark.
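To make the point concrete, here is a minimal sketch of the MERGE pattern these formats support. The SQL string is illustrative Delta-Lake-style Spark SQL (the table names `target` and `updates` and the `op_type` column are hypothetical), and the plain-Python function below it mimics the same semantics on dicts so the behavior can be seen without a Spark cluster:

```python
# Illustrative Spark SQL for a Delta-style table; not executed here.
# `updates` is assumed to be a CDC feed with an op_type column
# ('I'/'U'/'D' for insert/update/delete).
MERGE_SQL = """
MERGE INTO target t
USING updates s
ON t.id = s.id
WHEN MATCHED AND s.op_type = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED AND s.op_type != 'D' THEN INSERT *
"""

def apply_merge(target, updates):
    """Plain-Python stand-in for the MERGE above.

    target:  dict keyed by id -> row dict
    updates: dict keyed by id -> row dict carrying an 'op_type' field
    """
    result = dict(target)
    for key, row in updates.items():
        if row["op_type"] == "D":
            result.pop(key, None)  # matched delete: row physically removed
        else:
            # upsert: keep all columns except the CDC op marker
            result[key] = {k: v for k, v in row.items() if k != "op_type"}
    return result

# Small demo: id 2 is deleted at the source, id 3 is new.
target = {1: {"name": "a"}, 2: {"name": "b"}}
updates = {2: {"op_type": "D"}, 3: {"op_type": "I", "name": "c"}}
merged = apply_merge(target, updates)
```

With Delta/Hudi/Iceberg the engine performs this rewrite transactionally on the underlying files, which is exactly what raw Parquet on cloud storage cannot do.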
On Thu, Jan 27, 2022 at 10:56 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Where is the ETL data stored?
>
> *But now the main problem is when the record at the source is deleted, it
> should be deleted in my final transformed record too.*
>
> If your final sink (storage) is a data warehouse, the record should be soft
> flagged with op_type (Insert/Update/Delete) and op_time (timestamp).
>
> HTH
>
> On Thu, 27 Jan 2022 at 15:48, Sid Kal <flinkbyhe...@gmail.com> wrote:
>
>> I am using a Spark incremental approach for bringing in the latest data
>> every day. Everything works fine.
>>
>> But now the main problem is that when a record at the source is deleted,
>> it should be deleted in my final transformed record too.
>>
>> How do I capture such changes and update my table too?
>>
>> Best regards,
>> Sid
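The soft-flag scheme Mich describes can be sketched in a few lines (a pure-Python stand-in for a warehouse table; the column names `op_type`/`op_time` come from his reply, everything else is a hypothetical illustration):

```python
def apply_soft_flags(table, events):
    """Apply CDC events to a warehouse-style table (a dict keyed by id).

    Instead of physically deleting rows, every row carries op_type and
    op_time columns; a source-side delete just re-flags the row 'D'.
    """
    for event in sorted(events, key=lambda e: e["op_time"]):
        table[event["id"]] = dict(event)  # latest op wins, flags kept
    return table

def active_rows(table):
    """Downstream view that hides soft-deleted rows."""
    return {k: v for k, v in table.items() if v["op_type"] != "D"}

# Hypothetical CDC feed; op_time as ISO-8601 strings so they sort correctly.
events = [
    {"id": 1, "name": "a", "op_type": "I", "op_time": "2022-01-27T10:00:00"},
    {"id": 2, "name": "b", "op_type": "I", "op_time": "2022-01-27T10:05:00"},
    {"id": 1, "op_type": "D", "op_time": "2022-01-28T09:00:00"},
]
table = apply_soft_flags({}, events)
```

The deleted record stays in the warehouse (useful for audit and late-arriving data) while queries read through `active_rows`; the table-format MERGE approach above is the alternative when you genuinely want the row gone.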