Sid,

How do you cater for updates? Do you add an update as a new record, without touching the original record? That approach lets you see the history of each record, i.e. inserted once, updated *n* times and deleted once, throughout the Entity Life History of the record.
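As an illustration, here is a minimal PySpark sketch of that append-only pattern. It is only a sketch: the S3 paths, the table layout and the assumption that each incoming row is an update are made up for the example, not taken from your actual job.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("append_history").getOrCreate()

# Incoming batch of changed rows (path is illustrative)
incoming = spark.read.parquet("s3a://my-bucket/staging/orders/")

# Stamp each row instead of mutating the stored record:
# op_type marks the kind of change, op_time orders the record's life history.
stamped = (incoming
           .withColumn("op_type", F.lit("U"))            # I / U / D
           .withColumn("op_time", F.current_timestamp()))

# Append-only write: the original record is never touched, so the full
# Entity Life History (inserted once, updated n times, deleted once)
# remains queryable.
stamped.write.mode("append").parquet("s3a://my-bucket/warehouse/orders_history/")

The latest state of each record can then be recovered with a window over op_time whenever it is needed.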
So your mileage varies depending on what you want to do, e.g. whether you need to maintain the history of each event (an event is a change of state: an update to some key business entity, or a row in your case). An event-driven architecture (EDA) consists of an event producer (your source), an event consumer (your final storage, a DB or storage bucket) and an event broker (your Spark engine).

HTH

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On Thu, 27 Jan 2022 at 16:59, Sid Kal <flinkbyhe...@gmail.com> wrote:

> Hi Mich,
>
> Thanks for your time.
>
> The data is stored in S3 via DMS and is read by the Spark jobs.
>
> How can I mark a record as a soft delete?
>
> Any small snippet / link / example would help.
>
> Thanks,
> Sid
>
> On Thu, 27 Jan 2022, 22:26 Mich Talebzadeh, <mich.talebza...@gmail.com> wrote:
>
>> Where is the ETL data stored?
>>
>> *But now the main problem is that when a record at the source is deleted, it should be deleted in my final transformed record too.*
>>
>> If your final sink (storage) is a data warehouse, the record should be soft-flagged with op_type (Insert/Update/Delete) and op_time (timestamp).
>>
>> HTH
>>
>> On Thu, 27 Jan 2022 at 15:48, Sid Kal <flinkbyhe...@gmail.com> wrote:
>>
>>> I am using a Spark incremental approach to bring in the latest data every day. Everything works fine.
>>>
>>> But now the main problem is that when a record at the source is deleted, it should be deleted in my final transformed record too.
>>>
>>> How do I capture such changes and update my table accordingly?
>>>
>>> Best regards,
>>> Sid
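To make the soft-flag idea above concrete, here is a rough PySpark sketch of reading DMS change files from S3 and keeping deletes as flags rather than removing rows. The "Op" column (I/U/D) is what DMS typically writes into its CDC files; the paths, the order_id key and the change_ts timestamp column are invented for the example.

from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("soft_delete").getOrCreate()

# DMS CDC output; "Op" carries I/U/D per changed row (path is illustrative)
cdc = spark.read.parquet("s3a://my-bucket/dms/orders/")

flagged = (cdc
           .withColumnRenamed("Op", "op_type")
           .withColumn("op_time", F.col("change_ts")))   # assumed timestamp column

# Keep only the newest change per business key ...
w = Window.partitionBy("order_id").orderBy(F.col("op_time").desc())
latest = (flagged
          .withColumn("rn", F.row_number().over(w))
          .filter(F.col("rn") == 1)
          .drop("rn"))

# ... and soft-delete: flag the row instead of physically removing it.
current = latest.withColumn("is_deleted", F.col("op_type") == "D")

current.write.mode("overwrite").parquet("s3a://my-bucket/warehouse/orders_current/")

Downstream queries then filter on is_deleted = false, while the history table keeps every version of every record.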