GitHub user bamaer added a comment to the discussion: Pseudo-CDC - polled pipeline runs?
Similar use cases are perfectly doable and, as far as I know, widely implemented. Two possible scenarios:

*) Use a Table input transform to get the last updated date/id/whatever from the target table. This query should return a single row that is fed into a second Table input transform, which fetches everything from the source table with a where clause like `where date/id > ?`. The `?` takes the last date/id from the first Table input through the `Insert data from transform` option. This will fetch only the rows added or changed since the last load.

*) For smaller tables or files: copy the "old" version (last day, last hour) of the data to a separate table. With that old table/file in place, use a Merge rows (diff) transform to compare the old version of the data to the latest version on the date/id. This gives you a flag field marking each row as new, identical, changed or deleted. That flag field can be processed with your own logic or with a Synchronize after merge transform.

If you want to run this very frequently, or if there's a lot of data to process, you could add a watchdog pattern: write a status file or add a row to a database table. If that status file or row has an `active` status, your workflow can decide to do nothing, or start the syncing pipeline if there's no active process.

GitHub link: https://github.com/apache/hop/discussions/5134#discussioncomment-12732670

----
This is an automatically sent email for users@hop.apache.org.
To unsubscribe, please send an email to: users-unsubscr...@hop.apache.org
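The first scenario (high-water mark plus parameterized delta query) can be sketched in plain SQL outside Hop too. This is a minimal illustration using an in-memory SQLite database; the table and column names (`source`, `target`, `updated`) are made up for the example:

```python
import sqlite3

# Hypothetical schema: "target" holds already-loaded rows, "source" holds everything.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER, updated TEXT);
    CREATE TABLE target (id INTEGER, updated TEXT);
    INSERT INTO source VALUES (1, '2024-01-01'), (2, '2024-01-02'), (3, '2024-01-03');
    INSERT INTO target VALUES (1, '2024-01-01');
""")

# Step 1: single-row query for the high-water mark (the first Table input).
(last_updated,) = conn.execute("SELECT MAX(updated) FROM target").fetchone()

# Step 2: parameterized delta query (the second Table input with its `?`
# filled from the first one).
new_rows = conn.execute(
    "SELECT id, updated FROM source WHERE updated > ? ORDER BY id",
    (last_updated,),
).fetchall()
# new_rows now holds only the rows changed since the last load.
```

The key design point is that the `?` placeholder is bound from the previous query's result rather than hard-coded, which is exactly what the `Insert data from transform` option does between the two Table inputs.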
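The second scenario's comparison logic can be sketched as a small function. This is an approximation of what a Merge rows (diff) transform produces, not Hop's actual implementation; the flag names and the `id` key field are assumptions for the example:

```python
def merge_rows_diff(old, new, key="id"):
    """Compare two lists of row dicts on a key field and flag each row
    as new, identical, changed or deleted (a rough stand-in for what a
    Merge rows (diff) transform emits)."""
    old_by_key = {r[key]: r for r in old}
    new_by_key = {r[key]: r for r in new}
    flagged = []
    for k, row in new_by_key.items():
        if k not in old_by_key:
            flagged.append({**row, "flag": "new"})
        elif row == old_by_key[k]:
            flagged.append({**row, "flag": "identical"})
        else:
            flagged.append({**row, "flag": "changed"})
    # Rows present in the old version but missing from the new one.
    for k, row in old_by_key.items():
        if k not in new_by_key:
            flagged.append({**row, "flag": "deleted"})
    return flagged
```

Downstream, you would route on the flag field: insert the `new` rows, update the `changed` ones, delete or soft-delete the `deleted` ones, and skip `identical`, which is roughly what Synchronize after merge automates.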
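The status-file variant of the watchdog pattern can be sketched like this. The file name and the `sync_fn` callable are placeholders; using exclusive-create mode (`"x"`) makes the check-and-create step atomic, so two runs starting at the same moment cannot both acquire the file:

```python
import os

def run_with_watchdog(sync_fn, status_file="sync.lock"):
    """Run sync_fn only if no other run currently holds the status file.
    Returns True if the sync ran, False if an active run was detected."""
    try:
        # "x" mode fails if the file already exists: an active run holds it.
        f = open(status_file, "x")
    except FileExistsError:
        return False  # active process: decide to do nothing
    try:
        f.write("active")
        f.close()
        sync_fn()  # placeholder for starting the syncing pipeline
    finally:
        # Release the watchdog even if the sync fails.
        os.remove(status_file)
    return True
```

The database-row variant works the same way, except the "exclusive create" becomes an insert or update guarded by a unique constraint or a transactional check on the status column.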