GitHub user casesolved-co-uk added a comment to the discussion: Pseudo-CDC - 
polled pipeline runs?

> *) Table input to get the last updated date/id/whatever from the target 
> table. This date/id should return a single row that is fed into a second 
> table input transform that fetches everything from the source table with a 
> where clause like `where date/id > ?`. The `?` takes the last date/id from 
> the target table input with the `Insert data from transform` option. This wil
> 
> *) For smaller tables or files: copy the "old" version (last day, last hour) 
> of the data to a separate table. With that old table/file in place, use a 
> Merge Rows Diff transform to compare the old version of the data to the 
> latest version on the date/id. This will give you a flag field for new, 
> identical, updated or deleted rows. That flag field can be used to process 
> using your own logic or with a "Synchronize after merge" transform.
> 
> If you want to run this very frequently or if there's a lot of data to 
> process, you could add a watchdog pattern, where you write a status file or 
> add a row to a database table. If that status file or row has an `active` 
> status, your workflow can decide to do nothing, or start the syncing pipeline 
> if there's no active process.
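
As a rough illustration of the first suggestion quoted above (read a single high-water mark from the target, then fetch only newer rows from the source with `where date/id > ?`), here is a minimal sketch in plain Python with `sqlite3`. It is not how Hop implements its Table Input transform; the table names `source` and `target` and the column `updated_at` are assumptions made up for the example.

```python
import sqlite3


def incremental_sync(conn: sqlite3.Connection) -> int:
    """Copy source rows newer than the target's latest updated_at."""
    cur = conn.cursor()
    # Step 1: a single-row lookup of the high-water mark in the target.
    # An empty target yields NULL, which COALESCE turns into '' so that
    # every source row compares as newer.
    (last_seen,) = cur.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM target"
    ).fetchone()
    # Step 2: feed that value into the `?` placeholder of the source query.
    rows = cur.execute(
        "SELECT id, payload, updated_at FROM source "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    cur.executemany(
        "INSERT INTO target (id, payload, updated_at) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


def demo() -> list:
    """Run three polls against hypothetical in-memory tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE source (id INTEGER PRIMARY KEY, payload TEXT, updated_at TEXT);
        CREATE TABLE target (id INTEGER PRIMARY KEY, payload TEXT, updated_at TEXT);
        INSERT INTO source VALUES (1, 'a', '2025-01-01'), (2, 'b', '2025-01-02');
        """
    )
    first = incremental_sync(conn)   # initial load
    conn.execute("INSERT INTO source VALUES (3, 'c', '2025-01-03')")
    second = incremental_sync(conn)  # picks up only the new row
    third = incremental_sync(conn)   # nothing changed since the last poll
    return [first, second, third]
```

This sketch only covers appended rows. Updated or deleted rows would need something like the diff approach from the second suggestion, because a plain `INSERT` would collide with the existing key.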

Thanks for the input @bamaer.

Is there a transform that saves the date/id/whatever to a configuration value 
in Hop itself? Your solution seems to imply the value must be written out as 
'data', i.e. to a database table or a data file. I ask because in testing or 
production it would be common to want to change the 'start date' of the sync, 
so it would be better held as configuration.
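
To make the request concrete, the behaviour being asked for could look roughly like the sketch below: use a configured start date if one is set, otherwise fall back to the latest value in the target. This is plain Python with `sqlite3`, not an existing Hop feature; the key `SYNC_START_DATE`, the table `target` and the column `updated_at` are all hypothetical names chosen for the example.

```python
import sqlite3
from typing import Mapping

# Hypothetical configuration key, invented for this sketch (not a Hop setting).
CONFIG_KEY = "SYNC_START_DATE"


def resolve_start(conn: sqlite3.Connection, config: Mapping[str, str]) -> str:
    """Return the value the sync should start after."""
    override = config.get(CONFIG_KEY)
    if override:
        # An operator-supplied start date wins, e.g. to replay a period in testing.
        return override
    # Otherwise fall back to the high-water mark already stored in the target.
    (last_seen,) = conn.execute("SELECT MAX(updated_at) FROM target").fetchone()
    return last_seen or ""  # an empty target means "start from the beginning"


def make_target() -> sqlite3.Connection:
    """Build a small in-memory target table for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, updated_at TEXT)")
    conn.executemany(
        "INSERT INTO target VALUES (?, ?)",
        [(1, "2025-01-01"), (2, "2025-01-02")],
    )
    return conn
```

With no override, `resolve_start(make_target(), {})` returns the latest `updated_at` in the target. Passing `{"SYNC_START_DATE": "2024-06-01"}` returns that date instead, which is the "change the start date" case described above.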

I'm surprised there is no transform built into Hop to help with this scenario. 
Keeping a source and target in sync like this must be quite a common 
requirement. Are there any plans for such a 'Monitor Input Field' feature?

GitHub link: 
https://github.com/apache/hop/discussions/5134#discussioncomment-12764292

----
This is an automatically sent email for users@hop.apache.org.
To unsubscribe, please send an email to: users-unsubscr...@hop.apache.org
