GitHub user casesolved-co-uk edited a discussion: Pseudo-CDC - polled pipeline runs?
In certain circumstances it may not be worth the complexity of installing Debezium, Kafka and proper CDC. For small data, for example, it may be sufficient to do pseudo-CDC, i.e. polled pipeline runs, e.g. every minute.

Consider this:

- Many tables with a common `modified_at` datetime field (assuming it has sufficient resolution that values do not overlap; it could also be an integer, a unique primary key, etc., as long as it is comparable)
- A Hop configuration parameter `synced_to` holding a datetime
- A fetch size
- A poll interval

Then the following query is run repeatedly:

    SELECT * FROM sometable WHERE modified_at > ${synced_to} ORDER BY modified_at ASC LIMIT ${fetch_size}

After each run, the `synced_to` parameter is updated to the last `modified_at` value retrieved. If len(result) == `fetch_size`, the pipeline is repeated immediately; otherwise the next run is scheduled after the poll interval.

Can Hop do that, maybe with a workflow?

GitHub link: https://github.com/apache/hop/discussions/5134

----
This is an automatically sent email for users@hop.apache.org.
To unsubscribe, please send an email to: users-unsubscr...@hop.apache.org
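
For reference, the polling logic described in the question can be sketched outside Hop as follows. This is only an illustration of the loop, not Hop's API or a recommended Hop design: it uses Python's standard `sqlite3` module as a stand-in database, and the function names `poll_once` and `run`, as well as the `max_idle_polls` stop condition, are hypothetical additions so the example terminates.

```python
import sqlite3
import time


def poll_once(conn, table, synced_to, fetch_size):
    """Fetch up to fetch_size rows modified after synced_to, oldest first."""
    # The table name is interpolated into the SQL text, so it must come
    # from trusted configuration; the two values are bound as parameters.
    cur = conn.execute(
        f"SELECT * FROM {table} "
        "WHERE modified_at > ? ORDER BY modified_at ASC LIMIT ?",
        (synced_to, fetch_size),
    )
    rows = cur.fetchall()
    # Locate the modified_at column by name, since SELECT * fixes no order.
    idx = [d[0] for d in cur.description].index("modified_at")
    # Advance the watermark to the last value retrieved, if any.
    new_synced_to = rows[-1][idx] if rows else synced_to
    return rows, new_synced_to


def run(conn, table, synced_to, fetch_size, poll_interval, max_idle_polls):
    """Repeat immediately on a full batch, otherwise wait poll_interval.

    max_idle_polls (hypothetical) bounds the number of waits so the
    sketch terminates; a real poller would loop indefinitely.
    """
    synced = []
    idle_polls = 0
    while idle_polls < max_idle_polls:
        rows, synced_to = poll_once(conn, table, synced_to, fetch_size)
        synced.extend(rows)  # stand-in for "process rows in the pipeline"
        if len(rows) == fetch_size:
            continue  # full batch: more rows may be waiting
        idle_polls += 1
        time.sleep(poll_interval)
    return synced, synced_to


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sometable (id INTEGER PRIMARY KEY, modified_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sometable VALUES (?, ?)",
        [(i, 100 + i) for i in range(5)],
    )
    rows, watermark = run(conn, "sometable", 0, 2, 0, 1)
    print(len(rows), watermark)
```

Because the filter is a strict `modified_at > synced_to`, rows that share the same `modified_at` value across a batch boundary would be skipped, which is why the uniqueness assumption stated in the question matters.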