GitHub user casesolved-co-uk edited a discussion: Pseudo-CDC - polled pipeline runs?
In some circumstances it may not be worth the complexity of installing Debezium, Kafka and proper CDC. Pseudo-CDC, i.e. polled pipeline runs (e.g. every minute), may be sufficient.

Consider this:
- Many tables with a common `modified_at` datetime field (assuming it has sufficient resolution that values do not overlap)
- A Hop configuration parameter `synced_to` (a datetime)
- A fetch size
- A poll interval

Then, repeatedly:

    SELECT * FROM sometable WHERE modified_at > ${synced_to} ORDER BY modified_at ASC LIMIT ${fetch_size}

After each run the `synced_to` parameter is updated to the last `modified_at` retrieved. If the query returns rows, the pipeline is repeated immediately. If it returns no rows, the next run is scheduled after the poll interval.

Can Hop do that?

GitHub link: https://github.com/apache/hop/discussions/5134
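To make the intended control flow concrete, here is a minimal sketch of the polling loop in plain Python against SQLite. It is not Hop code and does not say how Hop would implement this; it only illustrates the logic described above. The function names (`poll_once`, `run_polling`), the `handle_batch` callback and the `max_idle_polls` stop condition are hypothetical, and timestamps are assumed to be stored as ISO-8601 strings so that string comparison matches chronological order.

```python
import sqlite3
import time


def poll_once(conn, table, synced_to, fetch_size):
    """Fetch the next batch of rows with modified_at > synced_to.

    Returns (rows, new_synced_to). The table name is interpolated into the
    SQL text, so it must come from trusted configuration, never user input.
    """
    query = (
        f"SELECT * FROM {table} "
        "WHERE modified_at > ? ORDER BY modified_at ASC LIMIT ?"
    )
    cur = conn.execute(query, (synced_to, fetch_size))
    # Locate the modified_at column by name, since SELECT * is used.
    col = [d[0] for d in cur.description].index("modified_at")
    rows = cur.fetchall()
    if rows:
        # Advance the watermark to the last modified_at retrieved.
        synced_to = rows[-1][col]
    return rows, synced_to


def run_polling(conn, table, synced_to, fetch_size, poll_interval,
                handle_batch, max_idle_polls=1, sleep=time.sleep):
    """Repeat immediately while rows come back; wait when a poll is empty.

    max_idle_polls (hypothetical) stops the loop after that many consecutive
    empty polls so the sketch terminates; a real poller would run forever.
    """
    idle = 0
    while idle < max_idle_polls:
        rows, synced_to = poll_once(conn, table, synced_to, fetch_size)
        if rows:
            handle_batch(rows)
            idle = 0  # non-empty result: poll again straight away
        else:
            idle += 1
            if idle < max_idle_polls:
                sleep(poll_interval)  # empty result: wait for the poll interval
    return synced_to
```

One consequence of the strict `>` comparison combined with `LIMIT`: if several rows share the same `modified_at` and the batch boundary falls between them, the remaining rows are skipped on the next poll. That is why the assumption about timestamp resolution matters.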