Re: [D] Pseudo-CDC - polled pipeline runs? (hop)

via GitHub Fri, 04 Apr 2025 18:43:39 -0700


GitHub user casesolved-co-uk edited a discussion: Pseudo-CDC - polled pipeline 
runs?


In certain circumstances it may not be desirable to take on the complexity of 
installing Debezium and Kafka for proper CDC. It may be sufficient to do 
pseudo-CDC, i.e. polled pipeline runs, e.g. every minute.

Consider this:

- Many tables with a common `modified_at` datetime field (assuming this has 
sufficient resolution to not overlap)
- A Hop configuration parameter `synced_to` holding a datetime value
- A fetch size
- A poll interval

Then this query is run repeatedly:
SELECT * FROM sometable WHERE modified_at>${synced_to} ORDER BY modified_at ASC 
LIMIT ${fetch_size}

After each run the `synced_to` parameter is updated with the last `modified_at` 
retrieved.

If the query returns rows, the pipeline is repeated immediately.
If the query returns no rows, the pipeline is scheduled to run again after the 
poll interval.
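
To make the intended control flow concrete, here is a minimal sketch of the loop in plain Python against SQLite. It is not Hop code and says nothing about how Hop would implement this. The `id` column, the function names `poll_once` and `run`, and the `max_idle_polls` stop condition are assumptions added only so the example is self-contained and terminates; `modified_at` is assumed to be stored as an ISO-8601 string so that it sorts chronologically.

```python
import sqlite3
import time


def poll_once(conn, synced_to, fetch_size):
    """Fetch the next batch of rows modified after synced_to (hypothetical helper)."""
    rows = conn.execute(
        "SELECT id, modified_at FROM sometable "
        "WHERE modified_at > ? ORDER BY modified_at ASC LIMIT ?",
        (synced_to, fetch_size),
    ).fetchall()
    # Advance the watermark to the last modified_at retrieved, if any.
    new_synced_to = rows[-1][1] if rows else synced_to
    return rows, new_synced_to


def run(conn, synced_to, fetch_size, poll_interval,
        max_idle_polls=1, sleep=time.sleep):
    """Repeat immediately while rows come back; wait poll_interval when empty.

    max_idle_polls is an assumption for this sketch: a real poller would
    loop forever, but the demo stops after that many consecutive empty polls.
    """
    processed = []
    idle = 0
    while idle < max_idle_polls:
        rows, synced_to = poll_once(conn, synced_to, fetch_size)
        if rows:
            processed.extend(rows)   # stand-in for the actual pipeline work
            idle = 0                 # non-empty result: poll again right away
        else:
            idle += 1
            sleep(poll_interval)     # empty result: wait before the next poll
    return processed, synced_to
```

Note that with a strict `modified_at > synced_to` comparison, rows that share a timestamp with the last row of a batch but fall outside the `LIMIT` are skipped, which is why the sketch depends on the same "sufficient resolution" assumption as above.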

Can Hop do that?

GitHub link: https://github.com/apache/hop/discussions/5134

----
This is an automatically sent email for users@hop.apache.org.
To unsubscribe, please send an email to: users-unsubscr...@hop.apache.org
