Hi, I am exploring an implementation of the Hybrid CDC Pattern explained at 29:26 <https://youtu.be/GM7EvRc7_is?si=mIQ5g2k1uEIMX5DT&t=1766> in Ryan Blue's talk, CDC patterns in Apache Iceberg <https://trino.io/blog/2023/06/30/trino-fest-2023-apacheiceberg.html>.
The use case is:

1. Stream CDC logs to an append-only Iceberg table named *table_changelog* using Flink.
2. Periodically MERGE the CDC logs from *table_changelog* into *table*. The merge cadence depends on the table's requirements: for some tables it may be frequent (hourly), for others infrequent (daily).

I am considering how to implement (2) using Iceberg's incremental read <https://iceberg.apache.org/docs/latest/spark-queries/#incremental-read> and would appreciate guidance on the following topics:

1. What is the recommendation for storing the latest snapshot ID that has been successfully merged into *table*? Ideally this would be committed in the same transaction as the MERGE so that reprocessing is minimized. Does Iceberg support storing this as table metadata? I do not see any related information in the Iceberg Table Spec.
2. Should I use the DataFrame API or Spark SQL for the incremental read and the MERGE? In the docs, the incremental read examples use DataFrames, while MERGE uses Spark SQL <https://iceberg.apache.org/docs/latest/spark-writes/#merge-into>. Does either API support both use cases?

Thanks,
Nick
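For context, here is a minimal sketch of how I picture step (2). The table and column names (`db.table`, `id`, `op`) and the external bookkeeping of the last-merged snapshot ID are my assumptions for illustration, not something from the docs:

```python
# Sketch: build the MERGE INTO statement that applies one incremental
# slice of the changelog to the target table. Assumes each changelog row
# carries a primary key `id` and a change-type flag `op` ('D' = delete).
def incremental_merge_sql(changelog_view: str, target_table: str) -> str:
    return (
        f"MERGE INTO {target_table} t "
        f"USING {changelog_view} c "
        f"ON t.id = c.id "
        f"WHEN MATCHED AND c.op = 'D' THEN DELETE "
        f"WHEN MATCHED THEN UPDATE SET * "
        f"WHEN NOT MATCHED THEN INSERT *"
    )

# With a live SparkSession (Iceberg runtime on the classpath), the driver
# would look roughly like the following -- not executed here:
#
#   df = (spark.read.format("iceberg")
#         .option("start-snapshot-id", last_merged_id)   # exclusive start
#         .option("end-snapshot-id", current_id)         # inclusive end
#         .load("db.table_changelog"))
#   df.createOrReplaceTempView("changes")
#   spark.sql(incremental_merge_sql("changes", "db.table"))
#   # then persist current_id as the new last-merged snapshot ID --
#   # ideally atomically with the MERGE, which is exactly question (1).
```

The open question is where `last_merged_id` lives so that it commits atomically with the MERGE.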