We are encountering an issue where a Spark Structured Streaming read job from
an Iceberg table gets stuck after maintenance jobs (rewrite_data_files and
rewrite_manifests) have been run in parallel on the same table:
https://github.com/apache/iceberg/issues/10117
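
For context, here is a minimal sketch of our setup (the catalog, table, and
checkpoint names are placeholders, not our real ones):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.Trigger;

public class StuckStreamRepro {
  public static void main(String[] args) throws Exception {
    SparkSession spark =
        SparkSession.builder().appName("iceberg-stream-repro").getOrCreate();

    // Continuous read of new appends from the Iceberg table.
    Dataset<Row> stream = spark.readStream()
        .format("iceberg")
        .load("catalog.db.events");

    stream.writeStream()
        .format("console")
        .trigger(Trigger.ProcessingTime("30 seconds"))
        .option("checkpointLocation", "/tmp/checkpoints/events")
        .start();

    // In our case these run from a separate maintenance job, in parallel
    // with the stream; after they commit their `replace` snapshots, the
    // stream stops making progress.
    spark.sql("CALL catalog.system.rewrite_data_files(table => 'db.events')");
    spark.sql("CALL catalog.system.rewrite_manifests(table => 'db.events')");

    spark.streams().awaitAnyTermination();
  }
}
```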


I'm trying to understand what creates the end `StreamingOffset` from the
latest metadata.json file.
My theory is that either `SparkMicroBatchStream.latestOffset()` is not
returning the correct latest offset for the table from the latest
metadata.json, or `SparkMicroBatchStream.planFiles()` is not returning any
files when it encounters a `replace` snapshot.
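
My rough mental model of the per-snapshot filtering is sketched below. This
is an assumption about the shape of the logic, not the actual
`SparkMicroBatchStream` source; only the `org.apache.iceberg.DataOperations`
constants are real Iceberg identifiers:

```java
import org.apache.iceberg.DataOperations;
import org.apache.iceberg.Snapshot;

final class SnapshotFilterSketch {
  // Decide whether a snapshot contributes files to the micro-batch.
  static boolean shouldProcess(
      Snapshot snapshot, boolean skipDelete, boolean skipOverwrite) {
    switch (snapshot.operation()) {
      case DataOperations.APPEND:
        // Only appends carry new data files for the stream to read.
        return true;
      case DataOperations.REPLACE:
        // rewrite_data_files / rewrite_manifests commit `replace` snapshots;
        // they only reshuffle existing data, so they yield no files.
        return false;
      case DataOperations.DELETE:
        if (!skipDelete) {
          throw new UnsupportedOperationException(
              "Cannot stream past a delete snapshot");
        }
        return false;
      case DataOperations.OVERWRITE:
        if (!skipOverwrite) {
          throw new UnsupportedOperationException(
              "Cannot stream past an overwrite snapshot");
        }
        return false;
      default:
        throw new UnsupportedOperationException(
            "Unknown operation: " + snapshot.operation());
    }
  }
}
```

If something like this returns false for the `replace` snapshot while
`latestOffset()` fails to advance past it (or advances inconsistently), that
mismatch could explain the stuck stream.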

Are these two methods the right places that determine which Iceberg data
files/partitions `MicroBatchExecution` will process on each trigger?
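
For reference, this is how I currently picture the per-trigger flow, as a
simplified sketch against Spark's `MicroBatchStream` connector interface (the
real `MicroBatchExecution` loop also handles checkpointing, rate limits, and
no-data triggers):

```java
import org.apache.spark.sql.connector.read.InputPartition;
import org.apache.spark.sql.connector.read.streaming.MicroBatchStream;
import org.apache.spark.sql.connector.read.streaming.Offset;

final class MicroBatchLoopSketch {
  // Simplified model of what the driver does once per trigger.
  static Offset runOneTrigger(MicroBatchStream stream, Offset start) {
    // The end StreamingOffset for this batch comes from latestOffset().
    Offset end = stream.latestOffset();
    // Iceberg's planFiles() feeds the partitions returned here.
    InputPartition[] splits = stream.planInputPartitions(start, end);
    // ... launch tasks over `splits`, then mark the batch durable ...
    stream.commit(end);
    return end; // becomes the start offset of the next batch
  }
}
```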
