suryaprasanna opened a new issue, #18366: URL: https://github.com/apache/hudi/issues/18366
### Feature Description

**What the feature achieves:**
Add support in HoodieIncrSource to return complete latest-state rows for modified records across the requested commit range, or expose an option that enables this behavior.

**Why this feature is needed:**
HoodieIncrSource currently reads upstream Hudi tables through the normal incremental reader path. Apache Hudi supports incremental read formats such as latest_state and cdc at the datasource level, but HoodieStreamer/HoodieIncrSource does not provide a way to return a complete latest-state view for modified records when sparse updates span multiple commits.

This becomes a practical problem when a source table changes from COW to MOR. With COW, incremental reads effectively provide the latest state of changed records, but with MOR the same downstream pipeline may only receive changes within the incremental commit window and may miss values from earlier commits for sparse updates. That means downstream or target datasets need extra merge logic after the table-type change, which is especially difficult to roll out in large data lakes.

### User Experience

**How users will use this feature:**
- Configuration changes needed
- API changes
- Usage examples

### Hudi RFC Requirements

**RFC PR link:** (if applicable)

**Why RFC is/isn't needed:**
- Does this change public interfaces/APIs? (Yes/No)
- Does this change storage format? (Yes/No)
- Justification:
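As a rough illustration of the sparse-update gap described under Feature Description, the toy model below contrasts "changes within the commit window" with "latest state of the records touched in the window". It is plain Python and does not use Hudi. The commit representation and the function names `changes_in_window` and `latest_state_in_window` are hypothetical, chosen only for this sketch, and it makes no claim about how Hudi stores or merges data internally.

```python
from typing import Dict, List

# Toy model: each commit maps a record key to the columns written in that
# commit. A sparse update carries only the changed columns.
# This is an assumption for illustration, not Hudi's storage layout.
Row = Dict[str, object]
Commit = Dict[str, Row]


def changes_in_window(commits: List[Commit], start: int, end: int) -> Dict[str, Row]:
    """Merge only the writes made in commits[start..end] (inclusive)."""
    rows: Dict[str, Row] = {}
    for commit in commits[start:end + 1]:
        for key, cols in commit.items():
            rows.setdefault(key, {}).update(cols)
    return rows


def latest_state_in_window(commits: List[Commit], start: int, end: int) -> Dict[str, Row]:
    """Return the full row as of `end` for every key touched in the window."""
    touched = set()
    for commit in commits[start:end + 1]:
        touched.update(commit.keys())

    rows: Dict[str, Row] = {}
    # Replay every commit up to `end`, but keep only the touched keys.
    for commit in commits[:end + 1]:
        for key, cols in commit.items():
            if key in touched:
                rows.setdefault(key, {}).update(cols)
    return rows


commits: List[Commit] = [
    # Commit 0: full inserts.
    {"r1": {"name": "a", "city": "X", "score": 1},
     "r2": {"name": "b", "city": "Y", "score": 2}},
    # Commit 1: sparse update to r1 (only `city`).
    {"r1": {"city": "Z"}},
    # Commit 2: sparse update to r1 (only `score`).
    {"r1": {"score": 5}},
]

window_only = changes_in_window(commits, 1, 2)
latest = latest_state_in_window(commits, 1, 2)

print(window_only)  # r1 without `name`, which was written before the window
print(latest)       # r1 with all three columns; r2 is absent (not modified)
```

In this sketch, a consumer reading commits 1 to 2 through the window-only path never sees `name` for `r1`, so it would need its own merge against the target dataset. The latest-state path returns the complete row, which is the behavior the request asks HoodieIncrSource to support.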
