[ https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-1896:
-----------------------------
    Issue Type: Epic  (was: New Feature)

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -----------------------------------------------------------------
>
>                 Key: HUDI-1896
>                 URL: https://issues.apache.org/jira/browse/HUDI-1896
>             Project: Apache Hudi
>          Issue Type: Epic
>          Components: DeltaStreamer
>            Reporter: Raymond Xu
>            Assignee: Rajesh Mahindra
>            Priority: Critical
>              Labels: hudi-umbrellas, pull-request-available
>
> As discussed in HUDI-1723, we need a better implementation for cloud object
> storage such as AWS S3 or GCS, one that leverages change notifications.
> Also consider
> [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]
>
> We need to look into the current *DFSSource classes and see whether we can add
> a new `DFSPathSelector` implementation that fetches new files on cloud storage
> after a given point in time. The timestamp-based approach used by the existing
> path selector largely works, but it has corner cases, as mentioned in HUDI-1723.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)