[ https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinoth Chandar updated HUDI-1896:
---------------------------------
    Fix Version/s: 1.0.0

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -----------------------------------------------------------------
>
>                 Key: HUDI-1896
>                 URL: https://issues.apache.org/jira/browse/HUDI-1896
>             Project: Apache Hudi
>          Issue Type: Epic
>          Components: DeltaStreamer
>            Reporter: Raymond Xu
>            Assignee: Rajesh Mahindra
>            Priority: Critical
>              Labels: hudi-umbrellas, pull-request-available
>             Fix For: 1.0.0
>
>
> As discussed in HUDI-1723, we need a better implementation for cloud object
> storage such as AWS S3 or GCS, leveraging change notifications.
> Also consider
> [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]
>
> We need to look into the current *DFSSource classes and see if we can add a
> new `DFSPathSelector` implementation that fetches new files on cloud storage
> after a given point in time. The timestamp-based approach used by the existing
> path selector largely works, but has corner cases, as mentioned in HUDI-1723.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
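
For illustration only, here is a minimal sketch of timestamp-based file selection as the description outlines it: list the files, keep those modified after the last checkpoint, and advance the checkpoint. It is not Hudi's `DFSPathSelector` code. `FileEntry`, `select_new_files`, and the bucket paths are hypothetical names made up for this sketch. The corner case shown, a file that appears later with a modification time no newer than the checkpoint, is an assumed example of how such an approach can miss data, and is not necessarily the exact case reported in HUDI-1723.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FileEntry:
    # Hypothetical stand-in for one entry of a storage listing; not a Hudi class.
    path: str
    mtime_ms: int  # last-modified time in epoch milliseconds


def select_new_files(
    listing: List[FileEntry], checkpoint_ms: Optional[int]
) -> Tuple[List[FileEntry], Optional[int]]:
    """Return files modified strictly after the checkpoint, plus the new checkpoint."""
    threshold = -1 if checkpoint_ms is None else checkpoint_ms
    selected = sorted(
        (f for f in listing if f.mtime_ms > threshold),
        key=lambda f: (f.mtime_ms, f.path),
    )
    # The checkpoint only moves forward when something was selected.
    new_checkpoint = selected[-1].mtime_ms if selected else checkpoint_ms
    return selected, new_checkpoint


# First run: two files are visible and both are picked up.
listing = [
    FileEntry("s3://bucket/a.parquet", 100),
    FileEntry("s3://bucket/b.parquet", 200),
]
first_batch, checkpoint = select_new_files(listing, None)

# Assumed corner case: a file shows up in a later listing with an mtime that is
# not newer than the checkpoint, so a purely timestamp-based filter skips it.
listing.append(FileEntry("s3://bucket/late.parquet", 200))
second_batch, checkpoint_after = select_new_files(listing, checkpoint)
```

A source driven by change notifications would instead learn about `late.parquet` from the event itself, without depending on how its modification time compares to a checkpoint.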