Sagar Sumit created HUDI-2742:
---------------------------------
Summary: Multiple S3EventsHoodieIncrSource from same S3 metadata
table for different Hudi tables
Key: HUDI-2742
URL: https://issues.apache.org/jira/browse/HUDI-2742
Project: Apache Hudi
Issue Type: Sub-task
Reporter: Sagar Sumit
Use case:
Suppose you have a source bucket with several folders: a1, a2, a3.
All write events on this bucket are logged to a single
s3_metadata_table.
Now you want to run three S3EventsHoodieIncrSource instances, one each for
a1, a2 and a3, all pulling metadata from the same s3_metadata_table.
This must be done with strict separation: each incr source ingests into its
own Hudi table, and no two incr sources ingest into the same table.
Proposed Solution:
Users can provide a filter key and value, and start multiple incr sources
with different configs. In the above use case, the key could be s3.object.key
and the value could be a regex that matches up to a certain part of the S3
object key. We apply the filter in S3EventsHoodieIncrSource
[here|https://github.com/apache/hudi/blob/6b93ccca9b26b47099e9791d4363e0616e77e408/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsHoodieIncrSource.java#L105-L109].
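
To illustrate the idea, here is a minimal sketch of regex filtering on object
keys in plain Java. It is not Hudi code: the class S3KeyFilterSketch, the
methods filterKeys and filtersAreDisjoint, and the sample keys are all
hypothetical, and the real source would apply an equivalent predicate to the
rows read from s3_metadata_table rather than to an in-memory list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names exist in Hudi.
class S3KeyFilterSketch {

    // Keep only the object keys that fully match the configured regex.
    static List<String> filterKeys(List<String> objectKeys, String keyRegex) {
        Pattern pattern = Pattern.compile(keyRegex);
        return objectKeys.stream()
            .filter(key -> pattern.matcher(key).matches())
            .collect(Collectors.toList());
    }

    // Sanity check on a sample of keys: returns false if any sampled key is
    // matched by more than one regex, i.e. two sources would pick it up.
    // This only checks the sample; it does not prove the regexes are disjoint.
    static boolean filtersAreDisjoint(List<String> sampleKeys, List<String> regexes) {
        List<Pattern> patterns = regexes.stream()
            .map(Pattern::compile)
            .collect(Collectors.toList());
        for (String key : sampleKeys) {
            long hits = patterns.stream()
                .filter(p -> p.matcher(key).matches())
                .count();
            if (hits > 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
            "a1/f1.parquet", "a2/f2.parquet", "a3/f3.parquet", "a1/sub/f4.parquet");
        // One regex per incr source, each anchored on its folder prefix.
        System.out.println(filterKeys(keys, "a1/.*"));
        System.out.println(filtersAreDisjoint(keys, Arrays.asList("a1/.*", "a2/.*", "a3/.*")));
    }
}
```

Including the trailing slash in each regex (a1/.* rather than a1.*) matters in
this sketch: without it, a folder such as a10 would also match the a1 filter
and the separation between sources would no longer be strict.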