watermelon12138 commented on pull request #4925:
URL: https://github.com/apache/hudi/pull/4925#issuecomment-1055454800


   > One high-level question: is there any limitation or prerequisite in terms 
of different schemas from different source topics that users should be aware of 
when using this new feature? What happens when one source schema has a few fields 
that are not present in the second source schema, and so on? Does that lead to data 
loss? How are we handling that?
   @pratyakshsharma 
   Good question! This new feature is designed for the scenario in which 
multiple sources are ingested into one sink table. When the schemas of the 
sources are inconsistent, each source must be configured with its own schema 
and transformer, so that the source schema is converted to the schema of the 
sink table before the source data is written. For example, we can configure 
hoodie.deltastreamer.schemaprovider.source.schema.file or 
hoodie.deltastreamer.source.schemaProviderClassName to specify the schema of 
each source, and then configure 
hoodie.deltastreamer.source.transformerClassNames or 
hoodie.deltastreamer.transformer.sql to convert the schema of the source to the 
schema of the sink table. I highly recommend the 
hoodie.deltastreamer.transformer.sql configuration, because the SQL can also 
combine source data with Hive tables, for example through a join. This approach 
helps resolve schema inconsistencies between sources. I look forward to hearing 
from you.
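
   As a rough illustration of the approach described above, the settings for 
one source might look like the following. This is only a sketch: the schema 
file path and the column names (`id`, `name`, `email`, `ts`) are hypothetical, 
and it assumes the SQL-based transformer is in use, where `<SRC>` is the 
placeholder replaced with the incoming source data. How these keys are scoped 
to an individual source under this PR is not shown here.

```properties
# Hypothetical settings for one source; the path and column names are illustrative only.
hoodie.deltastreamer.schemaprovider.source.schema.file=/path/to/source_a.avsc
# Project the source onto the sink schema. This source is assumed to lack `email`,
# so the query supplies NULL for it to match the sink table's columns.
hoodie.deltastreamer.transformer.sql=SELECT id, name, CAST(NULL AS STRING) AS email, ts FROM <SRC>
```

   In this sketch the field missing from the source is filled with NULL rather 
than dropped from the sink, which only works if the sink schema declares that 
field as nullable.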


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

