sivabalan narayanan created HUDI-3264:
-----------------------------------------

             Summary: Make schema registry configs more flexible with 
MultiTableDeltaStreamer
                 Key: HUDI-3264
                 URL: https://issues.apache.org/jira/browse/HUDI-3264
             Project: Apache Hudi
          Issue Type: Task
          Components: deltastreamer
            Reporter: sivabalan narayanan


Ref issue: [https://github.com/apache/hudi/issues/4585]

Hi guys,

We ran into a problem setting the target schema of our Hudi table when using 
the MultiTableDeltaStreamer.

Using a normal DeltaStreamer, we are able to set our source and target schemas 
using the properties:
 * hoodie.deltastreamer.schemaprovider.registry.url
 * hoodie.deltastreamer.schemaprovider.registry.targetUrl

We found that we are not able to set these properties on a per-table basis 
using the MultiTableDeltaStreamer (MTDS), since the MTDS builds the schema 
registry URLs for the source and target schemas from the properties:
 * hoodie.deltastreamer.schemaprovider.registry.baseUrl
 * hoodie.deltastreamer.schemaprovider.registry.sourceUrlSuffix
 * hoodie.deltastreamer.schemaprovider.registry.targetUrlSuffix

The MultiTableDeltaStreamer then also uses the source Kafka topic name to 
build the URL of the target schema:

 
[hudi/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java|https://github.com/apache/hudi/blob/9fe28e56b49c7bf68ae2d83bfe89755314aa793b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java#L167]

Line 167 in 
[9fe28e5|https://github.com/apache/hudi/commit/9fe28e56b49c7bf68ae2d83bfe89755314aa793b]:

    typedProperties.setProperty(Constants.TARGET_SCHEMA_REGISTRY_URL_PROP,
        schemaRegistryBaseUrl + typedProperties.getString(Constants.KAFKA_TOPIC_PROP)
            + targetSchemaRegistrySuffix);
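
To make the coupling concrete, here is a minimal, self-contained sketch of the 
concatenation quoted above. It uses plain java.util.Properties instead of 
Hudi's TypedProperties. The Kafka topic property key, the registry host, the 
topic name, the suffix value and the class name are assumptions for 
illustration only, not taken from our actual setup.

```java
import java.util.Properties;

// Illustration only: mimics the string concatenation from line 167 with plain
// java.util.Properties. This is not Hudi code.
class TargetUrlCoupling {
    // Property keys quoted in this issue.
    static final String BASE_URL = "hoodie.deltastreamer.schemaprovider.registry.baseUrl";
    static final String TARGET_SUFFIX = "hoodie.deltastreamer.schemaprovider.registry.targetUrlSuffix";
    static final String TARGET_URL = "hoodie.deltastreamer.schemaprovider.registry.targetUrl";
    // Assumption: the key behind Constants.KAFKA_TOPIC_PROP.
    static final String KAFKA_TOPIC = "hoodie.deltastreamer.source.kafka.topic";

    // baseUrl + topic + targetUrlSuffix, as in the quoted line.
    static String composeTargetUrl(Properties props) {
        return props.getProperty(BASE_URL)
            + props.getProperty(KAFKA_TOPIC)
            + props.getProperty(TARGET_SUFFIX);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical values.
        props.setProperty(BASE_URL, "http://localhost:8081/subjects/");
        props.setProperty(KAFKA_TOPIC, "orders");
        props.setProperty(TARGET_SUFFIX, "-value/versions/latest");
        // A table-specific target schema we would like to use ...
        props.setProperty(TARGET_URL, "http://localhost:8081/subjects/orders_hudi/versions/latest");
        // ... is overwritten by the composed, topic-based URL.
        props.setProperty(TARGET_URL, composeTargetUrl(props));
        System.out.println(props.getProperty(TARGET_URL));
    }
}
```

Whatever value we put into registry.targetUrl for a table, the subject in the 
resulting URL always contains the topic name.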

 

We think that schema names should be more configurable, the way the original 
DeltaStreamer handles them. As it stands, the names of the schemas you want to 
use for reading or writing the data are very tightly coupled to the name of the 
Kafka topic the data is loaded from.
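
One possible shape for this, as a rough sketch and not a description of how 
Hudi behaves today: let an explicitly configured per-table URL take precedence 
and only fall back to the composed URL when none is set. The class and method 
names, the Kafka topic property key and all values are hypothetical.

```java
import java.util.Properties;

// Hypothetical sketch of a more flexible lookup. This is not Hudi code.
class FlexibleSchemaUrlSketch {
    // Property keys quoted in this issue.
    static final String BASE_URL = "hoodie.deltastreamer.schemaprovider.registry.baseUrl";
    static final String SOURCE_SUFFIX = "hoodie.deltastreamer.schemaprovider.registry.sourceUrlSuffix";
    static final String TARGET_SUFFIX = "hoodie.deltastreamer.schemaprovider.registry.targetUrlSuffix";
    static final String SOURCE_URL = "hoodie.deltastreamer.schemaprovider.registry.url";
    static final String TARGET_URL = "hoodie.deltastreamer.schemaprovider.registry.targetUrl";
    // Assumption: the key behind Constants.KAFKA_TOPIC_PROP.
    static final String KAFKA_TOPIC = "hoodie.deltastreamer.source.kafka.topic";

    // Prefer the explicit per-table URL; otherwise compose baseUrl + topic + suffix.
    static String resolve(Properties tableProps, String explicitKey, String suffixKey) {
        String explicit = tableProps.getProperty(explicitKey);
        if (explicit != null && !explicit.isEmpty()) {
            return explicit;
        }
        return tableProps.getProperty(BASE_URL)
            + tableProps.getProperty(KAFKA_TOPIC)
            + tableProps.getProperty(suffixKey);
    }

    public static void main(String[] args) {
        Properties tableProps = new Properties();
        // Hypothetical values.
        tableProps.setProperty(BASE_URL, "http://localhost:8081/subjects/");
        tableProps.setProperty(KAFKA_TOPIC, "orders");
        tableProps.setProperty(SOURCE_SUFFIX, "-value/versions/latest");
        tableProps.setProperty(TARGET_SUFFIX, "-value/versions/latest");
        // Only the target schema is overridden for this table.
        tableProps.setProperty(TARGET_URL, "http://localhost:8081/subjects/orders_hudi/versions/latest");

        System.out.println(resolve(tableProps, SOURCE_URL, SOURCE_SUFFIX));
        System.out.println(resolve(tableProps, TARGET_URL, TARGET_SUFFIX));
    }
}
```

With this kind of fallback, existing configs that rely on baseUrl and the 
suffixes would keep working, while single tables could point to any schema.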


--
This message was sent by Atlassian Jira
(v8.20.1#820001)
