sbernauer commented on issue #3841:
URL: https://github.com/apache/hudi/issues/3841#issuecomment-950682166


   I downgraded to Spark 3.1.2, but the issue still persists. I will try the `hoodie.avro.schema.validate` setting.
   This is our config:
   ```
   auto.offset.reset: earliest
   auto.reset.offsets: earliest
   bootstrap.servers: XXX
   hoodie.bulkinsert.shuffle.parallelism: 32
   hoodie.cleaner.commits.retained: 120
   hoodie.cleaner.policy: KEEP_LATEST_COMMITS
   hoodie.consistency.check.enabled: true
   hoodie.datasource.hive_sync.database: kiwi
   hoodie.datasource.hive_sync.enable: true
   hoodie.datasource.hive_sync.jdbcurl: jdbc:hive2://XXX.svc.cluster.local:10001
   hoodie.datasource.hive_sync.partition_fields: happenedDayDe
   hoodie.datasource.hive_sync.table: sip_prediction_PredictionEvent_v1_any_live_bs
   hoodie.datasource.write.hive_style_partitioning: false
   hoodie.datasource.write.keygenerator.class: org.apache.hudi.keygen.CustomKeyGenerator
   hoodie.datasource.write.partitionpath.field: header.happenedTimestamp:TIMESTAMP
   hoodie.datasource.write.reconcile.schema: false
   hoodie.datasource.write.recordkey.field: header.happenedTimestamp,header.eventId
   hoodie.delete.shuffle.parallelism: 32
   hoodie.deltastreamer.kafka.source.maxEvents: 1000000000000
   hoodie.deltastreamer.keygen.timebased.input.timezone: UTC
   hoodie.deltastreamer.keygen.timebased.output.dateformat: yyyy/MM/dd
   hoodie.deltastreamer.keygen.timebased.output.timezone: Europe/Berlin
   hoodie.deltastreamer.keygen.timebased.timestamp.type: EPOCHMILLISECONDS
   hoodie.deltastreamer.schemaprovider.registry.url: https://XXX/subjects/sip-prediction-PredictionEvent-v1/versions/latest
   hoodie.deltastreamer.schemaprovider.source.schema.file: /tmp/schema_source.json
   hoodie.deltastreamer.schemaprovider.spark_avro_post_processor.enable: true
   hoodie.deltastreamer.schemaprovider.target.schema.file: /tmp/schema_target.json
   hoodie.deltastreamer.source.kafka.topic: sip-prediction-PredictionEvent-v1-any.live.bs.mam-event-bus
   hoodie.deltastreamer.source.kafka.value.deserializer.class: org.apache.hudi.utilities.deser.KafkaAvroSchemaDeserializer
   hoodie.insert.shuffle.parallelism: 32
   hoodie.keep.max.commits: 1000
   hoodie.keep.min.commits: 130
   hoodie.metrics.on: true
   hoodie.metrics.reporter.type: PROMETHEUS
   hoodie.parquet.compression.codec: snappy
   hoodie.parquet.max.file.size: 2147483648
   hoodie.parquet.small.file.limit: 1073741824
   hoodie.payload.event.time.field: header.happenedTimestamp
   hoodie.upsert.shuffle.parallelism: 32
   sasl.mechanism: SCRAM-SHA-512
   schema.registry.url: https://XXX
   security.protocol: sasl_ssl
   ```

