sbernauer commented on issue #3841:
URL: https://github.com/apache/hudi/issues/3841#issuecomment-950682166
I downgraded to Spark 3.1.2, but the issue still persists. I will try the `hoodie.avro.schema.validate` setting.

This is our config:

```
auto.offset.reset: earliest
auto.reset.offsets: earliest
bootstrap.servers: XXX
hoodie.bulkinsert.shuffle.parallelism: 32
hoodie.cleaner.commits.retained: 120
hoodie.cleaner.policy: KEEP_LATEST_COMMITS
hoodie.consistency.check.enabled: true
hoodie.datasource.hive_sync.database: kiwi
hoodie.datasource.hive_sync.enable: true
hoodie.datasource.hive_sync.jdbcurl: jdbc:hive2://XXX.svc.cluster.local:10001
hoodie.datasource.hive_sync.partition_fields: happenedDayDe
hoodie.datasource.hive_sync.table: sip_prediction_PredictionEvent_v1_any_live_bs
hoodie.datasource.write.hive_style_partitioning: false
hoodie.datasource.write.keygenerator.class: org.apache.hudi.keygen.CustomKeyGenerator
hoodie.datasource.write.partitionpath.field: header.happenedTimestamp:TIMESTAMP
hoodie.datasource.write.reconcile.schema: false
hoodie.datasource.write.recordkey.field: header.happenedTimestamp,header.eventId
hoodie.delete.shuffle.parallelism: 32
hoodie.deltastreamer.kafka.source.maxEvents: 1000000000000
hoodie.deltastreamer.keygen.timebased.input.timezone: UTC
hoodie.deltastreamer.keygen.timebased.output.dateformat: yyyy/MM/dd
hoodie.deltastreamer.keygen.timebased.output.timezone: Europe/Berlin
hoodie.deltastreamer.keygen.timebased.timestamp.type: EPOCHMILLISECONDS
hoodie.deltastreamer.schemaprovider.registry.url: https://XXX/subjects/sip-prediction-PredictionEvent-v1/versions/latest
hoodie.deltastreamer.schemaprovider.source.schema.file: /tmp/schema_source.json
hoodie.deltastreamer.schemaprovider.spark_avro_post_processor.enable: true
hoodie.deltastreamer.schemaprovider.target.schema.file: /tmp/schema_target.json
hoodie.deltastreamer.source.kafka.topic: sip-prediction-PredictionEvent-v1-any.live.bs.mam-event-bus
hoodie.deltastreamer.source.kafka.value.deserializer.class: org.apache.hudi.utilities.deser.KafkaAvroSchemaDeserializer
hoodie.insert.shuffle.parallelism: 32
hoodie.keep.max.commits: 1000
hoodie.keep.min.commits: 130
hoodie.metrics.on: true
hoodie.metrics.reporter.type: PROMETHEUS
hoodie.parquet.compression.codec: snappy
hoodie.parquet.max.file.size: 2147483648
hoodie.parquet.small.file.limit: 1073741824
hoodie.payload.event.time.field: header.happenedTimestamp
hoodie.upsert.shuffle.parallelism: 32
sasl.mechanism: SCRAM-SHA-512
schema.registry.url: https://XXX
security.protocol: sasl_ssl
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
