ksrihari93 commented on issue #5822: URL: https://github.com/apache/hudi/issues/5822#issuecomment-1155454113
> may be this could be the issue. can you try adding this to spark-submit command > > ``` > --hoodie-conf hoodie.clustering.async.enabled=true > ``` Hi , I have passed this in the source.properties file and tried this as well hoodie.clustering.plan.strategy.max.bytes.per.group below props are passed hoodie.insert.shuffle.parallelism=50 hoodie.bulkinsert.shuffle.parallelism=200 hoodie.embed.timeline.server=true hoodie.filesystem.view.type=EMBEDDED_KV_STORE hoodie.compact.inline=false hoodie.bulkinsert.sort.mode=none #cleaner properties hoodie.cleaner.policy=KEEP_LATEST_FILE_VERSIONS hoodie.cleaner.fileversions.retained=60 hoodie.clean.async=true #archival hoodie.keep.min.commits=12 hoodie.keep.max.commits=15 #datasource properties hoodie.deltastreamer.schemaprovider.registry.url= hoodie.datasource.write.recordkey.field= hoodie.deltastreamer.source.kafka.topic= hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator hoodie.datasource.write.partitionpath.field=timestamp:TIMESTAMP hoodie.deltastreamer.kafka.source.maxEvents=600000000 hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS hoodie.deltastreamer.keygen.timebased.input.timezone=UTC hoodie.deltastreamer.keygen.timebased.output.timezone=UTC hoodie.deltastreamer.keygen.timebased.output.dateformat='dt='yyyy-MM-dd hoodie.clustering.async.enabled=true hoodie.clustering.plan.strategy.target.file.max.bytes=3000000000 hoodie.clustering.plan.strategy.small.file.limit=200000001 hoodie.clustering.async.max.commits=1 hoodie.clustering.plan.strategy.max.num.groups=10 oodie.clustering.plan.strategy.max.bytes.per.group=9000000000 #kafka props bootstrap.servers= group.id=hudi-lpe auto.offset.reset=(As is said above when i passed earliest it got failed) ,so no other choice to recover i have passed latest hoodie.deltastreamer.source.kafka.checkpoint.type=timestamp -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
