ksrihari93 commented on issue #5822:
URL: https://github.com/apache/hudi/issues/5822#issuecomment-1155454113

   > Maybe this could be the issue. Can you try adding this to the spark-submit
command?
   > 
   > ```
   > --hoodie-conf hoodie.clustering.async.enabled=true
   > ```
   
   Hi,
   I have passed this in the source.properties file, and I also tried
   hoodie.clustering.plan.strategy.max.bytes.per.group
   
   The following props are passed:
   
   hoodie.insert.shuffle.parallelism=50
   hoodie.bulkinsert.shuffle.parallelism=200
   hoodie.embed.timeline.server=true
   hoodie.filesystem.view.type=EMBEDDED_KV_STORE
   hoodie.compact.inline=false
   hoodie.bulkinsert.sort.mode=none
   
   #cleaner properties
   hoodie.cleaner.policy=KEEP_LATEST_FILE_VERSIONS
   hoodie.cleaner.fileversions.retained=60
   hoodie.clean.async=true
   
   #archival
   hoodie.keep.min.commits=12
   hoodie.keep.max.commits=15
   
   #datasource properties
   hoodie.deltastreamer.schemaprovider.registry.url=
   hoodie.datasource.write.recordkey.field=
   hoodie.deltastreamer.source.kafka.topic=
   
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator
   hoodie.datasource.write.partitionpath.field=timestamp:TIMESTAMP
   hoodie.deltastreamer.kafka.source.maxEvents=600000000
   hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS
   hoodie.deltastreamer.keygen.timebased.input.timezone=UTC
   hoodie.deltastreamer.keygen.timebased.output.timezone=UTC
   hoodie.deltastreamer.keygen.timebased.output.dateformat='dt='yyyy-MM-dd
   hoodie.clustering.async.enabled=true
   hoodie.clustering.plan.strategy.target.file.max.bytes=3000000000
   hoodie.clustering.plan.strategy.small.file.limit=200000001
   hoodie.clustering.async.max.commits=1
   hoodie.clustering.plan.strategy.max.num.groups=10
   hoodie.clustering.plan.strategy.max.bytes.per.group=9000000000
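   As a side note, a quick sketch of what the timestamp-based key generator settings above do: an epoch-milliseconds value in the partition field is converted to UTC and formatted with the literal "dt=" prefix from the output dateformat. The sample value below is made up for illustration, not taken from the issue.

   ```python
   from datetime import datetime, timezone

   # Hypothetical epoch-milliseconds value for the "timestamp" partition field.
   ts_millis = 1654819200000

   # EPOCHMILLISECONDS input, UTC in/out, output.dateformat 'dt='yyyy-MM-dd:
   dt = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
   partition_path = dt.strftime("dt=%Y-%m-%d")  # literal "dt=" prefix plus the date
   print(partition_path)  # dt=2022-06-10
   ```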
   
   #kafka props
   bootstrap.servers=
   group.id=hudi-lpe
   auto.offset.reset=latest (as said above, when I passed earliest it failed,
so I had no other choice to recover but to pass latest)
   hoodie.deltastreamer.source.kafka.checkpoint.type=timestamp
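   
   For context, this is the general shape of the Deltastreamer launch these properties feed into; the bundle jar name, source class, table path, and table name below are placeholders, not taken from the issue:
   
   ```
   spark-submit \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     hudi-utilities-bundle.jar \
     --props file:///path/to/source.properties \
     --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
     --target-base-path /path/to/table \
     --target-table my_table \
     --table-type COPY_ON_WRITE \
     --hoodie-conf hoodie.clustering.async.enabled=true
   ```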


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
