CrazyBeeline opened a new issue #5105: URL: https://github.com/apache/hudi/issues/5105
**Steps to reproduce the behavior:**

1. Extract the source from Kafka with HoodieDeltaStreamer. Main configuration in `insert_cluster.properties`:

```properties
hoodie.upsert.shuffle.parallelism=100
hoodie.insert.shuffle.parallelism=100
hoodie.bulkinsert.shuffle.parallelism=100
hoodie.delete.shuffle.parallelism=100
hoodie.rollback.parallelism=100
hoodie.cleaner.parallelism=100
hoodie.datasource.write.recordkey.field=insert_time,id
hoodie.datasource.write.partitionpath.field=create_time:TIMESTAMP
hoodie.datasource.write.precombine.field=insert_time
hoodie.table.base.file.format=PARQUET
hoodie.datasource.write.hive_style_partitioning=true
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator
hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING
hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyy/MM/dd
hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy-MM-dd

##### memory #####
# on heap
hoodie.memory.merge.fraction=0.6
hoodie.memory.merge.max.size=1073741824
hoodie.memory.compaction.fraction=0.6
#hoodie.memory.compaction.max.size=

##### storage #####
hoodie.logfile.data.block.max.size=268435456
hoodie.logfile.max.size=1073741824
hoodie.parquet.max.file.size=125829120
hoodie.parquet.small.file.limit=104857600
# for MOR
hoodie.logfile.to.parquet.compression.ratio=0.35

##### kafka source #####
hoodie.deltastreamer.source.kafka.topic=hive-kafka-hudi
bootstrap.servers=hadoop02:9092,hadoop01:9092,hadoop03:9092
auto.offset.reset=earliest

##### hudi table #####
hoodie.database.name=default

##### hive sink #####
hoodie.datasource.hive_sync.database=default
hoodie.datasource.hive_sync.table=hudi_person_insert_cluster
hoodie.datasource.hive_sync.username=root
hoodie.datasource.hive_sync.password=
hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://hadoop03:10000
hoodie.datasource.hive_sync.partition_fields=create_time
hoodie.datasource.hive_sync.use_jdbc=false
hoodie.datasource.hive_sync.support_timestamp=false
hoodie.datasource.hive_sync.create_managed_table=false
hoodie.datasource.hive_sync.sync_as_datasource=true
hoodie.datasource.hive_sync.batch_num=10000
hoodie.datasource.hive_sync.assume_date_partitioning=false
hoodie.datasource.hive_sync.bucket_sync=false
hoodie.datasource.hive_sync.auto_create_database=true
hoodie.datasource.hive_sync.enable=true
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.HiveStylePartitionValueExtractor
#hoodie.datasource.hive_sync.skip_ro_suffix=true
#hoodie.datasource.hive_sync.create_managed_table=false

hoodie.embed.timeline.server=true
hoodie.deltastreamer.schemaprovider.source.schema.file=file:///opt/software/hudi/schame

##### compaction #####
hoodie.compact.inline=false
hoodie.compact.inline.max.delta.commits=10

##### clean #####
hoodie.clean.automatic=true
# KEEP_LATEST_FILE_VERSIONS or KEEP_LATEST_COMMITS
hoodie.cleaner.policy=KEEP_LATEST_COMMITS
hoodie.cleaner.delete.bootstrap.base.file=true
hoodie.cleaner.commits.retained=3
# lazily for multi-writers
hoodie.cleaner.policy.failed.writes=EAGER
# for KEEP_LATEST_FILE_VERSIONS
hoodie.cleaner.fileversions.retained=3

##### clustering #####
hoodie.clustering.inline=true
hoodie.clustering.inline.max.commits=10
hoodie.clustering.async.enabled=false
hoodie.clustering.async.max.commits=3
hoodie.clustering.preserve.commit.metadata=true
hoodie.clustering.plan.strategy.target.file.max.bytes=133169152
hoodie.clustering.plan.strategy.small.file.limit=52428800
hoodie.clustering.plan.strategy.sort.columns=insert_time
hoodie.clustering.plan.strategy.class=org.apache.hudi.client.clustering.plan.strategy.SparkRecentDaysClusteringPlanStrategy
hoodie.clustering.plan.strategy.daybased.lookback.partitions=1
hoodie.clustering.plan.strategy.daybased.skipfromlatest.partitions=2
hoodie.clustering.execution.strategy.class=org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy
hoodie.clustering.updates.strategy=org.apache.hudi.client.clustering.update.strategy.SparkRejectUpdateStrategy

##### multi writer #####
#single_writer
hoodie.write.concurrency.mode=optimistic_concurrency_control
#EAGER
hoodie.cleaner.policy.failed.writes=LAZY
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.lock.zookeeper.url=hadoop01,hadoop02,hadoop03
hoodie.write.lock.zookeeper.port=2181
hoodie.write.lock.zookeeper.lock_key=hive_kafka_hudi
hoodie.write.lock.zookeeper.base_path=/hudi_lock
hoodie.write.lock.zookeeper.connection_timeout_ms=15000
hoodie.write.lock.zookeeper.session_timeout_ms=60000

##### commit callback #####
hoodie.write.commit.callback.on=true
hoodie.write.commit.callback.class=org.apache.hudi.utilities.callback.kafka.HoodieWriteCommitKafkaCallback
hoodie.write.commit.callback.kafka.bootstrap.servers=hadoop02:9092,hadoop01:9092,hadoop03:9092
hoodie.write.commit.callback.kafka.topic=hudi_commit_callback
#hoodie.write.commit.callback.kafka.partition=
hoodie.write.commit.callback.kafka.acks=all
hoodie.write.commit.callback.kafka.retries=3

##### metadata #####
hoodie.metadata.clean.async=false
hoodie.metadata.cleaner.commits.retained=3
hoodie.metadata.compact.max.delta.commits=10
hoodie.metadata.keep.max.commits=30
hoodie.metadata.keep.min.commits=20
hoodie.commits.archival.batch=10

##### archive #####
hoodie.archive.automatic=true
hoodie.archivelog.folder=archived
hoodie.archive.delete.parallelism=10
```

2. The table has three partitions.
3. Based on the configuration above, I expected only partition create_time=2021-03-08 to be clustered, but in fact all partitions get clustered.
4. I tested both inline settings (the sketch after this list shows the selection I expected):
   * With hoodie.clustering.inline=false: partition create_time=2021-03-08; other partitions are similar.
   * With hoodie.clustering.inline=true: partition create_time=2021-03-08 has a clustering op; partition create_time=2021-03-09 also has a clustering op; partition create_time=2021-03-10 also has a clustering op.
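To make the expectation concrete: with lookback=1 and skipfromlatest=2, I understood the day-based strategy to sort partitions newest-first, skip the two newest, and keep one. This is a minimal sketch of that expected selection, not Hudi's actual implementation (the class and method names are invented for illustration):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class ExpectedDayBasedSelection {

  // Sketch of the expected selection: newest partitions first, skip
  // `skipfromlatest` of them, then keep `lookback` partitions.
  static List<String> selectPartitions(List<String> partitions, int skipFromLatest, int lookback) {
    return partitions.stream()
        .sorted(Comparator.reverseOrder())  // newest partition first
        .skip(skipFromLatest)               // hoodie.clustering.plan.strategy.daybased.skipfromlatest.partitions=2
        .limit(lookback)                    // hoodie.clustering.plan.strategy.daybased.lookback.partitions=1
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> partitions = Arrays.asList(
        "create_time=2021-03-08", "create_time=2021-03-09", "create_time=2021-03-10");
    // Prints [create_time=2021-03-08]: only the oldest partition should be clustered.
    System.out.println(selectPartitions(partitions, 2, 1));
  }
}
```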
**Environment Description**

* Hudi version : 0.10.1
* Spark version : 3.1.3
* Hive version : 3.1.2
* Hadoop version : 3.2.2
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no

**Additional context**
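One way to verify which partitions a scheduled clustering plan actually targets is to read the pending plans off the timeline. A minimal sketch, assuming the Hudi 0.10.x `ClusteringUtils` API; the base path below is a placeholder for the real table path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.common.util.ClusteringUtils;

public class ShowPendingClusteringPartitions {
  public static void main(String[] args) {
    // Point the meta client at the Hudi table (placeholder path).
    HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
        .setConf(new Configuration())
        .setBasePath("hdfs:///path/to/hudi_person_insert_cluster")
        .build();

    // For each pending clustering plan, print the distinct partitions
    // covered by its input file groups.
    ClusteringUtils.getAllPendingClusteringPlans(metaClient).forEach(instantAndPlan -> {
      System.out.println("instant: " + instantAndPlan.getLeft().getTimestamp());
      instantAndPlan.getRight().getInputGroups().stream()
          .flatMap(group -> group.getSlices().stream())
          .map(slice -> String.valueOf(slice.getPartitionPath()))
          .distinct()
          .forEach(partition -> System.out.println("  partition: " + partition));
    });
  }
}
```

With the configuration above, I expected this to print only create_time=2021-03-08, but all three partitions show up.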