aidendong created HUDI-4300:
-------------------------------
Summary: Add sync clean and archive for compaction service in
Spark Env
Key: HUDI-4300
URL: https://issues.apache.org/jira/browse/HUDI-4300
Project: Apache Hudi
Issue Type: Improvement
Components: compaction, spark
Reporter: aidendong
Fix For: 0.11.1
Currently, compaction runs clean and archive asynchronously.
{code:java}
// SparkRDDWriteClient.java
@Override
protected HoodieWriteMetadata<JavaRDD<WriteStatus>> compact(String compactionInstantTime, boolean shouldComplete) {
  HoodieSparkTable<T> table = HoodieSparkTable.create(config, context);
  preWrite(compactionInstantTime, WriteOperationType.COMPACT, table.getMetaClient());
  // ...
}
{code}
The asynchronous archive acquires a distributed lock when
{color:#FF0000}hoodie.write.concurrency.mode=OPTIMISTIC_CONCURRENCY_CONTROL{color}.
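For reference, a minimal sketch of the writer properties that typically accompany this concurrency mode. The property keys are the ones listed in the Hudi concurrency control documentation; the ZooKeeper host, port, lock key, and base path are placeholder values, and the class name is only illustrative.

```java
import java.util.Properties;

// Illustrative only: host, port, lock key, and base path below are placeholders.
class OccWriteConfigSketch {

  /** Builds writer properties that enable optimistic concurrency control. */
  static Properties occProps() {
    Properties props = new Properties();
    // The setting quoted in this issue.
    props.setProperty("hoodie.write.concurrency.mode", "OPTIMISTIC_CONCURRENCY_CONTROL");
    // Multi-writer setups clean up failed writes lazily.
    props.setProperty("hoodie.cleaner.policy.failed.writes", "LAZY");
    // A lock provider is required; the ZooKeeper-based one is used as an example.
    props.setProperty("hoodie.write.lock.provider",
        "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider");
    props.setProperty("hoodie.write.lock.zookeeper.url", "zk-host");         // placeholder
    props.setProperty("hoodie.write.lock.zookeeper.port", "2181");           // placeholder
    props.setProperty("hoodie.write.lock.zookeeper.lock_key", "my_table");   // placeholder
    props.setProperty("hoodie.write.lock.zookeeper.base_path", "/hudi/locks"); // placeholder
    return props;
  }

  public static void main(String[] args) {
    occProps().forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
```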
*The archive may hold the lock for a long time*
For example, in a Spark environment with offline scheduleAndCompaction and
hoodie.write.concurrency.mode=OPTIMISTIC_CONCURRENCY_CONTROL, all tasks may be
busy with compaction, so the archive function may not have enough resources to
make progress once it gets the lock.
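The scenario can be reproduced with a toy model. This is only a sketch built on the JDK's concurrency utilities, not Hudi code: a single-thread pool stands in for the Spark task slots, a ReentrantLock stands in for the distributed lock, and every name is made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

// Toy model only: none of these names come from Hudi.
class ArchiveLockDemo {

  /** Returns {lock free while compaction runs, lock free after compaction ends}. */
  static boolean[] run() {
    ExecutorService slots = Executors.newFixedThreadPool(1); // the only "task slot"
    ReentrantLock tableLock = new ReentrantLock();           // stands in for the distributed lock
    CountDownLatch compactionDone = new CountDownLatch(1);
    CountDownLatch archiveHasLock = new CountDownLatch(1);
    try {
      // Compaction occupies the only slot until compactionDone is released.
      slots.submit(() -> { compactionDone.await(); return null; });

      // Async archive: takes the lock first, then queues its work behind compaction.
      Thread archive = new Thread(() -> {
        tableLock.lock();
        try {
          archiveHasLock.countDown();
          slots.submit(() -> { }).get(); // blocks until the slot is free
        } catch (Exception e) {
          throw new IllegalStateException(e);
        } finally {
          tableLock.unlock();
        }
      });
      archive.start();
      archiveHasLock.await();

      boolean freeDuring = probe(tableLock); // another writer tries to take the lock
      compactionDone.countDown();            // compaction finishes, the slot frees up
      archive.join();
      boolean freeAfter = probe(tableLock);
      return new boolean[] {freeDuring, freeAfter};
    } catch (InterruptedException e) {
      throw new IllegalStateException(e);
    } finally {
      slots.shutdownNow();
    }
  }

  // Tries the lock without waiting and releases it again if it was acquired.
  private static boolean probe(ReentrantLock lock) {
    if (lock.tryLock()) {
      lock.unlock();
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    boolean[] r = run();
    System.out.println("lock free during compaction: " + r[0]);
    System.out.println("lock free after compaction: " + r[1]);
  }
}
```

In this model the lock is unavailable to other writers for as long as compaction keeps the slot busy, even though the archive itself has done no work yet.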
I think we can provide synchronous clean and archive as an option for users to choose.
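A rough sketch of what such a choice could look like. The Mode enum and the compact method are hypothetical names introduced here for illustration; they are not Hudi APIs, and the real change would live in the write client.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: none of these names are Hudi APIs.
class TableServiceModeSketch {

  enum Mode { ASYNC, SYNC }

  /** Runs a pretend compaction and returns the steps in the order they completed. */
  static List<String> compact(Mode mode) {
    List<String> log = Collections.synchronizedList(new ArrayList<>());
    if (mode == Mode.ASYNC) {
      // Clean and archive start before compaction and run alongside it.
      CompletableFuture<Void> clean = CompletableFuture.runAsync(() -> log.add("clean"));
      CompletableFuture<Void> archive = CompletableFuture.runAsync(() -> log.add("archive"));
      log.add("compact");
      CompletableFuture.allOf(clean, archive).join();
    } else {
      // Compaction finishes first; clean and archive then run on the caller's thread.
      log.add("compact");
      log.add("clean");
      log.add("archive");
    }
    return new ArrayList<>(log);
  }

  public static void main(String[] args) {
    System.out.println("SYNC:  " + compact(Mode.SYNC));
    System.out.println("ASYNC: " + compact(Mode.ASYNC)); // order may vary between runs
  }
}
```

In SYNC mode the archive would only take the lock after compaction has released its resources, which is the behaviour this issue asks for.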
--
This message was sent by Atlassian Jira
(v8.20.7#820007)