kbuci commented on code in PR #18295:
URL: https://github.com/apache/hudi/pull/18295#discussion_r2963481781
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -350,6 +383,28 @@ public static HoodieWriteConfig createMetadataWriteConfig(
}
HoodieWriteConfig metadataWriteConfig = builder.build();
+ if (mergeMetdataLockConfigAtEnd) {
+   // We need to update the MDT write config to have the same lock related configs as the data table.
+   // All data table props with the lock prefix are always copied (to override MDT defaults with
+   // user-configured values). Other data table props not present in MDT config are also copied to
+   // support custom lock providers that may use non-standard config keys.
+   Properties lockProps = new Properties();
+   TypedProperties dataTableProps = writeConfig.getProps();
+   TypedProperties mdtProps = metadataWriteConfig.getProps();
+   for (String key : dataTableProps.stringPropertyNames()) {
+     if (key.startsWith(LockConfiguration.LOCK_PREFIX) || !mdtProps.containsKey(key)) {
Review Comment:
Oh, writers to the data table should not be setting
METADATA_WRITE_CONCURRENCY_MODE; this should only be set by a table service
user application that intends to execute compaction plans on the MDT (and does
not hold any table lock while executing the plans). Example usage would be:
```
HoodieBackedTableMetadataWriter metadataTableWriter =
    (HoodieBackedTableMetadataWriter) SparkHoodieBackedTableMetadataWriter.create(
        getJavaSparkContext().hadoopConfiguration(),
        writeConfig, // User specifies METADATA_WRITE_CONCURRENCY_MODE
        dataTableWriteClient.getEngineContext());
metaClient = metadataTableWriter.getMetadataMetaClient();
writeClient = (SparkRDDWriteClient) metadataTableWriter.getWriteClient();
final List<HoodieInstant> pendingCompactionInstants =
    metaClient.getActiveTimeline().filterPendingCompactionTimeline().getInstants();
for (HoodieInstant pendingCompactionInstant : pendingCompactionInstants) {
  writeClient.compact(pendingCompactionInstant.getTimestamp());
}
```
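(For reference, the merge condition in the diff above — copy every lock-prefixed data-table prop, plus any data-table prop the MDT config doesn't already define — boils down to the following self-contained sketch with plain `java.util.Properties`. The `hoodie.write.lock.` prefix string and the sample keys are assumptions standing in for `LockConfiguration.LOCK_PREFIX` and real configs.)

```
import java.util.Properties;

public class LockPropsMergeSketch {
  // Assumption: stand-in for LockConfiguration.LOCK_PREFIX, for illustration only.
  static final String LOCK_PREFIX = "hoodie.write.lock.";

  // Copy every lock-prefixed data-table prop (so user-configured lock settings
  // override MDT defaults), plus any data-table prop the MDT config lacks
  // (so custom lock providers with non-standard keys keep working).
  static Properties mergeLockProps(Properties dataTableProps, Properties mdtProps) {
    Properties lockProps = new Properties();
    for (String key : dataTableProps.stringPropertyNames()) {
      if (key.startsWith(LOCK_PREFIX) || !mdtProps.containsKey(key)) {
        lockProps.setProperty(key, dataTableProps.getProperty(key));
      }
    }
    return lockProps;
  }

  public static void main(String[] args) {
    Properties dt = new Properties();
    dt.setProperty("hoodie.write.lock.provider", "CustomLockProvider");
    dt.setProperty("custom.lock.endpoint", "zk:2181"); // non-standard custom-provider key
    dt.setProperty("hoodie.table.name", "t1");

    Properties mdt = new Properties();
    mdt.setProperty("hoodie.write.lock.provider", "InProcessLockProvider");
    mdt.setProperty("hoodie.table.name", "t1_metadata");

    Properties merged = mergeLockProps(dt, mdt);
    // Lock-prefixed key copied (overrides the MDT default), non-standard key
    // copied (absent from MDT config), non-lock key already in MDT config skipped.
    System.out.println(merged.getProperty("hoodie.write.lock.provider")); // CustomLockProvider
    System.out.println(merged.getProperty("custom.lock.endpoint"));       // zk:2181
    System.out.println(merged.containsKey("hoodie.table.name"));          // false
  }
}
```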
I'm open to making a wrapper API in the Hudi lib itself for that. But even if
we do that, I can't see a way around createMetadataWriteConfig, except maybe
creating a new function like
`createMetadataWriteConfigForTableServiceExecution` and then adding a new
static helper in `HoodieBackedTableMetadataWriter` that uses that API to
execute pending table service plans on the MDT?
At the end of the day, our core problem is that we want a way for a concurrent
writer to execute pending compaction plans on the MDT while making sure that:
- the data table lock is held during the necessary checks (starting the
heartbeat, transitioning instant states, committing the compaction, etc.)
- ...but not held the whole time during plan execution (since that would
block ingestion)
So if there's a more ergonomic way to achieve that (other than this PR) then
we should definitely consider it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]