kbuci commented on code in PR #18295:
URL: https://github.com/apache/hudi/pull/18295#discussion_r2964052662
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -147,14 +150,44 @@ public static HoodieWriteConfig createMetadataWriteConfig(
HoodieTableVersion datatableVersion) {
String tableName = writeConfig.getTableName() + METADATA_TABLE_NAME_SUFFIX;
boolean isStreamingWritesToMetadataEnabled = writeConfig.isMetadataStreamingWritesEnabled(datatableVersion);
- WriteConcurrencyMode concurrencyMode = isStreamingWritesToMetadataEnabled
- ? WriteConcurrencyMode.NON_BLOCKING_CONCURRENCY_CONTROL : WriteConcurrencyMode.SINGLE_WRITER;
- HoodieLockConfig lockConfig = isStreamingWritesToMetadataEnabled
- ? HoodieLockConfig.newBuilder().withLockProvider(InProcessLockProvider.class).build() : HoodieLockConfig.newBuilder().build();
- // HUDI-9407 tracks adding support for separate lock configuration for MDT. Until then, all writes to MDT will happen within data table lock.
-
- if (isStreamingWritesToMetadataEnabled) {
+ WriteConcurrencyMode metadataWriteConcurrencyMode =
+ WriteConcurrencyMode.valueOf(writeConfig.getMetadataConfig().getWriteConcurrencyMode());
+
+ WriteConcurrencyMode concurrencyMode;
+ HoodieLockConfig lockConfig;
+ final boolean deriveMetadataLockConfigsFromDataTableConfigs;
+ if (metadataWriteConcurrencyMode.supportsMultiWriter()) {
+ // Configuring Multi-writer directly on metadata table is intended for executing table service plans, not for writes.
+ checkState(!isStreamingWritesToMetadataEnabled,
+ "Streaming writes to metadata table must be disabled when using multi-writer concurrency mode "
+ + metadataWriteConcurrencyMode + ". Disable " + HoodieMetadataConfig.STREAMING_WRITE_ENABLED.key());
+ checkState(metadataWriteConcurrencyMode == writeConfig.getWriteConcurrencyMode(),
+ "If multiwriter is used on metadata table, its concurrency mode (" + metadataWriteConcurrencyMode
+ + ") must match the data table concurrency mode (" + writeConfig.getWriteConcurrencyMode() + ")");
+ String lockProviderClass = writeConfig.getLockProviderClass();
+ checkState(lockProviderClass != null,
+ "Lock provider class must be set for data table to enable async executions of table services in metadata table");
+ checkState(!InProcessLockProvider.class.getCanonicalName().equals(lockProviderClass),
+ "InProcessLockProvider cannot be used for metadata table multi-writer mode as it does not support cross-process locking. "
+ + "Configure a distributed lock provider on the data table.");
+ // First lets create the MDT write config with default single writer lock configs.
+ // Then, once all MDT-specific write configs are set, we can derive lock configs
+ // from the data table and re-build the MDT write config with the merged lock config.
+ concurrencyMode = WriteConcurrencyMode.SINGLE_WRITER;
Review Comment:
Oh, that should still be the case: per
https://github.com/apache/hudi/pull/18295/changes#diff-222263167c64d14376f622a020c075f95600cb456409742875925559391226beR161
if the user sets metadataWriteConcurrencyMode to OCC/NBCC, we fail anyway
when isStreamingWritesToMetadataEnabled is true. We don't necessarily have to
implement it this way, but I thought enforcing this would make
https://github.com/apache/hudi/pull/18295/#discussion_r2914296574 easier to
follow. IIUC, isStreamingWritesToMetadataEnabled is meant for writes/table
services on the data table (so that files in data table and MDT partitions can
be written together), and that should not apply to the use case we want to
support here anyway: allowing concurrent applications to directly execute a
table service plan on the MDT.
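
To make the branching described above concrete, here is a minimal, self-contained sketch. It is not the code in this PR: `MdtConcurrencyCheckSketch`, its `Mode` enum, `resolveInitialMode`, and the local `checkState` helper are hypothetical stand-ins for `WriteConcurrencyMode` and the checks in `createMetadataWriteConfig`. The non-multi-writer branch is an assumption that mirrors the lines removed in the hunk (streaming writes select NBCC, otherwise single writer).

```java
final class MdtConcurrencyCheckSketch {

  // Hypothetical stand-in for Hudi's WriteConcurrencyMode; only the multi-writer flag is modeled.
  enum Mode {
    SINGLE_WRITER(false),
    OPTIMISTIC_CONCURRENCY_CONTROL(true),
    NON_BLOCKING_CONCURRENCY_CONTROL(true);

    private final boolean multiWriter;

    Mode(boolean multiWriter) {
      this.multiWriter = multiWriter;
    }

    boolean supportsMultiWriter() {
      return multiWriter;
    }
  }

  // Assumed fully qualified name of the in-process lock provider, used only for comparison.
  static final String IN_PROCESS_LOCK_PROVIDER =
      "org.apache.hudi.client.transaction.lock.InProcessLockProvider";

  // Local helper so the sketch has no dependencies.
  static void checkState(boolean condition, String message) {
    if (!condition) {
      throw new IllegalStateException(message);
    }
  }

  // Returns the concurrency mode used when the MDT write config is first built.
  static Mode resolveInitialMode(Mode metadataMode, Mode dataTableMode,
                                 boolean streamingWritesEnabled, String lockProviderClass) {
    if (metadataMode.supportsMultiWriter()) {
      // Multi-writer on the MDT is for executing table service plans, so it
      // cannot be combined with streaming writes to the MDT.
      checkState(!streamingWritesEnabled,
          "Streaming writes must be disabled when the MDT uses " + metadataMode);
      checkState(metadataMode == dataTableMode,
          "MDT concurrency mode must match the data table concurrency mode");
      checkState(lockProviderClass != null,
          "A lock provider must be set on the data table");
      checkState(!IN_PROCESS_LOCK_PROVIDER.equals(lockProviderClass),
          "An in-process lock provider cannot coordinate separate processes");
      // Start from single writer; lock configs are derived from the data table afterwards.
      return Mode.SINGLE_WRITER;
    }
    // Assumption: the pre-existing behavior is kept on this path.
    return streamingWritesEnabled ? Mode.NON_BLOCKING_CONCURRENCY_CONTROL : Mode.SINGLE_WRITER;
  }
}
```

Under these assumptions, a user-configured OCC/NBCC mode on the MDT with streaming writes enabled is rejected up front, so the streaming path only ever runs when the MDT mode is left as single writer.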