github-actions[bot] commented on code in PR #61317:
URL: https://github.com/apache/doris/pull/61317#discussion_r2937836214
##########
fe/fe-core/src/main/java/org/apache/doris/job/util/StreamingJobUtils.java:
##########
@@ -355,7 +372,8 @@ public static List<CreateTableCommand>
generateCreateTableCmds(String targetDb,
ImmutableList.of() // clusterKeyColumnNames
);
CreateTableCommand createtblCmd = new
CreateTableCommand(Optional.empty(), createtblInfo);
- createtblCmds.add(createtblCmd);
+ // Key: source (PG/MySQL) table name; Value: command that creates
the Doris target table
+ createtblCmds.put(table, createtblCmd);
}
Review Comment:
[Medium] **Multi-table merge schema safety**: When two source tables map to
the same Doris target table, this `LinkedHashMap` will contain two entries with
different keys but both `CreateTableCommand` values targeting the same Doris
table name. In `createTableIfNotExists()`, the second entry's
`CreateTableCommand` is silently skipped (because the table already exists
after the first entry creates it).
This means the Doris target table is created with **only the first source
table's schema**. If the two source tables have different columns, data from
the second source may fail at stream-load time.
Consider either:
1. Validating that all source tables mapping to the same target have
compatible schemas (at least same column names and compatible types), or
2. Documenting this limitation clearly, or
3. Merging column definitions (union of columns) when building the
`CreateTableCommand` for shared targets.
The current test works because both PG source tables have identical schemas
`(id int, name varchar(200))`.
##########
fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/DataSourceConfigValidator.java:
##########
@@ -43,11 +43,33 @@ public class DataSourceConfigValidator {
DataSourceConfigKeys.SSL_ROOTCERT
);
+ // Known suffixes for per-table config keys (format:
"table.<tableName>.<suffix>")
+ private static final Set<String> ALLOW_TABLE_LEVEL_SUFFIXES =
Sets.newHashSet(
+ DataSourceConfigKeys.TABLE_TARGET_TABLE_SUFFIX
+ );
+
+ private static final String TABLE_LEVEL_PREFIX =
DataSourceConfigKeys.TABLE + ".";
+
public static void validateSource(Map<String, String> input) throws
IllegalArgumentException {
for (Map.Entry<String, String> entry : input.entrySet()) {
String key = entry.getKey();
String value = entry.getValue();
+ if (key.startsWith(TABLE_LEVEL_PREFIX)) {
+ // per-table config key must be exactly:
table.<tableName>.<suffix>
+ // reject malformed keys like "table.exclude_columns" (missing
tableName)
+ String[] parts = key.split("\\.", -1);
+ if (parts.length != 3 || parts[1].isEmpty()) {
+ throw new IllegalArgumentException("Malformed per-table
config key: '" + key
Review Comment:
[Low] The `split("\\.", -1)` + `parts.length != 3` approach assumes table
names never contain dots. While this is typically true for PG/MySQL, note that
`ConfigUtil.parseAllTargetTableMappings` on the cdc_client side uses a
prefix/suffix substring approach that correctly handles dots in table names.
The two modules use **inconsistent parsing strategies** for the same key format.
This is a minor issue in practice (PG/MySQL identifiers rarely contain
dots), but worth noting as a design inconsistency. Consider aligning both sides
to use the same parsing strategy — the prefix/suffix approach is strictly more
correct.
##########
fe/fe-common/src/main/java/org/apache/doris/job/cdc/DataSourceConfigKeys.java:
##########
@@ -38,6 +38,11 @@ public class DataSourceConfigKeys {
public static final String SSL_MODE = "ssl_mode";
public static final String SSL_ROOTCERT = "ssl_rootcert";
+ // per-table config: key format is "table.<tableName>.<suffix>"
+ public static final String TABLE = "table";
+ public static final String TABLE_EXCLUDE_COLUMNS_SUFFIX =
"exclude_columns";
+ public static final String TABLE_TARGET_TABLE_SUFFIX = "target_table";
Review Comment:
[Medium] `TABLE_EXCLUDE_COLUMNS_SUFFIX` is defined here but is **not** added
to `ALLOW_TABLE_LEVEL_SUFFIXES` in `DataSourceConfigValidator`, meaning any
config key like `table.foo.exclude_columns` would be rejected at validation
time with "Unknown per-table config key". If this is a planned future feature,
consider adding a `// TODO` comment here and removing this constant until it's
actually supported. Shipping an unused public constant that looks usable but
gets rejected at runtime is confusing.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]