Hisoka-X commented on code in PR #9380:
URL: https://github.com/apache/seatunnel/pull/9380#discussion_r2148937686
##########
docs/en/connector-v2/source/Jdbc.md:
##########
@@ -349,13 +453,23 @@ Jdbc {
table_list = [
{
+ # Exact table path
# e.g. table_path = "testdb.table1"、table_path =
"test_schema.table1"、table_path = "testdb.test_schema.table1"
table_path = "testdb.table1"
},
{
table_path = "testdb.table2"
- # Use query filetr rows & columns
+ # Use query filter rows & columns
query = "select id, name from testdb.table2 where id > 100"
+ },
+ {
+ # Using regex to match multiple tables
+ table_path = "testdb.user_table\\d+"
+ use_regex = true
+ },
+ {
+ # Auto-detection of regex pattern
+ table_path = "testdb.log_.*"
Review Comment:
we doesn't need support this feature.
##########
docs/en/connector-v2/source/Jdbc.md:
##########
@@ -63,7 +63,7 @@ supports query SQL and can achieve projection effect.
| fetch_size | Int | No | 0
| For queries that return a large number of objects, you can configure the
row fetch size used in the query to improve performance by reducing the number
database hits required to satisfy the selection criteria. Zero means use jdbc
default value.
|
| properties | Map | No | -
| Additional connection configuration parameters,when properties and URL
have the same parameters, the priority is determined by the <br/>specific
implementation of the driver. For example, in MySQL, properties take precedence
over the URL.
|
Review Comment:
Please add `use_regex` into main connector options table.
##########
docs/en/connector-v2/source/Jdbc.md:
##########
@@ -63,7 +63,7 @@ supports query SQL and can achieve projection effect.
| fetch_size | Int | No | 0
| For queries that return a large number of objects, you can configure the
row fetch size used in the query to improve performance by reducing the number
database hits required to satisfy the selection criteria. Zero means use jdbc
default value.
|
| properties | Map | No | -
| Additional connection configuration parameters,when properties and URL
have the same parameters, the priority is determined by the <br/>specific
implementation of the driver. For example, in MySQL, properties take precedence
over the URL.
|
| table_path | String | No | -
| The path to the full path of table, you can use this configuration
instead of `query`. <br/>examples: <br/>`- mysql: "testdb.table1" `<br/>`-
oracle: "test_schema.table1" `<br/>`- sqlserver: "testdb.test_schema.table1"`
<br/>`- postgresql: "testdb.test_schema.table1"` <br/>`- iris:
"test_schema.table1"`
|
-| table_list | Array | No | -
| The list of tables to be read, you can use this configuration instead of
`table_path`
|
+| table_list | Array | No | -
| The list of tables to be read, you can use this configuration instead
of `table_path` example: ```[{ table_path = "testdb.table1"}, {table_path =
"testdb.table2", query = "select * id, name from testdb.table2"}]```<br/>,and
supports using regular expressions in `table_path`, for example: ```[{
table_path = "testdb.table\\d+", "use_regex" = "true"}]``` will match all
tables that start with "table" followed by numbers.
|
Review Comment:
Please revert it. In table_list, not only can configure query, table_path,
use_regex. So we don't need to specify which features are supported, which
would lead to misconceptions.
##########
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java:
##########
@@ -237,7 +245,9 @@ public Long approximateRowCntStatement(Connection
connection, JdbcSourceTable ta
String.format(
"analyze table %s compute statistics for
table",
tableIdentifier(tablePath));
- if (!table.getSkipAnalyze()) {
+ // Add null value judgment
+ Boolean skipAnalyze = table.getSkipAnalyze();
Review Comment:
ditto
##########
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java:
##########
@@ -219,7 +225,9 @@ public Long approximateRowCntStatement(Connection
connection, JdbcSourceTable ta
.getFullName()
.equals(table.getTablePath().getFullName()));
- if (table.getUseSelectCount()) {
+ // Add null value judgment
+ Boolean useSelectCount = table.getUseSelectCount();
Review Comment:
ditto
##########
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/utils/JdbcCatalogUtils.java:
##########
@@ -409,4 +428,102 @@ private static ReadonlyConfig
extractCatalogConfig(JdbcConnectionConfig config)
catalogConfig.put(JdbcOptions.HANDLE_BLOB_AS_STRING.key(),
config.isHandleBlobAsString());
return ReadonlyConfig.fromMap(catalogConfig);
}
+
+ /** Process table path with regex pattern and add matched tables to the
result. */
+ private static void processRegexTablePath(
+ AbstractJdbcCatalog jdbcCatalog,
+ JdbcDialect jdbcDialect,
+ JdbcSourceTableConfig tableConfig,
+ Map<TablePath, JdbcSourceTable> result)
+ throws SQLException {
+
+ String tablePath = tableConfig.getTablePath();
+ log.info("Processing table path with regex: {}", tablePath);
+
+ // Parse table path to extract database, schema and table patterns
Review Comment:
could you remove these useless comment?
##########
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/utils/JdbcCatalogUtils.java:
##########
@@ -409,4 +428,102 @@ private static ReadonlyConfig
extractCatalogConfig(JdbcConnectionConfig config)
catalogConfig.put(JdbcOptions.HANDLE_BLOB_AS_STRING.key(),
config.isHandleBlobAsString());
return ReadonlyConfig.fromMap(catalogConfig);
}
+
+ /** Process table path with regex pattern and add matched tables to the
result. */
+ private static void processRegexTablePath(
+ AbstractJdbcCatalog jdbcCatalog,
+ JdbcDialect jdbcDialect,
+ JdbcSourceTableConfig tableConfig,
+ Map<TablePath, JdbcSourceTable> result)
+ throws SQLException {
+
+ String tablePath = tableConfig.getTablePath();
+ log.info("Processing table path with regex: {}", tablePath);
+
+ // Parse table path to extract database, schema and table patterns
+ String databasePattern = ".*";
+ String tableNamePattern = ".*"; // This is just the table name part
+
+ // Convert the table path to a format suitable for regex matching
+
+ // Step 1: Replace escaped dots (\.) with a placeholder
+ String processedTablePath = tablePath.replace("\\.", DOT_PLACEHOLDER);
+ log.info("After replacing escaped dots with placeholder: {}",
processedTablePath);
+
+ // Step 2: Split by unescaped dots
+ String[] parts = processedTablePath.split("\\.");
+
+ String fullTablePattern;
+
+ if (parts.length == 1) {
+ // Only table pattern
+ tableNamePattern = parts[0];
+ fullTablePattern = String.format("%s", tableNamePattern);
+ } else if (parts.length == 2) {
+ // database.table or schema.table format
+ databasePattern = parts[0];
+ tableNamePattern = parts[1];
+ fullTablePattern = String.format("%s.%s", databasePattern,
tableNamePattern);
+ } else if (parts.length >= 3) {
+ // database.schema.table format
+ databasePattern = parts[0];
+ String schemaPattern = parts[1];
+ tableNamePattern = parts[2];
+ fullTablePattern =
+ String.format(
+ "%s.%s.%s",
+ databasePattern.replace(DOT_PLACEHOLDER, "."),
+ schemaPattern.replace(DOT_PLACEHOLDER, "."),
+ tableNamePattern.replace(DOT_PLACEHOLDER, "."));
+ } else {
+ fullTablePattern = ".*..*..*";
Review Comment:
```suggestion
fullTablePattern = ".*";
```
Can this work properly?
##########
docs/en/connector-v2/source/Jdbc.md:
##########
@@ -74,6 +74,119 @@ supports query SQL and can achieve projection effect.
| split.string_split_mode | String | No | sample
| Supports different string splitting algorithms. By default, `sample` is
used to determine the split by sampling the string value. You can switch to
`charset_based` to enable charset-based string splitting algorithm. When set to
`charset_based`, the algorithm assumes characters of partition_column are
within ASCII range 32-126, which covers most character-based splitting
scenarios.
|
| split.string_split_mode_collate | String | No | -
| Specifies the collation to use when string_split_mode is set to
`charset_based` and the table has a special collation. If not specified, the
database's default collation will be used.
|
+### Table Matching
+
+The JDBC Source connector supports two ways to specify tables:
+
+1. **Exact Table Path**: Use `table_path` to specify a single table with its
full path.
+ ```hocon
+ table_path = "testdb.table1"
+ ```
+
+2. **Regular Expression**: Use `table_path` with a regex pattern to match
multiple tables.
+ ```hocon
+ table_path = "testdb.table\\d+" # Matches table1, table2, table3, etc.
+ use_regex = true
+ ```
+
+#### Regular Expression Support for Table Names
+
+The JDBC connector supports using regular expressions to match multiple
tables. This feature allows you to process multiple tables with a single source
configuration.
+
+### Configuration
+
+To use regular expression matching for table paths:
+
+1. Set `use_regex = true` to enable regex matching
+2. If `use_regex` is not set or set to `false`, the connector will treat the
table_path as an exact path (no regex matching)
+
+### Parameters
+
+| Name | Type | Required | Default | Description |
+|------|------|----------|---------|-------------|
+| use_regex | Boolean | No | false | Control regular expression matching for
table_path. When set to `true`, the table_path will be treated as a regular
expression pattern. When set to `false` or not specified, the table_path will
be treated as an exact path (no regex matching). |
+
+### Regular Expression Syntax Notes
+
+- **Path Separator**: The dot (`.`) is treated as a separator between
database, schema, and table names.
+- **Escaped Dots**: If you need to use a dot (`.`) as a wildcard character in
your regular expression to match any character, you must escape it with a
backslash (`\.`).
+- **Path Format**: For paths like `database.table` or `database.schema.table`,
the last unescaped dot separates the table pattern from the database/schema
pattern.
+- **Pattern Examples**:
+ - `test.table\\d+` - Matches tables like `table1`, `table2`, etc. in the
`test` database
+ - `test\\.table\\d+` - Matches tables with literal dots in names like
`test.table1`, `test.table2`
+ - `.*\\.user_.*` - Matches any database with tables starting with `user_`
+
+### Example
Review Comment:
@yzeng1618
##########
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/utils/JdbcCatalogUtils.java:
##########
@@ -409,4 +428,102 @@ private static ReadonlyConfig
extractCatalogConfig(JdbcConnectionConfig config)
catalogConfig.put(JdbcOptions.HANDLE_BLOB_AS_STRING.key(),
config.isHandleBlobAsString());
return ReadonlyConfig.fromMap(catalogConfig);
}
+
+ /** Process table path with regex pattern and add matched tables to the
result. */
+ private static void processRegexTablePath(
+ AbstractJdbcCatalog jdbcCatalog,
+ JdbcDialect jdbcDialect,
+ JdbcSourceTableConfig tableConfig,
+ Map<TablePath, JdbcSourceTable> result)
+ throws SQLException {
+
+ String tablePath = tableConfig.getTablePath();
+ log.info("Processing table path with regex: {}", tablePath);
+
+ // Parse table path to extract database, schema and table patterns
+ String databasePattern = ".*";
+ String tableNamePattern = ".*"; // This is just the table name part
+
+ // Convert the table path to a format suitable for regex matching
+
+ // Step 1: Replace escaped dots (\.) with a placeholder
+ String processedTablePath = tablePath.replace("\\.", DOT_PLACEHOLDER);
+ log.info("After replacing escaped dots with placeholder: {}",
processedTablePath);
+
+ // Step 2: Split by unescaped dots
+ String[] parts = processedTablePath.split("\\.");
Review Comment:
In `processedTablePath`, the real dot already be replaced by
`DOT_PLACEHOLDER`. Why it still can be splited in here?
##########
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java:
##########
@@ -211,6 +211,12 @@ public Long approximateRowCntStatement(Connection
connection, JdbcSourceTable ta
String query = table.getQuery();
+ // Add null value judgment
+ if (table.getTablePath() == null) {
Review Comment:
Under what circumstances will tablepath be null?
##########
docs/zh/connector-v2/source/PostgreSQL.md:
##########
@@ -91,7 +91,7 @@ import ChangeLog from '../changelog/connector-jdbc.md';
| fetch_size | Int | 否 | 0
| 对于返回大量对象的查询,您可以配置<br/> 用于查询的行抓取大小,以通过减少所需的数据库访问次数来提高性能。<br/> 0 表示使用
JDBC 默认值。
|
| properties | Map | 否 | -
| 其他连接配置参数,当属性和 URL 具有相同参数时,<br/> 优先级由驱动程序的具体实现决定。在 MySQL 中,属性优先于 URL。
|
| table_path | String | 否 | -
| 表的完整路径,您可以使用此配置替代 `query`。<br/> 示例:<br/> mysql: "testdb.table1" <br/>
oracle: "test_schema.table1" <br/> sqlserver: "testdb.test_schema.table1" <br/>
postgresql: "testdb.test_schema.table1"
|
-| table_list | Array | 否 | -
| 要读取的表列表,您可以使用此配置替代 `table_path` 示例:```[{ table_path =
"testdb.table1"}, {table_path = "testdb.table2", query = "select * id, name
from testdb.table2"}]```
|
+| table_list | Array | 否 | -
| 要读取的表列表,您可以使用此配置代替 `table_path`,示例如下: ```[{ table_path =
"testdb.test_schema.table1"}, {table_path = "testdb.test_schema.table2", query
= "select * id, name from testdb.test_schema.table2"}]```<br/>,并且支持在
`table_path` 中使用正则表达式,例如:```[{ table_path = "testdb.test_schema.table\\d+"}```
将匹配所有以 "table" 开头后跟数字的表。
|
Review Comment:
ditto
##########
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/utils/JdbcCatalogUtils.java:
##########
@@ -409,4 +428,102 @@ private static ReadonlyConfig
extractCatalogConfig(JdbcConnectionConfig config)
catalogConfig.put(JdbcOptions.HANDLE_BLOB_AS_STRING.key(),
config.isHandleBlobAsString());
return ReadonlyConfig.fromMap(catalogConfig);
}
+
+ /** Process table path with regex pattern and add matched tables to the
result. */
+ private static void processRegexTablePath(
+ AbstractJdbcCatalog jdbcCatalog,
+ JdbcDialect jdbcDialect,
+ JdbcSourceTableConfig tableConfig,
+ Map<TablePath, JdbcSourceTable> result)
+ throws SQLException {
+
+ String tablePath = tableConfig.getTablePath();
+ log.info("Processing table path with regex: {}", tablePath);
+
+ // Parse table path to extract database, schema and table patterns
+ String databasePattern = ".*";
+ String tableNamePattern = ".*"; // This is just the table name part
+
+ // Convert the table path to a format suitable for regex matching
+
+ // Step 1: Replace escaped dots (\.) with a placeholder
+ String processedTablePath = tablePath.replace("\\.", DOT_PLACEHOLDER);
+ log.info("After replacing escaped dots with placeholder: {}",
processedTablePath);
+
+ // Step 2: Split by unescaped dots
+ String[] parts = processedTablePath.split("\\.");
+
+ String fullTablePattern;
+
+ if (parts.length == 1) {
+ // Only table pattern
+ tableNamePattern = parts[0];
+ fullTablePattern = String.format("%s", tableNamePattern);
Review Comment:
```suggestion
fullTablePattern = tableNamePattern;
```
##########
docs/en/connector-v2/source/PostgreSQL.md:
##########
@@ -91,7 +91,7 @@ Read external data source data through JDBC.
| fetch_size | Int | No | 0
| For queries that return a large number of objects,you can
configure<br/> the row fetch size used in the query toimprove performance
by<br/> reducing the number database hits required to satisfy the selection
criteria.<br/> Zero means use jdbc default value.
|
| properties | Map | No | -
| Additional connection configuration parameters,when properties and
URL have the same parameters, the priority is determined by the <br/>specific
implementation of the driver. For example, in MySQL, properties take precedence
over the URL.
|
| table_path | String | No | -
| The path to the full path of table, you can use this configuration
instead of `query`. <br/>examples: <br/>mysql: "testdb.table1" <br/>oracle:
"test_schema.table1" <br/>sqlserver: "testdb.test_schema.table1"
<br/>postgresql: "testdb.test_schema.table1"
|
-| table_list | Array | No | -
| The list of tables to be read, you can use this configuration instead
of `table_path` example: ```[{ table_path = "testdb.table1"}, {table_path =
"testdb.table2", query = "select * id, name from testdb.table2"}]```
|
+| table_list | Array | No | -
| The list of tables to be read, you can use this configuration instead
of `table_path` example: ```[{ table_path = "testdb.test_schema.table1"},
{table_path = "testdb.test_schema.table2", query = "select * id, name from
testdb.test_schema.table2"}]```<br/>,and supports using regular expressions in
`table_path`, for example: ```[{ table_path = "testdb.table\\d+", "use_regex" =
"true"}]``` will match all tables that start with "table" followed by numbers.
|
Review Comment:
ditto.
##########
docs/en/connector-v2/source/Oracle.md:
##########
@@ -77,7 +77,7 @@ Read external data source data through JDBC.
| fetch_size | Int | No | 0 | For
queries that return a large number of objects,you can configure<br/> the row
fetch size used in the query toimprove performance by<br/> reducing the number
database hits required to satisfy the selection criteria.<br/> Zero means use
jdbc default value. |
| properties | Map | No | - |
Additional connection configuration parameters,when properties and URL have the
same parameters, the priority is determined by the <br/>specific implementation
of the driver. For example, in Oracle, properties take precedence over the URL.
|
| table_path | String | No | -
| The path to the full path of table, you can use this configuration
instead of `query`. <br/>examples: <br/>mysql: "testdb.table1" <br/>oracle:
"test_schema.table1" <br/>sqlserver: "testdb.test_schema.table1"
<br/>postgresql: "testdb.test_schema.table1"
|
-| table_list | Array | No | -
| The list of tables to be read, you can use this configuration instead
of `table_path` example: ```[{ table_path = "testdb.table1"}, {table_path =
"testdb.table2", query = "select * id, name from testdb.table2"}]```
|
+| table_list | Array | No | -
| The list of tables to be read, you can use this configuration instead
of `table_path` example: ```[{ table_path = "testdb.table1"}, {table_path =
"testdb.table2", query = "select * id, name from testdb.table2"}]```<br/>,and
supports using regular expressions in `table_path`, for example: ```[{
table_path = "testdb.table\\d+", "use_regex" = "true"}]``` will match all
tables that start with "table" followed by numbers.
|
Review Comment:
ditto
##########
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/utils/JdbcCatalogUtils.java:
##########
@@ -83,24 +85,41 @@ public static Map<TablePath, JdbcSourceTable> getTables(
Map<String, Map<String, String>> unsupportedTable = new
LinkedHashMap<>();
for (JdbcSourceTableConfig tableConfig : tablesConfig) {
try {
- CatalogTable catalogTable =
- getCatalogTable(tableConfig, jdbcCatalog,
jdbcDialect);
- TablePath tablePath =
catalogTable.getTableId().toTablePath();
- JdbcSourceTable jdbcSourceTable =
- JdbcSourceTable.builder()
- .tablePath(tablePath)
- .query(tableConfig.getQuery())
-
.partitionColumn(tableConfig.getPartitionColumn())
-
.partitionNumber(tableConfig.getPartitionNumber())
-
.partitionStart(tableConfig.getPartitionStart())
-
.partitionEnd(tableConfig.getPartitionEnd())
-
.useSelectCount(tableConfig.getUseSelectCount())
-
.skipAnalyze(tableConfig.getSkipAnalyze())
- .catalogTable(catalogTable)
- .build();
- tables.put(tablePath, jdbcSourceTable);
- if (log.isDebugEnabled()) {
- log.debug("Loaded catalog table : {}, {}",
tablePath, jdbcSourceTable);
+ // Check if table path should be treated as regex
+ boolean isRegexPath =
+ (tableConfig.getUseRegex() != null &&
tableConfig.getUseRegex());
+
Review Comment:
It should never be null. If user not configure it, it will return default
value by readonly config.
##########
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/utils/JdbcCatalogUtils.java:
##########
@@ -409,4 +428,102 @@ private static ReadonlyConfig
extractCatalogConfig(JdbcConnectionConfig config)
catalogConfig.put(JdbcOptions.HANDLE_BLOB_AS_STRING.key(),
config.isHandleBlobAsString());
return ReadonlyConfig.fromMap(catalogConfig);
}
+
+ /** Process table path with regex pattern and add matched tables to the
result. */
+ private static void processRegexTablePath(
+ AbstractJdbcCatalog jdbcCatalog,
+ JdbcDialect jdbcDialect,
+ JdbcSourceTableConfig tableConfig,
+ Map<TablePath, JdbcSourceTable> result)
+ throws SQLException {
+
+ String tablePath = tableConfig.getTablePath();
+ log.info("Processing table path with regex: {}", tablePath);
+
+ // Parse table path to extract database, schema and table patterns
+ String databasePattern = ".*";
+ String tableNamePattern = ".*"; // This is just the table name part
+
+ // Convert the table path to a format suitable for regex matching
+
+ // Step 1: Replace escaped dots (\.) with a placeholder
+ String processedTablePath = tablePath.replace("\\.", DOT_PLACEHOLDER);
+ log.info("After replacing escaped dots with placeholder: {}",
processedTablePath);
Review Comment:
```suggestion
log.debug("After replacing escaped dots with placeholder: {}",
processedTablePath);
```
##########
seatunnel-api/src/main/java/org/apache/seatunnel/api/table/catalog/Catalog.java:
##########
@@ -164,15 +164,42 @@ default List<CatalogTable> getTables(ReadonlyConfig
config) throws CatalogExcept
Pattern databasePattern =
Pattern.compile(config.get(ConnectorCommonOptions.DATABASE_PATTERN));
Pattern tablePattern =
Pattern.compile(config.get(ConnectorCommonOptions.TABLE_PATTERN));
+
List<String> allDatabase = this.listDatabases();
allDatabase.removeIf(s -> !databasePattern.matcher(s).matches());
List<TablePath> tablePaths = new ArrayList<>();
+
for (String databaseName : allDatabase) {
tableNames = this.listTables(databaseName);
Review Comment:
Let's add new method listSchemaTables return List<TablePath>? So you should
not parse schema for table name.
##########
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/utils/JdbcCatalogUtils.java:
##########
@@ -63,6 +64,7 @@
@Slf4j
public class JdbcCatalogUtils {
private static final String DEFAULT_CATALOG_NAME = "jdbc_catalog";
+ private static final String DOT_PLACEHOLDER = "$DOT$";
Review Comment:
```suggestion
private static final String DOT_PLACEHOLDER = "__$DOT$__";
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]